Paper ID: 1, https://arxiv.org/pdf/2506.22803.pdf
Authors:Nuoye Xiong, Anqi Dong, Ning Wang, Cong Hua, Guangming Zhu, Lin Mei, Peiyi Shen, Liang Zhang
Title: Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding
Abstract:
Recent advances in deep learning have led to increasingly complex models with deeper layers and more parameters, reducing interpretability and making their decisions harder to understand. While many methods explain black-box reasoning, most lack effective interventions or only operate at the sample level without modifying the model itself. To address this, we propose the Concept Bottleneck Model for Enhancing Human-Neural Network Mutual Understanding (CBM-HNMU). CBM-HNMU leverages the Concept Bottleneck Model (CBM) as an interpretable framework to approximate black-box reasoning and communicate conceptual understanding. Detrimental concepts are automatically identified and refined (removed/replaced) based on global gradient contributions. The modified CBM then distills corrected knowledge back into the black-box model, enhancing both interpretability and accuracy. We evaluate CBM-HNMU on various CNN and transformer-based models across Flower-102, CIFAR-10, CIFAR-100, FGVC-Aircraft, and CUB-200, achieving a maximum accuracy improvement of 2.64% and a maximum increase in average accuracy of 1.03%. Source code is available at: https://github.com/XiGuaBo/CBM-HNMU.
Chinese: 本文提出了增强人机互理解的概念瓶颈模型(CBM-HNMU),通过自动识别并优化有害概念来提升模型可解释性与准确率,在多个数据集上最高实现了2.64%的精度提升。
English: This paper introduces the Concept Bottleneck Model for Enhancing Human-Neural Network Mutual Understanding (CBM-HNMU), which automatically identifies and refines detrimental concepts to improve both model interpretability and accuracy, achieving up to a 2.64% accuracy gain across multiple datasets.
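The concept-refinement step described above can be pictured with a toy sketch. All names, the per-sample contribution dicts, and the zero threshold below are illustrative assumptions, not the paper's implementation: concepts are scored by their summed gradient contribution, and those with a negative global contribution are masked before distillation.

```python
def score_concepts(grad_contributions):
    """Sum each concept's gradient contribution over all training samples.

    grad_contributions: list of per-sample dicts {concept: contribution}.
    Returns a dict of global (summed) contributions per concept.
    """
    totals = {}
    for sample in grad_contributions:
        for concept, value in sample.items():
            totals[concept] = totals.get(concept, 0.0) + value
    return totals

def refine_concepts(concept_weights, totals, threshold=0.0):
    """Zero out (remove) concepts whose global contribution falls
    below the threshold; keep the rest unchanged."""
    return {c: (0.0 if totals.get(c, 0.0) < threshold else w)
            for c, w in concept_weights.items()}

# Example: 'striped' helps the correct class overall, 'metallic' hurts it.
samples = [{"striped": 0.4, "metallic": -0.3},
           {"striped": 0.2, "metallic": -0.1}]
totals = score_concepts(samples)
weights = refine_concepts({"striped": 1.0, "metallic": 1.0}, totals)
```

In the paper the refined CBM then teaches the black-box model; here the masked weight dict simply stands in for that corrected concept layer.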

Authors:Boyuan Sun, Jiaxing Zhao, Xihan Wei, Qibin Hou
Title: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
Abstract:
In this paper, we present LLaVA-Scissor, a training-free token compression strategy designed for video multimodal large language models. Previous methods mostly attempt to compress tokens based on attention scores, but fail to effectively capture all semantic regions and often lead to token redundancy. In contrast, we propose to leverage the Semantic Connected Components (SCC) approach that assigns tokens to distinct semantic regions within the token set, ensuring comprehensive semantic coverage. The outcome is a two-step spatio-temporal token compression strategy that utilizes SCC in both spatial and temporal domains. This strategy can effectively compress tokens by representing the entire video with a set of non-overlapping semantic tokens. We conduct extensive evaluations of the token compression capabilities of LLaVA-Scissor across diverse video understanding benchmarks, including video question answering, long video understanding, and comprehensive multiple-choice benchmarks. Experimental results show that the proposed LLaVA-Scissor outperforms other token compression methods, achieving superior performance in various video understanding benchmarks, particularly at low token retention ratios. Project page: https://github.com/HumanMLLM/LLaVA-Scissor.
Chinese: 本文提出LLaVA-Scissor,一种无需训练的视频多模态令牌压缩方法,通过语义连通组件实现全面语义覆盖,在多种视频理解基准测试中表现出优越性能。
English: This paper introduces LLaVA-Scissor, a training-free token compression method for video multimodal models that uses Semantic Connected Components to achieve comprehensive semantic coverage and superior performance across video understanding benchmarks.
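The connected-components idea can be sketched in a few lines. The scalar "tokens" and the distance-based similarity below are toy assumptions standing in for real token embeddings and the paper's similarity measure: tokens whose pairwise similarity exceeds a threshold form one semantic component, and one representative per component is kept.

```python
def connected_components(n, similar):
    """similar(i, j) -> bool. Returns a list of components (sorted id lists)."""
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j not in seen and similar(i, j):
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(comp))
    return components

def compress(tokens, threshold=0.9):
    """Keep the first token of each semantic component."""
    def similar(i, j):
        # Toy similarity: tokens are floats; 'similar' means close in value.
        return abs(tokens[i] - tokens[j]) < threshold
    comps = connected_components(len(tokens), similar)
    return [tokens[c[0]] for c in comps]

# Five tokens collapse to three non-overlapping semantic representatives.
kept = compress([0.1, 0.15, 5.0, 5.2, 9.9], threshold=0.5)
```

Unlike attention-score pruning, every component (semantic region) contributes a representative, which is the coverage property the abstract emphasizes.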

Authors:Varun Mannam, Fang Wang, Xin Chen
Title: Evaluating VisualRAG: Quantifying Cross-Modal Performance in Enterprise Document Understanding
Abstract:
Current evaluation frameworks for multimodal generative AI struggle to establish trustworthiness, hindering enterprise adoption where reliability is paramount. We introduce a systematic, quantitative benchmarking framework to measure the trustworthiness of progressively integrating cross-modal inputs such as text, images, captions, and OCR within VisualRAG systems for enterprise document intelligence. Our approach establishes quantitative relationships between technical metrics and user-centric trust measures. Evaluation reveals that optimal modality weighting with weights of 30% text, 15% image, 25% caption, and 30% OCR improves performance by 57.3% over text-only baselines while maintaining computational efficiency. We provide comparative assessments of foundation models, demonstrating their differential impact on trustworthiness in caption generation and OCR extraction, a vital consideration for reliable enterprise AI. This work advances responsible AI deployment by providing a rigorous framework for quantifying and enhancing trustworthiness in multimodal RAG for critical enterprise applications.
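The reported weighting is a straightforward late fusion, sketched below with the weights from the abstract; the per-modality scoring functions are placeholders, not VisualRAG's actual retrieval scores.

```python
# Modality weights reported in the abstract (30/15/25/30).
WEIGHTS = {"text": 0.30, "image": 0.15, "caption": 0.25, "ocr": 0.30}

def fuse(scores, weights=WEIGHTS):
    """Weighted late fusion of per-modality relevance scores in [0, 1]."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[m] * scores.get(m, 0.0) for m in weights)

# A document with caption + OCR evidence outranks a text-only match
# under this weighting, illustrating the value of cross-modal inputs.
doc_a = fuse({"text": 0.9})                  # text-only evidence
doc_b = fuse({"caption": 0.8, "ocr": 0.9})   # cross-modal evidence
```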

Authors:Tin Dizdarević, Ravi Hammond, Tobias Gessler, Anisoara Calinescu, Jonathan Cook, Matteo Gallici, Andrei Lupu, Darius Muglich, Johannes Forkel, Jakob Nicolaus Foerster
Title: Ad-Hoc Human-AI Coordination Challenge
Abstract:
Achieving seamless coordination between AI agents and humans is crucial for real-world applications, yet it remains a significant open challenge. Hanabi is a cooperative card game featuring imperfect information, constrained communication, theory of mind requirements, and coordinated action -- making it an ideal testbed for human-AI coordination. However, its use for human-AI interaction has been limited by the challenges of human evaluation. In this work, we introduce the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) to overcome the constraints of costly and difficult-to-reproduce human evaluations. We develop human proxy agents, trained on a large-scale human dataset, that serve as robust, cheap, and reproducible human-like evaluation partners in AH2AC2. To encourage the development of data-efficient methods, we open-source a dataset of 3,079 games, deliberately limiting the amount of available human gameplay data. We present baseline results for both two- and three-player Hanabi scenarios. To ensure fair evaluation, we host the proxy agents through a controlled evaluation system rather than releasing them publicly. The code is available at https://github.com/FLAIROx/ah2ac2.
Chinese: 本文提出Ad-Hoc Human-AI协调挑战(AH2AC2),通过开发人类代理智能体并提供有限数据集,以解决《花火》游戏中人类评估的局限性,促进数据高效的人机协调方法发展。
English: This paper introduces the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) to address the limitations of human evaluations in Hanabi by developing human proxy agents and providing a limited dataset to promote data-efficient methods for human-AI coordination.

Authors:Can Liu, Chunlin Da, Xiaoxiao Long, Yuxiao Yang, Yu Zhang, Yong Wang
Title: SimVecVis: A Dataset for Enhancing MLLMs in Visualization Understanding
Abstract:
Current multimodal large language models (MLLMs), while effective in natural image understanding, struggle with visualization understanding due to their inability to decode the data-to-visual mapping and extract structured information. To address these challenges, we propose SimVec, a novel simplified vector format that encodes chart elements such as mark type, position, and size. The effectiveness of SimVec is demonstrated by using MLLMs to reconstruct chart information from SimVec formats. Then, we build a new visualization dataset, SimVecVis, to enhance the performance of MLLMs in visualization understanding, which consists of three key dimensions: bitmap images of charts, their SimVec representations, and corresponding data-centric question-answering (QA) pairs with explanatory chain-of-thought (CoT) descriptions. We finetune state-of-the-art MLLMs (e.g., MiniCPM and Qwen-VL) using SimVecVis with different dataset dimensions. The experimental results show that it leads to substantial performance improvements of MLLMs with good spatial perception capabilities (e.g., MiniCPM) in data-centric QA tasks. Our dataset and source code are available at: https://github.com/VIDA-Lab/SimVecVis.
Chinese: 当前多模态大语言模型在可视化理解方面存在不足,因此我们提出了SimVec这一简化向量格式来编码图表元素,并构建了SimVecVis数据集,显著提升了模型在数据问答任务中的表现。
English: Current multimodal large language models (MLLMs) struggle with visualization understanding, so we propose SimVec, a simplified vector format that encodes chart elements, and build the SimVecVis dataset to significantly enhance MLLMs' performance in data-centric question-answering tasks.
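A SimVec-style encoder is easy to picture. The exact grammar belongs to the paper; the compact "mark x y w h" line format below is an illustrative assumption showing how mark type, position, and size collapse into a short textual representation an MLLM can consume.

```python
def encode_simvec(marks):
    """Serialize chart marks into one compact line per element
    (hypothetical 'type x y w h' format)."""
    lines = []
    for m in marks:
        lines.append(f"{m['type']} {m['x']} {m['y']} {m['w']} {m['h']}")
    return "\n".join(lines)

# A two-bar chart reduces to two short lines instead of a bitmap.
bar_chart = [
    {"type": "bar", "x": 0, "y": 0, "w": 10, "h": 40},
    {"type": "bar", "x": 12, "y": 0, "w": 10, "h": 25},
]
simvec = encode_simvec(bar_chart)
```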

Authors:Jisu Shin, Juhyun Oh, Eunsu Kim, Hoyun Song, Alice Oh
Title: Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation
Abstract:
Ensuring persona fidelity in large language models (LLMs) is essential for maintaining coherent and engaging human-AI interactions. However, LLMs often exhibit Out-of-Character (OOC) behavior, where generated responses deviate from an assigned persona, leading to inconsistencies that affect model reliability. Existing evaluation methods typically assign single scores to entire responses, struggling to capture subtle persona misalignment, particularly in long-form text generation. To address this limitation, we propose an atomic-level evaluation framework that quantifies persona fidelity at a finer granularity. Our three key metrics measure the degree of persona alignment and consistency within and across generations. Our approach enables a more precise and realistic assessment of persona fidelity by identifying subtle deviations that real users would encounter. Through our experiments, we demonstrate that our framework effectively detects persona inconsistencies that prior methods overlook. By analyzing persona fidelity across diverse tasks and personality types, we reveal how task structure and persona desirability influence model adaptability, highlighting challenges in maintaining consistent persona expression.
Chinese: 提出的原子级评估框架以更精细的粒度衡量大语言模型的人物忠实度,有效检测出现有方法忽略的细微不一致性,并揭示任务结构和角色期望如何影响模型的适应性。
English: The proposed atomic-level evaluation framework measures persona fidelity in LLMs with finer granularity, effectively detecting subtle inconsistencies overlooked by existing methods and revealing how task structure and persona desirability impact model adaptability.
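Atomic-level scoring can be sketched as follows. The sentence splitter, the keyword-based alignment check, and the consistency formula are illustrative stand-ins, not the paper's three metrics: the point is that scoring each atomic unit exposes a single out-of-character sentence that a whole-response score would average away.

```python
def atomic_scores(response, aligned_fn):
    """Split a response into atomic units (here, sentences) and score
    each unit 1.0 if it matches the persona, else 0.0."""
    units = [u.strip() for u in response.split(".") if u.strip()]
    return [1.0 if aligned_fn(u) else 0.0 for u in units]

def fidelity(scores):
    """Mean alignment, plus a toy within-response consistency score:
    1 minus the fraction of units deviating from the majority."""
    mean = sum(scores) / len(scores)
    ooc = min(scores.count(0.0), scores.count(1.0))
    return mean, 1.0 - ooc / len(scores)

# Persona: an introvert. The middle sentence is out of character (OOC).
introvert = lambda u: "party" not in u
resp = "I enjoy quiet evenings. I love a loud party every night. Reading relaxes me."
scores = atomic_scores(resp, introvert)
mean_alignment, consistency = fidelity(scores)
```

A single-score evaluator would rate this response mostly in character; the atomic trace `[1.0, 0.0, 1.0]` pinpoints the deviation.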

Authors:Yitong Zhu, Guanxuan Jiang, Zhuowen Liang, Yuyang Wang
Title: Flow-Aware Diffusion for Real-Time VR Restoration: Enhancing Spatiotemporal Coherence and Efficiency
Abstract:
Cybersickness remains a critical barrier to the widespread adoption of Virtual Reality (VR), particularly in scenarios involving intense or artificial motion cues. Among the key contributors is excessive optical flow: perceived visual motion that, when unmatched by vestibular input, leads to sensory conflict and discomfort. While previous efforts have explored geometric or hardware-based mitigation strategies, such methods often rely on predefined scene structures, manual tuning, or intrusive equipment. In this work, we propose U-MAD, a lightweight, real-time, AI-based solution that suppresses perceptually disruptive optical flow directly at the image level. Unlike prior handcrafted approaches, this method learns to attenuate high-intensity motion patterns from rendered frames without requiring mesh-level editing or scene-specific adaptation. Designed as a plug-and-play module, U-MAD integrates seamlessly into existing VR pipelines and generalizes well to procedurally generated environments. The experiments show that U-MAD consistently reduces average optical flow and enhances temporal stability across diverse scenes. A user study further confirms that reducing visual motion leads to improved perceptual comfort and alleviated cybersickness symptoms. These findings demonstrate that perceptually guided modulation of optical flow provides an effective and scalable approach to creating more user-friendly immersive experiences. The code will be released at https://github.com/XXXXX (upon publication).
Chinese: U-MAD是一种轻量级AI解决方案,通过实时抑制图像中的干扰性光流来减轻VR晕动症,无需场景特定调整即可提升用户舒适度。
English: U-MAD is a lightweight AI solution that reduces cybersickness in VR by suppressing disruptive optical flow in real-time, improving user comfort without requiring scene-specific adjustments.
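The underlying idea of suppressing high-intensity motion can be reduced to a toy rule. The fixed threshold and uniform scaling below are assumptions; U-MAD itself learns the attenuation from rendered frames rather than applying a handcrafted cutoff.

```python
import math

def attenuate_flow(flow, threshold=2.0, gain=0.5):
    """flow: list of (dx, dy) per-pixel motion vectors.
    Vectors whose magnitude exceeds the threshold are scaled by `gain`;
    gentle motion passes through unchanged."""
    out = []
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag > threshold:
            out.append((dx * gain, dy * gain))
        else:
            out.append((dx, dy))
    return out

# The small vector is preserved; the large (blur-inducing) one is damped.
calmed = attenuate_flow([(0.5, 0.5), (3.0, 4.0)])
```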

Authors:Min Yin, Haoyu Liu, Boyi Lian, Chunlei Chai
Title: Co-persona: Leveraging LLMs and Expert Collaboration to Understand User Personas through Social Media Data Analysis
Abstract:
This study introduces Co-Persona, a methodological framework bridging large-scale social media analysis with authentic user understanding through systematic integration of Large Language Models and expert validation. Through a case study of B.Co, a Chinese manufacturer, we investigated the application of Co-Persona in bedside lamp development. Our methodology analyzed over 38 million posts from Xiao Hongshu, employing multi-stage data processing combining advanced NLP with expert validation. Analysis revealed five user personas derived from bedtime behaviors: Health Aficionados, Night Owls, Interior Decorators, Child-care Workers, and Workaholics, each showing unique pre-sleep activities and product preferences. Findings demonstrate that Co-Persona enhances manufacturers' ability to process large datasets while maintaining user understanding. The methodology provides structured approaches for targeted marketing and product strategies. The research contributes to the theoretical understanding of data-driven persona development and to practical applications in consumer-driven innovation. Code and data are available at https://github.com/INFPa/LLMwithPersona.
Chinese: 本研究提出Co-Persona框架,通过整合大规模社交媒体分析与大语言模型及专家验证,识别出五类床头灯用户画像,为企业提供了数据驱动的产品开发和精准营销策略。
English: This study presents Co-Persona, a framework combining large-scale social media analysis with LLMs and expert validation to identify user personas for targeted product development, as demonstrated through a bedside lamp case study revealing five distinct user types.

Authors:Matthew Ebisu, Hang Yu, Reuben Aronson, Elaine Short
Title: See What I Mean? Expressiveness and Clarity in Robot Display Design
Abstract:
Nonverbal visual symbols and displays play an important role in communication when humans and robots work collaboratively. However, few studies have investigated how different types of nonverbal cues affect objective task performance, especially in a dynamic environment that requires real-time decision-making. In this work, we designed a collaborative navigation task where the user and the robot only had partial information about the map on each end and thus the users were forced to communicate with a robot to complete the task. We conducted our study in a public space and recruited 37 participants who randomly passed by our setup. Each participant collaborated with a robot utilizing either animated anthropomorphic eyes and animated icons, or static anthropomorphic eyes and static icons. We found that participants who interacted with a robot with animated displays reported the greatest level of trust and satisfaction; that participants interpreted static icons the best; and that participants paired with a robot with static eyes had the highest completion success. These results suggest that while animation can foster trust with robots, human-robot communication can be optimized by the addition of familiar static icons that may be easier for users to interpret. We published our code, designed symbols, and collected results online at: https://github.com/mattufts/huamn_Cozmo_interaction.
Chinese: 在人与机器人协作中,动态显示增强信任与满意度,而静态图标提高任务完成度与可理解性,表明结合两者可实现最优沟通效果。
English: Animated displays in human-robot collaboration enhance trust and satisfaction, but static icons improve task success and interpretability, suggesting a balanced approach for optimal communication.

Authors:Myke C. Cohen, Zhe Su, Hsien-Te Kao, Daniel Nguyen, Spencer Lynch, Maarten Sap, Svitlana Volkova
Title: Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues
Abstract:
This paper presents an evaluation framework for agentic AI systems in mission-critical negotiation contexts, addressing the need for AI agents that can adapt to diverse human operators and stakeholders. Using Sotopia as a simulation testbed, we present two experiments that systematically evaluated how personality traits and AI agent characteristics influence LLM-simulated social negotiation outcomes, a capability essential for a variety of applications involving cross-team coordination and civil-military interactions. Experiment 1 employs causal discovery methods to measure how personality traits impact price bargaining negotiations, through which we found that Agreeableness and Extraversion significantly affect believability, goal achievement, and knowledge acquisition outcomes. Sociocognitive lexical measures extracted from team communications detected fine-grained differences in agents' empathic communication, moral foundations, and opinion patterns, providing actionable insights for agentic AI systems that must operate reliably in high-stakes operational scenarios. Experiment 2 evaluates human-AI job negotiations by manipulating both simulated human personality and AI system characteristics, specifically transparency, competence, and adaptability, demonstrating how AI agent trustworthiness impacts mission effectiveness. These findings establish a repeatable evaluation methodology for experimenting with AI agent reliability across diverse operator personalities and human-agent team dynamics, directly supporting operational requirements for reliable AI systems. Our work advances the evaluation of agentic AI workflows by moving beyond standard performance metrics to incorporate social dynamics essential for mission success in complex operations.

Authors:Hasan Balci, Augustin Luna
Title: User-Guided Force-Directed Graph Layout
Abstract:
Visual analysis of relational data is essential for many real-world analytics tasks, with layout quality being key to interpretability. However, existing layout algorithms often require users to navigate complex parameters to express their intent. We present a user-guided force-directed layout approach that enables intuitive control through freehand sketching. Our method uses classical image analysis techniques to extract structural information from sketches, which is then used to generate positional constraints that guide the layout process. We evaluate the approach on various real and synthetic graphs ranging from small to medium scale, demonstrating its ability to produce layouts aligned with user expectations. An implementation of our method along with documentation and a demo page is freely available on GitHub at https://github.com/sciluna/uggly.
Chinese: 该研究提出了一种用户引导的力导向布局方法,通过手绘草图实现直观控制,利用图像分析提取结构信息并生成位置约束,经多种图验证有效,且已在GitHub上开源。
English: The study introduces a user-guided force-directed layout method that allows intuitive control via freehand sketching, using image analysis to extract structural cues and generate positional constraints, which is validated on various graphs and made available on GitHub.
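The core mechanism, sketch-derived positional constraints inside a force-directed loop, can be illustrated in one dimension. The spring constants, the 1-D setup, and the update rule below are illustrative, not the UGGLY implementation: edge forces pull connected nodes together while each constrained node is additionally pulled toward the position extracted from the user's stroke.

```python
def layout_step(pos, edges, anchors, k_edge=0.1, k_anchor=0.3):
    """One relaxation step over 1-D node positions.

    pos: dict node -> x; edges: (u, v) pairs pulled together;
    anchors: node -> target x derived from the sketch."""
    new = dict(pos)
    for u, v in edges:
        delta = pos[v] - pos[u]
        new[u] += k_edge * delta    # attractive edge force
        new[v] -= k_edge * delta
    for n, target in anchors.items():
        new[n] += k_anchor * (target - pos[n])  # sketch constraint
    return new

# Node 'a' is anchored at x=0 by the sketch; 'b' follows via the edge.
pos = {"a": 0.0, "b": 10.0}
for _ in range(50):
    pos = layout_step(pos, [("a", "b")], {"a": 0.0})
```

After relaxation both nodes settle near the sketched position, which is the sense in which the constraints "guide" the layout rather than dictate it.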

Authors:Paige Tuttösí, Shivam Mehta, Zachary Syvenky, Bermet Burkanova, Gustav Eje Henter, Angelica Lim
Title: EmojiVoice: Towards long-term controllable expressivity in robot speech
Abstract:
Humans vary their expressivity when speaking for extended periods to maintain engagement with their listener. Although social robots tend to be deployed with "expressive" joyful voices, they lack this long-term variation found in human speech. Foundation model text-to-speech systems are beginning to mimic the expressivity in human speech, but they are difficult to deploy offline on robots. We present EmojiVoice, a free, customizable text-to-speech (TTS) toolkit that allows social roboticists to build temporally variable, expressive speech on social robots. We introduce emoji-prompting to allow fine-grained control of expressivity at the phrase level and use the lightweight Matcha-TTS backbone to generate speech in real-time. We explore three case studies: (1) a scripted conversation with a robot assistant, (2) a storytelling robot, and (3) an autonomous speech-to-speech interactive agent. We found that using varied emoji prompting improved the perception and expressivity of speech over a long period in a storytelling task, but expressive voice was not preferred in the assistant use case.
Chinese: 人类在长时间说话时会自然变化表达力以保持听众参与,而社交机器人常缺乏这种动态特性;EmojiVoice通过表情符号提示提供可定制的实时语音合成工具,能实现富有表现力的机器人语音,但其效果因应用场景而异。
English: Humans naturally vary their speech expressivity over time to engage listeners, while social robots often lack this dynamic quality, but EmojiVoice offers a customizable, real-time text-to-speech toolkit using emoji prompts to enable expressive, variable robotic speech, with effectiveness varying by application context.

Authors:Evdoxia Taka, Debadyuti Bhattacharya, Joanne Garde-Hansen, Sanjay Sharma, Tanaya Guha
Title: Analyzing Character Representation in Media Content using Multimodal Foundation Model: Effectiveness and Trust
Abstract:
Recent advances in AI have made automated analysis of complex media content at scale possible while generating actionable insights regarding character representation along such dimensions as gender and age. Past work focused on quantifying representation from audio/video/text using AI models, but without having the audience in the loop. We ask: even if character distributions along demographic dimensions are available, how useful are they to the general public? Do they actually trust the numbers generated by AI models? Our work addresses these open questions by proposing a new AI-based character representation tool and performing a thorough user study. Our tool has two components: (i) An analytics extraction model based on the Contrastive Language Image Pretraining (CLIP) foundation model that analyzes visual screen data to quantify character representation across age and gender; (ii) A visualization component effectively designed for presenting the analytics to a lay audience. The user study seeks empirical evidence on the usefulness and trustworthiness of the AI-generated results for carefully chosen movies presented in the form of our visualizations. We found that participants were able to understand the analytics in our visualizations, and deemed the tool 'overall useful'. Participants also indicated a need for more detailed visualizations to include more demographic categories and contextual information of the characters. Participants' trust in AI-based gender and age models is seen to be moderate to low, although they were not against the use of AI in this context. Our tool, including code, benchmarks, and user study data, can be found at https://github.com/debadyuti0510/Character-Representation-Media.
Chinese: 本研究开发了一种基于CLIP模型分析媒体角色表征的AI工具,用户研究显示该工具被认为具有实用性,但参与者对AI生成的年龄性别数据信任度普遍中等偏低。
English: This research introduces an AI-powered tool that quantifies character representation in media through CLIP-based analytics and visualizations, finding it useful but revealing moderate to low trust in AI-generated demographics among users.
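The aggregation step behind such analytics is simple to sketch. The per-detection labels below are dummies standing in for the CLIP-based model's predictions; only the counting into representation percentages is shown, and that aggregation rule is an assumption.

```python
from collections import Counter

def representation_share(frame_predictions):
    """frame_predictions: list of (gender, age_group) tuples, one per
    detected character appearance. Returns percentage per category."""
    counts = Counter(frame_predictions)
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}

# Four detections across sampled frames -> screen-presence percentages.
shares = representation_share([
    ("female", "adult"), ("male", "adult"),
    ("female", "adult"), ("male", "senior"),
])
```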

Authors:Kevin L. Wei, Patricia Paskov, Sunishchal Dev, Michael J. Byun, Anka Reuel, Xavier Roberts-Gaal, Rachel Calcott, Evie Coxon, Chinmay Deshpande
Title: Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations
Abstract:
In this position paper, we argue that human baselines in foundation model evaluations must be more rigorous and more transparent to enable meaningful comparisons of human vs. AI performance, and we provide recommendations and a reporting checklist towards this end. Human performance baselines are vital for the machine learning community, downstream users, and policymakers to interpret AI evaluations. Models are often claimed to achieve "super-human" performance, but existing baselining methods are neither sufficiently rigorous nor sufficiently well-documented to robustly measure and assess performance differences. Based on a meta-review of the measurement theory and AI evaluation literatures, we derive a framework with recommendations for designing, executing, and reporting human baselines. We synthesize our recommendations into a checklist that we use to systematically review 115 human baselines (studies) in foundation model evaluations and thus identify shortcomings in existing baselining methods; our checklist can also assist researchers in conducting human baselines and reporting results. We hope our work can advance more rigorous AI evaluation practices that can better serve both the research community and policymakers. Data is available at: https://github.com/kevinlwei/human-baselines
Chinese: 本立场文件主张在基础模型评估中采用更严谨透明的人类基线以实现有效的人机性能比较,通过提供建议框架和报告清单来改进现有基线方法的不足。
English: This position paper advocates for more rigorous and transparent human baselines in foundation model evaluations to enable accurate human-AI performance comparisons, offering a framework with recommendations and a reporting checklist to address current methodological shortcomings.

Authors:Bo Pan, Yixiao Fu, Ke Wang, Junyu Lu, Lunke Pan, Ziyang Qian, Yuhan Chen, Guoliang Wang, Yitao Zhou, Li Zheng, Yinghao Tang, Zhen Wen, Yuchen Wu, Junhua Lu, Biao Zhu, Minfeng Zhu, Bo Zhang, Wei Chen
Title: VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation
Abstract:
Data visualization generation using Large Language Models (LLMs) has shown promising results but often produces suboptimal visualizations that require human intervention for improvement. In this work, we introduce VIS-Shepherd, a specialized Multimodal Large Language Model (MLLM)-based critic to evaluate and provide feedback for LLM-generated data visualizations. At the core of our approach is a framework to construct a high-quality visualization critique dataset, where we collect human-created visualization instances, synthesize corresponding LLM-generated instances, and construct high-quality critiques. We conduct both model-based automatic evaluation and human preference studies to evaluate the effectiveness of our approach. Our experiments show that even small (7B parameters) open-source MLLM models achieve substantial performance gains by leveraging our high-quality visualization critique dataset, reaching levels comparable to much larger open-source or even proprietary models. Our work demonstrates significant potential for MLLM-based automated visualization critique and indicates promising directions for enhancing LLM-based data visualization generation. Our project page: https://github.com/bopan3/VIS-Shepherd.
中文: VIS-Shepherd提出了一种基于多模态大语言模型的专门评估器,通过高质量的可视化评述数据集来改进大语言模型生成的数据可视化效果,使得较小模型也能达到与大型模型相当的性能水平。
English: VIS-Shepherd introduces a specialized MLLM-based critic that evaluates and improves LLM-generated data visualizations through high-quality critique datasets, enabling smaller models to achieve performance comparable to larger ones.

Authors:Qidi Fang, Hang Yu, Shijie Fang, Jindan Huang, Qiuyu Chen, Reuben M. Aronson, Elaine S. Short
Title: CHARM: Considering Human Attributes for Reinforcement Modeling
Abstract:
Reinforcement Learning from Human Feedback has recently achieved significant success in various fields, and its performance is highly related to feedback quality. While much prior work has acknowledged that human teachers' characteristics affect human feedback patterns, little work has closely investigated the actual effects. In this work, we designed an exploratory study investigating how human feedback patterns are associated with human characteristics. We conducted a public space study with two long-horizon tasks and 46 participants. We found that feedback patterns are not only correlated with task statistics, such as rewards, but also correlated with participants' characteristics, especially robot experience and educational background. Additionally, we demonstrated that human feedback value can be more accurately predicted with human characteristics compared to only using task statistics. All the human feedback and characteristics we collected, along with code for data collection and for predicting human feedback more accurately, are available at https://github.com/AABL-Lab/CHARM
Chinese: 强化学习中的人类反馈模式不仅与任务统计数据相关,还受到参与者个体特征的影响,结合人类特征比仅使用任务数据能更准确地预测反馈价值。
English: Human feedback patterns in reinforcement learning are influenced by both task statistics and individual characteristics, with the inclusion of human traits enabling more accurate prediction of feedback value than relying solely on task data.

Authors:Nuwan Bandara, Thivya Kandappu, Archan Misra
Title: Inference-Time Gaze Refinement for Micro-Expression Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing
Abstract:
Event-based eye tracking holds significant promise for fine-grained cognitive state inference, offering high temporal resolution and robustness to motion artifacts, critical features for decoding subtle mental states such as attention, confusion, or fatigue. In this work, we introduce a model-agnostic, inference-time refinement framework designed to enhance the output of existing event-based gaze estimation models without modifying their architecture or requiring retraining. Our method comprises two key post-processing modules: (i) Motion-Aware Median Filtering, which suppresses blink-induced spikes while preserving natural gaze dynamics, and (ii) Optical Flow-Based Local Refinement, which aligns gaze predictions with cumulative event motion to reduce spatial jitter and temporal discontinuities. To complement traditional spatial accuracy metrics, we propose a novel Jitter Metric that captures the temporal smoothness of predicted gaze trajectories based on velocity regularity and local signal complexity. Together, these contributions significantly improve the consistency of event-based gaze signals, making them better suited for downstream tasks such as micro-expression analysis and mind-state decoding. Our results demonstrate consistent improvements across multiple baseline models on controlled datasets, laying the groundwork for future integration with multimodal affect recognition systems in real-world environments. Our code implementations can be found at https://github.com/eye-tracking-for-physiological-sensing/EyeLoRiN.
Chinese: 本文提出了一种与模型无关的优化框架,通过运动感知滤波和光流局部优化来改进基于事件的视线估计,显著提升了视线信号的稳定性以支持认知状态解码,并引入了评估时间平滑度的新型抖动指标。
English: This paper presents a model-agnostic framework that enhances event-based gaze estimation through motion-aware filtering and optical flow refinement, improving signal consistency for cognitive state inference while introducing a novel jitter metric for temporal smoothness evaluation.
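Two of the ideas above, median filtering of the gaze trace and a smoothness metric over it, can be sketched directly. The window size and the jitter formula below are simplified illustrations, not the paper's motion-aware filter or its velocity-regularity metric.

```python
def median_filter(signal, window=3):
    """Sliding median over a 1-D gaze coordinate trace; spikes (e.g.
    blink artifacts) are suppressed while trends are preserved."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sorted(signal[lo:hi])[(hi - lo) // 2])
    return out

def jitter(signal):
    """Toy jitter score: mean absolute change of frame-to-frame
    velocity. Lower means a temporally smoother trajectory."""
    vel = [b - a for a, b in zip(signal, signal[1:])]
    acc = [abs(b - a) for a, b in zip(vel, vel[1:])]
    return sum(acc) / len(acc)

raw = [0.0, 1.0, 9.0, 3.0, 4.0]   # blink-like spike at index 2
smooth = median_filter(raw)
```

Because the refinement is purely post hoc, it leaves the underlying gaze model untouched, matching the framework's model-agnostic, inference-time design.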

Authors:Yixin Ou, Yujie Luo, Jingsheng Zheng, Lanning Wei, Shuofei Qiao, Jintian Zhang, Da Zheng, Huajun Chen, Ningyu Zhang
Title: AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Abstract:
Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.
中文摘要:AutoMind是一种自适应大型语言模型智能体框架,通过融合专家知识、策略性解决方案探索和动态编码,在自动化数据科学基准测试中展现出卓越性能。
English Summary: AutoMind is an adaptive LLM-agent framework that enhances automated data science by integrating expert knowledge, strategic solution exploration, and dynamic coding, achieving superior performance on benchmarks.

Authors:Chandana Srinivas, Elif E. Firat, Robert S. Laramee, Alark Joshi
Title: Mastery Learning Improves Performance on Complex Tasks on PCP Literacy Test
Abstract:
Developing literacy with unfamiliar data visualization techniques such as Parallel Coordinate Plots (PCPs) can be a significant challenge for students. We adopted the Revised Bloom's taxonomy to instruct students on PCPs using Mastery Learning in the classroom. To evaluate Mastery Learning's impact, we conducted an intervention in a Data Visualization course to teach students about PCPs using the Revised Bloom's taxonomy with and without Mastery Learning. Based on our intervention, we found that while students in both groups performed similarly on the first two modules (Remember, Understand), the students in the Mastery Learning group performed better on modules that required more advanced thinking (Analyze, Evaluate) and demonstrated a better comprehension of PCPs. We provide all the materials developed, including the six-module Bloom's Taxonomy PCP literacy (BTPL) test, for full reproducibility on our website at https://vis-graphics.github.io/PCP-Literacy-Test/.

Authors:Jônata Tyska Carvalho, Stefano Nolfi
Title: LLMs for sensory-motor control: Combining in-context and iterative learning
Abstract:
We propose a method that enables large language models (LLMs) to control embodied agents by directly mapping continuous observation vectors to continuous action vectors. At the outset, the LLMs generate a control strategy based on a textual description of the agent, its environment, and the intended goal. This strategy is then iteratively refined through a learning process in which the LLMs are repeatedly prompted to improve the current strategy, using performance feedback and sensory-motor data collected during its evaluation. The method is validated on classic control tasks from the Gymnasium library and the inverted pendulum task from the MuJoCo library. The approach proves effective with relatively compact models such as Gpt-oss:120b and Qwen2.5:72b. In most cases, it successfully identifies optimal or near-optimal solutions by integrating symbolic knowledge derived through reasoning with sub-symbolic sensory-motor data gathered as the agent interacts with its environment.
中文: 该方法通过迭代学习生成并优化控制策略,使大型语言模型能够操控具身智能体,并在Gymnasium和MuJoCo任务中通过GPT-oss:120b等模型验证了其有效性。
English: This method enables large language models to control embodied agents by generating and refining control strategies through iterative learning, validated on Gymnasium and MuJoCo tasks using models like GPT-oss:120b and Qwen2.5:72b.
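The generate-evaluate-refine loop described above can be sketched compactly. This is a minimal, hypothetical Python skeleton, not the authors' code: `llm` stands in for any prompt-to-text model call, and `evaluate` for a Gymnasium-style rollout that returns a score and a sensory-motor log.

```python
def propose_strategy(llm, description, feedback=None):
    """One LLM call: ask for (or refine) a control strategy.
    `llm` is any callable mapping a prompt string to text."""
    prompt = f"Task: {description}\n"
    if feedback:
        prompt += (f"Previous strategy scored {feedback['score']:.2f}; "
                   f"sensory-motor log: {feedback['log']}\n"
                   "Improve the strategy.")
    return llm(prompt)

def iterative_learning(llm, description, evaluate, iterations=5):
    """Generate a strategy, evaluate it in the environment, and
    repeatedly prompt the LLM to improve it using performance
    feedback and the data collected during evaluation."""
    best, best_score = None, float("-inf")
    feedback = None
    for _ in range(iterations):
        strategy = propose_strategy(llm, description, feedback)
        score, log = evaluate(strategy)
        if score > best_score:
            best, best_score = strategy, score
        feedback = {"score": score, "log": log}
    return best, best_score
```

With a real model and environment plugged in, the loop integrates symbolic reasoning (the textual strategy) with sub-symbolic rollout data, which is the core of the method.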

Authors:Guillermo Marco, Julio Gonzalo, Víctor Fresno
Title: The Reader is the Metric: How Textual Features and Reader Profiles Explain Conflicting Evaluations of AI Creative Writing
Abstract:
Recent studies comparing AI-generated and human-authored literary texts have produced conflicting results: some suggest AI already surpasses human quality, while others argue it still falls short. We start from the hypothesis that such divergences can be largely explained by genuine differences in how readers interpret and value literature, rather than by an intrinsic quality of the texts evaluated. Using five public datasets (1,471 stories, 101 annotators including critics, students, and lay readers), we (i) extract 17 reference-less textual features (e.g., coherence, emotional variance, average sentence length...); (ii) model individual reader preferences, deriving feature importance vectors that reflect their textual priorities; and (iii) analyze these vectors in a shared "preference space". Reader vectors cluster into two profiles: 'surface-focused readers' (mainly non-experts), who prioritize readability and textual richness; and 'holistic readers' (mainly experts), who value thematic development, rhetorical variety, and sentiment dynamics. Our results quantitatively explain how measurements of literary quality are a function of how text features align with each reader's preferences. These findings advocate for reader-sensitive evaluation frameworks in the field of creative text generation.
中文摘要:最新研究表明,关于AI与人类文学质量评价的矛盾源于读者偏好差异:表层导向型读者注重可读性与文本丰富性,而整体导向型读者更看重主题发展及情感动态。
English Summary: Recent research reveals that conflicting assessments of AI versus human literary quality stem from distinct reader preferences, with surface-focused readers valuing readability and textual richness, while holistic readers prioritize thematic depth and sentiment dynamics.
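A crude stand-in for the paper's feature-importance vectors is the per-reader correlation between each textual feature and that reader's ratings. The sketch below uses plain Pearson correlation; the paper's preference modeling is more sophisticated, so treat this as an illustration of the idea, not the authors' method.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def preference_vector(features, ratings):
    """features: one dict of reference-less textual features per story;
    ratings: the same reader's scores for those stories.
    Returns feature -> correlation, i.e. a vector reflecting which
    textual properties this reader's judgments track."""
    names = features[0].keys()
    return {name: pearson([f[name] for f in features], ratings)
            for name in names}
```

Clustering such vectors across annotators is what separates the 'surface-focused' and 'holistic' reader profiles in the shared preference space.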

Authors:Yin Fang, Qiao Jin, Guangzhi Xiong, Bowen Jin, Xianrui Zhong, Siru Ouyang, Aidong Zhang, Jiawei Han, Zhiyong Lu
Title: Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning
Abstract:
Cell type annotation is a key task in analyzing the heterogeneity of single-cell RNA sequencing data. Although recent foundation models automate this process, they typically annotate cells independently, without considering batch-level cellular context or providing explanatory reasoning. In contrast, human experts often annotate distinct cell types for different cell clusters based on their domain knowledge. To mimic this workflow, we introduce the CellPuzzles task, where the objective is to assign unique cell types to a batch of cells. This benchmark spans diverse tissues, diseases, and donor conditions, and requires reasoning across the batch-level cellular context to ensure label uniqueness. We find that off-the-shelf large language models (LLMs) struggle on CellPuzzles, with the best baseline (OpenAI's o1) achieving only 19.0% batch-level accuracy. To fill this gap, we propose Cell-o1, a 7B LLM trained via supervised fine-tuning on distilled reasoning traces, followed by reinforcement learning with batch-level rewards. Cell-o1 achieves state-of-the-art performance, outperforming o1 by over 73% and generalizing well across contexts. Further analysis of training dynamics and reasoning behaviors provides insights into batch-level annotation performance and emergent expert-like reasoning. Code and data are available at https://github.com/ncbi-nlp/cell-o1.
中文: 本研究提出了CellPuzzles基准任务,要求通过批次级推理在不同条件下分配唯一细胞类型,并开发了Cell-o1模型——一个通过监督微调和批次级奖励强化学习训练的70亿参数大语言模型,实现了最先进的性能表现。
English: The study introduces CellPuzzles, a benchmark task requiring batch-level reasoning to assign unique cell types across diverse conditions, and proposes Cell-o1, a 7B LLM that achieves state-of-the-art performance by leveraging supervised fine-tuning and reinforcement learning with batch-level rewards.
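The batch-level uniqueness constraint is what distinguishes CellPuzzles from per-cell annotation: a greedy per-cell argmax can assign the same type twice. A minimal sketch of resolving this, assuming per-cell confidence scores are available, is an exhaustive search over one-to-one assignments (a real system would use e.g. the Hungarian algorithm for larger batches; this is not the paper's procedure, only the constraint it encodes):

```python
from itertools import permutations

def best_unique_assignment(scores):
    """scores[i][t]: confidence that cell i in the batch has type t.
    Returns the one-to-one cell->type assignment maximizing total
    confidence, enforcing label uniqueness across the batch."""
    n = len(scores)
    best, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best, best_total = perm, total
    return best, best_total
```

In the example below, cells 0 and 1 both prefer type 0 in isolation; the batch-level view forces cell 0 onto its second choice, which is exactly the kind of cross-cell reasoning the benchmark tests.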

Authors:Zihao Ding, Cheng-Tse Lee, Mufeng Zhu, Tao Guan, Yuan-Chun Sun, Cheng-Hsin Hsu, Yao Liu
Title: EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR
Abstract:
3D Gaussian Splatting (3DGS) is an emerging media representation that reconstructs real-world 3D scenes in high fidelity, enabling 6-degrees-of-freedom (6-DoF) navigation in virtual reality (VR). However, developing and evaluating 3DGS-enabled applications and optimizing their rendering performance require realistic user navigation data. Such data is currently unavailable for photorealistic 3DGS reconstructions of real-world scenes. This paper introduces EyeNavGS, the first publicly available 6-DoF navigation dataset, featuring traces from 46 participants exploring twelve diverse, real-world 3DGS scenes. The dataset was collected at two sites using Meta Quest Pro headsets, recording head pose and eye gaze data for each rendered frame during free, standing 6-DoF navigation. For each of the twelve scenes, we performed careful scene initialization to correct for scene tilt and scale, ensuring a perceptually comfortable VR experience. We also release our open-source SIBR viewer software fork with record-and-replay functionality and a suite of utility tools for data processing, conversion, and visualization. The EyeNavGS dataset and its accompanying software tools provide valuable resources for advancing research in 6-DoF viewport prediction, adaptive streaming, 3D saliency, and foveated rendering for 3DGS scenes. The EyeNavGS dataset is available at: https://symmru.github.io/EyeNavGS/.

Authors:Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, Jianming Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan Yao, Zhiyuan Liu, Maosong Sun
Title: AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Abstract:
The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability. However, practical deployment of such agents remains constrained by several key challenges. Existing training data is often noisy and lacks semantic diversity, which hinders the learning of precise grounding and planning. Models trained purely by imitation tend to overfit to seen interface patterns and fail to generalize in unfamiliar scenarios. Moreover, most prior work focuses on English interfaces while overlooking the growing diversity of non-English applications, such as those in the Chinese mobile ecosystem. In this work, we present AgentCPM-GUI, an 8B-parameter GUI agent built for robust and efficient on-device GUI interaction. Our training pipeline includes grounding-aware pre-training to enhance perception, supervised fine-tuning on high-quality Chinese and English trajectories to imitate human-like actions, and reinforcement fine-tuning with GRPO to improve reasoning capability. We also introduce a compact action space that reduces output length and supports low-latency execution on mobile devices. AgentCPM-GUI achieves state-of-the-art performance on five public benchmarks and a new Chinese GUI benchmark called CAGUI, reaching $96.9\%$ Type-Match and $91.3\%$ Exact-Match. To facilitate reproducibility and further research, we publicly release all code, model checkpoints, and evaluation data.
中文: 大型语言模型代理在自动化图形界面任务方面展现出潜力,但面临训练数据噪声大、泛化能力差等挑战;AgentCPM-GUI通过增强训练流程和精简动作空间,在多个基准测试中取得了领先性能。
English: Large language model agents show promise for automating GUI tasks, yet face challenges like noisy training data and limited generalization, which AgentCPM-GUI addresses through a robust training pipeline and compact action space to achieve top performance on benchmarks.

Authors:Yueqian Guo, Tianzhao Li, Xin Lyu, Jiehaolin Chen, Zhaohan Wang, Sirui Xiao, Yurun Chen, Yezi He, Helin Li, Fan Zhang
Title: TRiMM: Transformer-Based Rich Motion Matching for Real-Time multi-modal Interaction in Digital Humans
Abstract:
Large Language Model (LLM)-driven digital humans have sparked a series of recent studies on co-speech gesture generation systems. However, existing approaches struggle with real-time synthesis and long-text comprehension. This paper introduces Transformer-Based Rich Motion Matching (TRiMM), a novel multi-modal framework for real-time 3D gesture generation. Our method incorporates three modules: 1) a cross-modal attention mechanism to achieve precise temporal alignment between speech and gestures; 2) a long-context autoregressive model with a sliding window mechanism for effective sequence modeling; 3) a large-scale gesture matching system that constructs an atomic action library and enables real-time retrieval. Additionally, we develop a lightweight pipeline implemented in the Unreal Engine for experimentation. Our approach achieves real-time inference at 120 fps and maintains a per-sentence latency of 0.15 seconds on consumer-grade GPUs (GeForce RTX 3060). Extensive subjective and objective evaluations on the ZEGGS and BEAT datasets demonstrate that our model outperforms current state-of-the-art methods. TRiMM enhances the speed of co-speech gesture generation while ensuring gesture quality, enabling LLM-driven digital humans to respond to speech in real time and synthesize corresponding gestures. Our code is available at https://github.com/teroon/TRiMM-Transformer-Based-Rich-Motion-Matching
中文: 本文提出的TRiMM框架通过跨模态对齐、自回归建模和动作匹配技术,解决了实时合成与长文本理解的难题,在保持手势质量的同时实现了120帧/秒的实时生成性能。
English: This paper introduces TRiMM, a real-time 3D gesture generation framework that overcomes limitations in real-time synthesis and long-text comprehension through cross-modal alignment, autoregressive modeling, and motion matching, achieving 120 fps performance while maintaining gesture quality.
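The retrieval side of motion matching reduces to nearest-neighbor search over an atomic action library. A minimal sketch, assuming each clip and each speech window is summarized as a feature vector (the names and cosine scoring below are illustrative, not TRiMM's actual matcher):

```python
def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def match_gesture(speech_feat, library):
    """library: clip name -> feature vector of an atomic gesture clip.
    Returns the clip whose features are closest to the current
    speech feature, i.e. one retrieval step of motion matching."""
    return max(library, key=lambda name: cosine(speech_feat, library[name]))
```

Because each query is a single similarity scan (or an index lookup at scale), retrieval stays cheap enough for the real-time budget the paper targets.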

Authors:Xuejiao Ma, Haibo Zhao, Zinuo Guo, Yijie Guo, Guanhong Liu, Bo Jiang
Title: CO-OPERA: A Human-AI Collaborative Playwriting Tool to Support Creative Storytelling for Interdisciplinary Drama Education
Abstract:
Drama-in-education is an interdisciplinary instructional approach that integrates subjects such as language, history, and psychology. Its core component is playwriting. Based on need-finding interviews with 13 teachers, we found that current general-purpose AI tools cannot effectively assist teachers and students during playwriting. Therefore, we propose CO-OPERA, a collaborative playwriting tool integrating generative artificial intelligence capabilities. In CO-OPERA, users can both expand their thinking through discussions with a tutor and converge their thinking by operating agents to generate script elements. Additionally, the system allows for iterative modifications and regenerations based on user requirements. A system usability test conducted with middle school students shows that CO-OPERA helps users focus on coherent, whole-narrative development during playwriting. Our playwriting examples and raw data for qualitative and quantitative analysis are available at https://github.com/daisyinb612/CO-OPERA.
中文摘要:CO-OPERA是一款集成生成式人工智能的协作剧本创作工具,通过导师对话和角色操作帮助用户发散与收敛创作思维,可用性测试表明该系统能有效支持用户在创作过程中聚焦整体叙事逻辑。
English Summary: CO-OPERA is an AI-powered collaborative playwriting tool designed to help users expand and refine their creative thinking through tutor discussions and script element generation, with usability tests showing it effectively supports logical narrative development.

Authors:Seohyun Park, Chitralekha Gupta, Michelle Kah Yian Kwan, Xinhui Fung, Alexander Wenjun Yip, Suranga Nanayakkara
Title: Towards Temporally Explainable Dysarthric Speech Clarity Assessment
Abstract:
Dysarthria, a motor speech disorder, affects intelligibility and requires targeted interventions for effective communication. In this work, we investigate automated mispronunciation feedback by collecting a dysarthric speech dataset from six speakers reading two passages, annotated by a speech therapist with temporal markers and mispronunciation descriptions. We design a three-stage framework for explainable mispronunciation evaluation: (1) overall clarity scoring, (2) mispronunciation localization, and (3) mispronunciation type classification. We systematically analyze pretrained Automatic Speech Recognition (ASR) models in each stage, assessing their effectiveness in dysarthric speech evaluation (Code available at: https://github.com/augmented-human-lab/interspeech25_speechtherapy, Supplementary webpage: https://apps.ahlab.org/interspeech25_speechtherapy/). Our findings offer clinically relevant insights for automating actionable feedback for pronunciation assessment, which could enable independent practice for patients and help therapists deliver more effective interventions.
中文: 本研究开发了一个三阶段框架用于构音障碍语音的自动评估,通过清晰度评分、发音错误定位和分类,利用预训练语音识别模型为言语治疗提供临床可行的反馈方案。
English: This study develops a three-stage framework for automated dysarthric speech evaluation, combining clarity scoring, mispronunciation localization, and classification using pretrained ASR models to provide clinically actionable feedback for speech therapy.

Authors:Enshang Zhang, Zhicheng Zhang, Takashi Hanakawa
Title: Category-aware EEG image generation based on wavelet transform and contrast semantic loss
Abstract:
Reconstructing visual stimuli from EEG signals is a crucial step in realizing brain-computer interfaces. In this paper, we propose a transformer-based EEG signal encoder integrating the Discrete Wavelet Transform (DWT) and a gating mechanism. Guided by feature alignment and category-aware fusion losses, this encoder is used to extract features related to visual stimuli from EEG signals. Subsequently, with the aid of a pre-trained diffusion model, these features are reconstructed into visual stimuli. To verify the effectiveness of the model, we conducted EEG-to-image generation and classification tasks using the THINGS-EEG dataset. To address the limitations of quantitative analysis at the semantic level, we combined WordNet-based classification and semantic similarity metrics to propose a novel semantic-based score, emphasizing the ability of our model to transfer neural activities into visual representations. Experimental results show that our model significantly improves semantic alignment and classification accuracy, achieving a maximum single-subject accuracy of 43\% and outperforming other state-of-the-art methods. The source code and supplementary material are available at https://github.com/zes0v0inn/DWT_EEG_Reconstruction/tree/main.
中文摘要:本文提出一种结合离散小波变换和门控机制的基于Transformer的脑电信号编码器,通过预训练扩散模型从脑电信号重建视觉刺激,在THINGS-EEG数据集上实现了最佳的语义对齐效果和高达43%的分类准确率。
English Summary: This paper introduces a transformer-based EEG encoder that integrates Discrete Wavelet Transform and gating mechanisms to reconstruct visual stimuli from brain signals using a diffusion model, achieving state-of-the-art performance in semantic alignment and classification accuracy on the THINGS-EEG dataset.
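For readers unfamiliar with the DWT front-end, one level of the Haar transform (the simplest wavelet) splits a signal into low-pass approximation and high-pass detail coefficients. The pure-Python sketch below shows this decomposition; the paper's encoder presumably uses a richer wavelet and multiple levels, so this is only the shape of the idea.

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.
    Splits an even-length signal into approximation (low-pass)
    and detail (high-pass) coefficients; the 1/sqrt(2) factor
    makes the transform energy-preserving."""
    s = 1 / math.sqrt(2)
    approx = [(a + b) * s for a, b in zip(signal[::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[::2], signal[1::2])]
    return approx, detail
```

Feeding both coefficient streams to the encoder lets it see slow trends and fast transients of the EEG signal separately, which is the usual motivation for a DWT front-end.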

Authors:Neemesh Yadav, Palakorn Achananuparp, Jing Jiang, Ee-Peng Lim
Title: Effects of Theory of Mind and Prosocial Beliefs on Steering Human-Aligned Behaviors of LLMs in Ultimatum Games
Abstract:
Large Language Models (LLMs) have shown potential in simulating human behaviors and performing theory-of-mind (ToM) reasoning, a crucial skill for complex social interactions. In this study, we investigate the role of ToM reasoning in aligning agentic behaviors with human norms in negotiation tasks, using the ultimatum game as a controlled environment. We initialized LLM agents with different prosocial beliefs (including Greedy, Fair, and Selfless) and reasoning methods like chain-of-thought (CoT) and varying ToM levels, and examined their decision-making processes across diverse LLMs, including reasoning models like o3-mini and DeepSeek-R1 Distilled Qwen 32B. Results from 2,700 simulations indicated that ToM reasoning enhances behavior alignment, decision-making consistency, and negotiation outcomes. Consistent with previous findings, reasoning models exhibit limited capability compared to models augmented with ToM reasoning, and different game roles benefit from different orders of ToM reasoning. Our findings contribute to the understanding of ToM's role in enhancing human-AI interaction and cooperative decision-making. The code used for our experiments can be found at https://github.com/Stealth-py/UltimatumToM.
中文摘要:本研究证明,在谈判任务中,心理理论推理能显著提高大型语言模型代理行为与人类规范的契合度,在不同亲社会信念设置下增强决策一致性和谈判结果。
English Summary: This study demonstrates that theory-of-mind (ToM) reasoning significantly improves the alignment of LLM agent behaviors with human norms in negotiation tasks, enhancing decision-making consistency and outcomes across various prosocial belief settings.

Authors:Sahil Verma, Keegan Hines, Jeff Bilmes, Charlotte Siska, Luke Zettlemoyer, Hila Gonen, Chandan Singh
Title: OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities
Abstract:
The emerging capabilities of large language models (LLMs) have sparked concerns about their immediate potential for harmful misuse. The core approach to mitigate these concerns is the detection of harmful queries to the model. Current detection approaches are fallible, and are particularly susceptible to attacks that exploit mismatched generalization of model capabilities (e.g., prompts in low-resource languages or prompts provided in non-text modalities such as image and audio). To tackle this challenge, we propose OMNIGUARD, an approach for detecting harmful prompts across languages and modalities. Our approach (i) identifies internal representations of an LLM/MLLM that are aligned across languages or modalities and then (ii) uses them to build a language-agnostic or modality-agnostic classifier for detecting harmful prompts. OMNIGUARD improves harmful prompt classification accuracy by 11.57\% over the strongest baseline in a multilingual setting, by 20.44\% for image-based prompts, and sets a new SOTA for audio-based prompts. By repurposing embeddings computed during generation, OMNIGUARD is also very efficient ($\approx 120 \times$ faster than the next fastest baseline). Code and data are available at: https://github.com/vsahil/OmniGuard.
中文摘要:本文提出OMNIGUARD方法,通过利用大语言模型中跨语言和跨模态对齐的内部表征来检测有害提示,在多语言和跨模态场景下显著提升了分类准确率与检测效率。
English Summary: The paper introduces OMNIGUARD, a method that detects harmful prompts across languages and modalities by leveraging aligned internal representations of LLMs, significantly improving classification accuracy and efficiency over existing baselines.
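Once representations are aligned across languages and modalities, the detector itself can be very light. The sketch below uses a nearest-centroid rule over embedding vectors purely to illustrate the shape of the idea; OMNIGUARD trains a proper classifier on the aligned internal representations, and the function names here are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def make_detector(harmful_embs, benign_embs):
    """Fit a minimal language/modality-agnostic detector on
    hidden-state embeddings: classify by nearest class centroid."""
    h, b = centroid(harmful_embs), centroid(benign_embs)
    def sq_dist(u, v):
        return sum((a - c) ** 2 for a, c in zip(u, v))
    def is_harmful(emb):
        # True if the prompt's embedding sits closer to the harmful centroid.
        return sq_dist(emb, h) < sq_dist(emb, b)
    return is_harmful
```

Because the embeddings are already computed during generation, classification adds only this cheap geometric test, which is consistent with the efficiency the paper reports.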

Authors:Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza
Title: Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Abstract:
Word-level quality estimation (WQE) aims to automatically identify fine-grained error spans in machine-translated outputs and has found many uses, including assisting translators during post-editing. Modern WQE techniques are often expensive, involving prompting of large language models or ad-hoc training on large amounts of human-labeled data. In this work, we investigate efficient alternatives exploiting recent advances in language model interpretability and uncertainty quantification to identify translation errors from the inner workings of translation models. In our evaluation spanning 14 metrics across 12 translation directions, we quantify the impact of human label variation on metric performance by using multiple sets of human labels. Our results highlight the untapped potential of unsupervised metrics, the shortcomings of supervised methods when faced with label uncertainty, and the brittleness of single-annotator evaluation practices.
中文: 本研究利用模型可解释性和不确定性量化探索了高效的词级质量评估方法,以检测翻译错误,揭示了无监督指标的潜力及有监督方法在标签不确定性下的局限性。
English: This study explores efficient methods for word-level quality estimation by leveraging model interpretability and uncertainty quantification to detect translation errors, revealing the potential of unsupervised metrics and the limitations of supervised approaches under label uncertainty.
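One of the simplest unsupervised signals of this kind is the translation model's own predictive entropy: tokens the model was unsure about are candidate error spans. The sketch below is a generic illustration of that idea, assuming per-token next-token distributions are available, and is not one of the paper's 14 metrics specifically.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def flag_error_spans(token_probs, threshold=1.0):
    """Unsupervised word-level QE sketch: return indices of target
    tokens whose predictive entropy exceeds a threshold, treating
    high model uncertainty as a proxy for translation errors."""
    return [i for i, probs in enumerate(token_probs)
            if token_entropy(probs) > threshold]
```

No human labels are needed to produce these spans, which is exactly what makes such metrics attractive when supervised methods are brittle under label variation.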

Authors:Yupei Li, Shuaijie Shao, Manuel Milling, Björn W. Schuller
Title: Large Language Models for Depression Recognition in Spoken Language Integrating Psychological Knowledge
Abstract:
Depression is a growing concern gaining attention in both public discourse and AI research. While deep neural networks (DNNs) have been used for recognition, they still lack real-world effectiveness. Large language models (LLMs) show strong potential but require domain-specific fine-tuning and struggle with non-textual cues. Since depression is often expressed through vocal tone and behaviour rather than explicit text, relying on language alone is insufficient. Diagnostic accuracy also suffers without incorporating psychological expertise. To address these limitations, we present, to the best of our knowledge, the first application of LLMs to multimodal depression detection using the DAIC-WOZ dataset. We extract audio features using the pre-trained Wav2Vec model and map them to text-based LLMs for further processing. We also propose a novel strategy for incorporating psychological knowledge into LLMs to enhance diagnostic performance, specifically using a question-and-answer set to grant authoritative knowledge to the LLMs. Our approach yields a notable improvement in both Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) compared to the baseline scores reported in the original paper. The code is available at https://github.com/myxp-lyp/Depression-detection.git
中文摘要:本研究首次将大语言模型应用于多模态抑郁症检测,通过结合Wav2Vec音频特征与心理学知识增强策略,在诊断准确性上较基线分数实现了显著提升。
English Summary: This study introduces the first multimodal depression detection method using large language models (LLMs) combined with audio features from Wav2Vec and psychological knowledge integration, achieving significant improvements in diagnostic accuracy over baseline scores.

Authors:Andrew Zhu, Evan Osgood, Chris Callison-Burch
Title: First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay
Abstract:
Much work has been done on conversational LLM agents which directly assist human users with tasks. We present an alternative paradigm for interacting with LLM agents, which we call "overhearing agents". These overhearing agents do not actively participate in conversation -- instead, they "listen in" on human-to-human conversations and perform background tasks or provide suggestions to assist the user. In this work, we explore the overhearing agents paradigm through the lens of Dungeons & Dragons gameplay. We present an in-depth study using large multimodal audio-language models as overhearing agents to assist a Dungeon Master. We perform a human evaluation to examine the helpfulness of such agents and find that some large audio-language models have the emergent ability to perform overhearing agent tasks using implicit audio cues. Finally, we release Python libraries and our project code to support further research into the overhearing agents paradigm at https://github.com/zhudotexe/overhearing_agents.
中文: 本文提出了“旁听智能体”的新范式,通过《龙与地下城》案例展示了多模态音频语言模型如何被动监听人类对话以执行后台任务或提供辅助,并开源相关代码以推动该领域研究。
English: This paper introduces "overhearing agents," a novel paradigm where LLM agents passively monitor human conversations to perform background tasks or offer assistance, demonstrated through a Dungeons & Dragons case study using multimodal audio-language models and released with open-source code for further research.

Authors:Aditya Gunturu, Ben Pearman, Keiichi Ihara, Morteza Faraji, Bryan Wang, Rubaiat Habib Kazi, Ryo Suzuki
Title: MapStory: Prototyping Editable Map Animations with LLM Agents
Abstract:
We introduce MapStory, an LLM-powered animation prototyping tool that generates editable map animation sequences directly from natural language text by leveraging a dual-agent LLM architecture. Given a user written script, MapStory automatically produces a scene breakdown, which decomposes the text into key map animation primitives such as camera movements, visual highlights, and animated elements. Our system includes a researcher agent that accurately queries geospatial information by leveraging an LLM with web search, enabling automatic extraction of relevant regions, paths, and coordinates while allowing users to edit and query for changes or additional information to refine the results. Additionally, users can fine-tune parameters of these primitive blocks through an interactive timeline editor. We detail the system's design and architecture, informed by formative interviews with professional animators and by an analysis of 200 existing map animation videos. Our evaluation, which includes expert interviews (N=5) and a usability study (N=12), demonstrates that MapStory enables users to create map animations with ease, facilitates faster iteration, encourages creative exploration, and lowers barriers to creating map-centric stories.

Authors:Danush Khanna, Pratinav Seth, Sidhaarth Sredharan Murali, Aditya Kumar Guru, Siddharth Shukla, Tanuj Tyagi, Sandeep Chaurasia, Kripabandhu Ghosh
Title: SELF-PERCEPT: Introspection Improves Large Language Models' Detection of Multi-Person Mental Manipulation in Conversations
Abstract:
Mental manipulation is a subtle yet pervasive form of abuse in interpersonal communication, making its detection critical for safeguarding potential victims. However, due to manipulation's nuanced and context-specific nature, identifying manipulative language in complex, multi-turn, and multi-person conversations remains a significant challenge for large language models (LLMs). To address this gap, we introduce the MultiManip dataset, comprising 220 multi-turn, multi-person dialogues balanced between manipulative and non-manipulative interactions, all drawn from reality shows that mimic real-world scenarios. For manipulative interactions, it includes 11 distinct manipulations depicting real-life scenarios. We conduct extensive evaluations of state-of-the-art LLMs, such as GPT-4o and Llama-3.1-8B, employing various prompting strategies. Despite their capabilities, these models often struggle to detect manipulation effectively. To overcome this limitation, we propose SELF-PERCEPT, a novel, two-stage prompting framework inspired by Self-Perception Theory, demonstrating strong performance in detecting multi-person, multi-turn mental manipulation. Our code and data are publicly available at https://github.com/danushkhanna/self-percept .
中文: 本研究提出了MultiManip数据集和SELF-PERCEPT框架,以解决大型语言模型在多轮对话中检测微妙心理操纵的难题,相比现有模型展现出显著性能提升。
English: The study introduces the MultiManip dataset and SELF-PERCEPT framework to address LLMs' challenges in detecting nuanced mental manipulation in multi-turn dialogues, showing significant improvement over existing models.

Authors:Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu
Title: ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Abstract:
Large Language Models (LLMs) have extended their impact beyond Natural Language Processing, substantially fostering the development of interdisciplinary research. Recently, various LLM-based agents have been developed to assist scientific discovery progress across multiple aspects and domains. Among these, computer-using agents, capable of interacting with operating systems as humans do, are paving the way to automated scientific problem-solving and addressing routines in researchers' workflows. Recognizing the transformative potential of these agents, we introduce ScienceBoard, which encompasses two complementary contributions: (i) a realistic, multi-domain environment featuring dynamic and visually rich scientific workflows with integrated professional software, where agents can autonomously interact via different interfaces to accelerate complex research tasks and experiments; and (ii) a challenging benchmark of 169 high-quality, rigorously validated real-world tasks curated by humans, spanning scientific-discovery workflows in domains such as biochemistry, astronomy, and geoinformatics. Extensive evaluations of agents with state-of-the-art backbones (e.g., GPT-4o, Claude 3.7, UI-TARS) show that, despite some promising results, they still fall short of reliably assisting scientists in complex workflows, achieving only a 15% overall success rate. In-depth analysis further provides valuable insights for addressing current agent limitations and more effective design principles, paving the way to build more capable agents for scientific discovery. Our code, environment, and benchmark are at https://qiushisun.github.io/ScienceBoard-Home/.

Authors:Dora Zhao, Diyi Yang, Michael S. Bernstein
Title: Knoll: Creating a Knowledge Ecosystem for Large Language Models
Abstract:
Large language models are designed to encode general purpose knowledge about the world from Internet data. Yet, a wealth of information falls outside this scope -- ranging from personal preferences to organizational policies, from community-specific advice to up-to-date news -- that users want models to access but remains unavailable. In this paper, we propose a knowledge ecosystem in which end-users can create, curate, and configure custom knowledge modules that are utilized by language models, such as ChatGPT and Claude. To support this vision, we introduce Knoll, a software infrastructure that allows users to make modules by clipping content from the web or authoring shared documents on Google Docs and GitHub, add modules that others have made, and rely on the system to insert relevant knowledge when interacting with an LLM. We conduct a public deployment of Knoll reaching over 200 users who employed the system for a diverse set of tasks including personalized recommendations, advice-seeking, and writing assistance. In our evaluation, we validate that using Knoll improves the quality of generated responses.
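Knoll's central step, inserting relevant user-curated knowledge into an LLM interaction, can be sketched generically: score each module against the query and prepend the best matches to the prompt. The scoring rule, module texts, and function names below are illustrative assumptions, not Knoll's actual implementation.

```python
# Illustrative sketch (not Knoll's actual implementation): rank
# user-curated knowledge modules by crude word overlap with the query
# and prepend the top-k as context for an LLM.
import re

def score(module_text, query):
    """Fraction of query words that also appear in the module."""
    q = set(re.findall(r"\w+", query.lower()))
    m = set(re.findall(r"\w+", module_text.lower()))
    return len(q & m) / max(len(q), 1)

def build_prompt(modules, query, k=1):
    """Insert the k most relevant modules before the user's query."""
    ranked = sorted(modules, key=lambda mod: score(mod, query), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Relevant knowledge:\n{context}\n\nUser: {query}"

# Hypothetical modules of the kinds the paper mentions
# (organizational policy, community advice).
modules = [
    "Our team's policy: deploys are frozen on Fridays.",
    "Community advice: the trailhead parking lot fills by 8am.",
]
prompt = build_prompt(modules, "Is there a policy about deploys on Fridays?", k=1)
```

A real system would use embedding similarity rather than word overlap, but the shape of the pipeline, retrieve then prepend, is the same.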

Authors:Kai Mei, Xi Zhu, Hang Gao, Shuhang Lin, Yongfeng Zhang
Title: LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS
Abstract:
We present AIOS 1.0, a novel platform designed to advance computer-use agent (CUA) capabilities through environmental contextualization. While existing approaches primarily focus on building more powerful agent frameworks or enhancing agent models, we identify a fundamental limitation: the semantic disconnect between how language models understand the world and how computer interfaces are structured. AIOS 1.0 addresses this challenge by transforming computers into contextual environments that language models can natively comprehend, implementing a Model Context Protocol (MCP) server architecture to abstract computer states and actions. This approach effectively decouples interface complexity from decision complexity, enabling agents to reason more effectively about computing environments. To demonstrate our platform's effectiveness, we introduce LiteCUA, a lightweight computer-use agent built on AIOS 1.0 that achieves a 14.66% success rate on the OSWorld benchmark, outperforming several specialized agent frameworks despite its simple architecture. Our results suggest that contextualizing computer environments for language models represents a promising direction for developing more capable computer-use agents and advancing toward AI that can interact with digital systems. The source code of LiteCUA is available at https://github.com/agiresearch/LiteCUA, and it is also integrated into the AIOS main branch as part of AIOS at https://github.com/agiresearch/AIOS.
中文: AIOS 1.0通过构建可被语言模型原生理解的上下文环境,采用模型上下文协议解决语义鸿沟问题,其轻量级代理LiteCUA在OSWorld基准测试中取得14.66%的成功率,展示了环境情境化对提升计算机使用代理能力的有效性。
English: AIOS 1.0 introduces a novel platform that addresses the semantic gap between language models and computer interfaces by creating contextual environments through a Model Context Protocol, enabling more effective agent reasoning and achieving a 14.66% success rate on OSWorld with its LiteCUA agent.
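The decoupling AIOS 1.0 describes, structured state and named actions exposed to the model instead of raw interface complexity, can be illustrated with a toy server. The class name, state fields, and action vocabulary here are hypothetical and do not reflect the actual MCP server API.

```python
# Toy sketch of the contextualization idea: abstract computer state and
# actions behind a uniform, model-readable interface. All names and
# fields are hypothetical, not the real AIOS/MCP implementation.
import json

class ComputerContextServer:
    """Exposes a (toy) computer state via observe/act calls."""
    def __init__(self):
        self.state = {"open_windows": ["terminal"], "clipboard": ""}

    def observe(self):
        """Return the current state as JSON an LLM can read directly."""
        return json.dumps(self.state)

    def act(self, action, argument):
        """Apply a named action; unknown actions are rejected."""
        if action == "open":
            self.state["open_windows"].append(argument)
        elif action == "copy":
            self.state["clipboard"] = argument
        else:
            raise ValueError(f"unsupported action: {action}")
        return self.observe()

server = ComputerContextServer()
server.act("open", "browser")
snapshot = json.loads(server.observe())
```

The agent then reasons over the JSON snapshot and emits named actions, which is the sense in which interface complexity is decoupled from decision complexity.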

Authors:Natia Kukhilava, Tatia Tsmindashvili, Rapael Kalandadze, Anchit Gupta, Sofio Katamadze, François Brémond, Laura M. Ferrari, Philipp Müller, Benedikt Emanuel Wirth
Title: Evaluation in EEG Emotion Recognition: State-of-the-Art Review and Unified Framework
Abstract:
Electroencephalography-based Emotion Recognition (EEG-ER) has become a growing research area in recent years. Analyzing 216 papers published between 2018 and 2023, we uncover that the field lacks a unified evaluation protocol, which is essential to fairly define the state of the art, compare new approaches, and track the field's progress. We report the main inconsistencies among the evaluation protocols in use, which relate to ground-truth definition, evaluation metric selection, data splitting types (e.g., subject-dependent or subject-independent) and the use of different datasets. Capitalizing on this state-of-the-art research, we propose a unified evaluation protocol, EEGain (https://github.com/EmotionLab/EEGain), which enables an easy and efficient evaluation of new methods and datasets. EEGain is a novel open-source software framework, offering the capability to compare, and thus define, state-of-the-art results. EEGain includes standardized methods for data pre-processing, data splitting, evaluation metrics, and the ability to load the six most relevant datasets (i.e., AMIGOS, DEAP, DREAMER, MAHNOB-HCI, SEED, SEED-IV) in EEG-ER with only a single line of code. In addition, we have assessed and validated EEGain using these six datasets on the four most common publicly available methods (EEGNet, DeepConvNet, ShallowConvNet, TSception). This is a significant step toward making research on EEG-ER more reproducible and comparable, thereby accelerating the overall progress of the field.
中文: EEG-ER研究领域缺乏统一的评估标准,为此开发了EEGain开源框架,通过标准化数据处理和评估方法,确保研究结果的可复现性与可比性。
English: EEG-ER research lacks standardized evaluation protocols, prompting the development of EEGain, an open-source framework that ensures reproducible and comparable results by unifying data processing and assessment methods.

Authors:Austin Howard
Title: InjectLab: A Tactical Framework for Adversarial Threat Modeling Against Large Language Models
Abstract:
Large Language Models (LLMs) are changing the way people interact with technology. Tools like ChatGPT and Claude AI are now common in business, research, and everyday life. But with that growth come new risks, especially prompt-based attacks that exploit how these models process language. InjectLab is a security framework designed to address that problem. This paper introduces InjectLab as a structured, open-source matrix that maps real-world techniques used to manipulate LLMs. The framework is inspired by MITRE ATT&CK and focuses specifically on adversarial behavior at the prompt layer. It includes over 25 techniques organized under six core tactics, covering threats like instruction override, identity swapping, and multi-agent exploitation. Each technique in InjectLab includes detection guidance, mitigation strategies, and YAML-based simulation tests. A Python tool supports easy execution of prompt-based test cases. This paper outlines the framework's structure, compares it to other AI threat taxonomies, and discusses its future direction as a practical, community-driven foundation for securing language models.
中文摘要:InjectLab是受MITRE ATT&CK启发的开源安全框架,通过包含六大核心战术、25种以上技术的结构化矩阵,系统性地识别针对大语言模型的提示词攻击,并提供检测指南与防护策略。
English Summary: InjectLab is an open-source security framework inspired by MITRE ATT&CK that systematically maps prompt-based attack techniques against Large Language Models, offering detection guidance and mitigation strategies through a structured matrix of 25+ techniques across six core tactics.

Authors:Niklas Holzner, Sebastian Maier, Stefan Feuerriegel
Title: Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis
Abstract:
Generative artificial intelligence (GenAI) is increasingly used to support a wide range of human tasks, yet empirical evidence on its effect on creativity remains scattered. Can GenAI generate ideas that are creative? To what extent can it support humans in generating ideas that are both creative and diverse? In this study, we conduct a meta-analysis to evaluate the effect of GenAI on the performance in creative tasks. For this, we first perform a systematic literature search, based on which we identify n = 28 relevant studies (m = 8214 participants) for inclusion in our meta-analysis. We then compute standardized effect sizes based on Hedges' g. We compare different outcomes: (i) how creative GenAI is; (ii) how creative humans augmented by GenAI are; and (iii) the diversity of ideas by humans augmented by GenAI. Our results show no significant difference in creative performance between GenAI and humans (g = -0.05), while humans collaborating with GenAI significantly outperform those working without assistance (g = 0.27). However, GenAI has a significant negative effect on the diversity of ideas for such collaborations between humans and GenAI (g = -0.86). We further analyze heterogeneity across different GenAI models (e.g., GPT-3.5, GPT-4), different tasks (e.g., creative writing, ideation, divergent thinking), and different participant populations (e.g., laypeople, business, academia). Overall, our results position GenAI as an augmentative tool that can support, rather than replace, human creativity, particularly in tasks benefiting from ideation support.
中文: 荟萃分析表明,生成式人工智能在创造力方面未超越人类,但人机协作能显著提升创意表现,不过会降低想法多样性。
English: A meta-analysis reveals that generative AI does not outperform humans in creativity but significantly enhances human creative performance when used collaboratively, although it reduces idea diversity.
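The effect size this meta-analysis aggregates, Hedges' g, is Cohen's d with a standard small-sample bias correction; the formula below is the conventional one, while the group statistics are invented for illustration.

```python
# Standard computation of Hedges' g: bias-corrected standardized mean
# difference between two groups. The sample numbers are made up.
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Hedges' g = Cohen's d times the small-sample correction J."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    cohens_d = (mean1 - mean2) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)   # J, approximate bias correction
    return cohens_d * correction

# Hypothetical creativity ratings: GenAI-assisted vs. unassisted group.
g = hedges_g(mean1=7.2, sd1=1.1, n1=40, mean2=6.9, sd2=1.2, n2=40)
```

The toy numbers were chosen so g comes out small and positive, the same regime as the reported human-GenAI collaboration effect; a meta-analysis then pools many such g values across studies.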

Authors:Yuxiang Wei, Yanteng Zhang, Xi Xiao, Tianyang Wang, Xiao Wang, Vince D. Calhoun
Title: MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding
Abstract:
Decoding visual experiences from fMRI offers a powerful avenue to understand human perception and develop advanced brain-computer interfaces. However, current progress often prioritizes maximizing reconstruction fidelity while overlooking interpretability, an essential aspect for deriving neuroscientific insight. To address this gap, we propose MoRE-Brain, a neuro-inspired framework designed for high-fidelity, adaptable, and interpretable visual reconstruction. MoRE-Brain uniquely employs a hierarchical Mixture-of-Experts architecture where distinct experts process fMRI signals from functionally related voxel groups, mimicking specialized brain networks. The experts are first trained to encode fMRI into the frozen CLIP space. A finetuned diffusion model then synthesizes images, guided by expert outputs through a novel dual-stage routing mechanism that dynamically weighs expert contributions across the diffusion process. MoRE-Brain offers three main advancements: First, it introduces a novel Mixture-of-Experts architecture grounded in brain network principles for neuro-decoding. Second, it achieves efficient cross-subject generalization by sharing core expert networks while adapting only subject-specific routers. Third, it provides enhanced mechanistic insight, as the explicit routing reveals precisely how different modeled brain regions shape the semantic and spatial attributes of the reconstructed image. Extensive experiments validate MoRE-Brain's high reconstruction fidelity, with bottleneck analyses further demonstrating its effective utilization of fMRI signals, distinguishing genuine neural decoding from over-reliance on generative priors. Consequently, MoRE-Brain marks a substantial advance towards more generalizable and interpretable fMRI-based visual decoding. Code will be publicly available soon: https://github.com/yuxiangwei0808/MoRE-Brain.
中文: MoRE-Brain提出了一种基于脑网络原理的混合专家框架,通过分层处理和双阶段路由机制,实现了从fMRI信号到图像的高保真、可适应且可解释的视觉重建,显著提升了重建质量与神经机制的可解释性。
English: MoRE-Brain introduces a neuro-inspired Mixture-of-Experts framework that achieves high-fidelity, adaptable, and interpretable visual reconstruction from fMRI through hierarchical processing and dual-stage routing, advancing both reconstruction quality and mechanistic insight.

Authors:Xinyi Lu, Aditya Mahesh, Zejia Shen, Mitchell Dudley, Larissa Sano, Xu Wang
Title: Exploring LLM-Generated Feedback for Economics Essays: How Teaching Assistants Evaluate and Envision Its Use
Abstract:
This project examines the prospect of using AI-generated feedback as suggestions to expedite and enhance human instructors' feedback provision. In particular, we focus on understanding the teaching assistants' perspectives on the quality of AI-generated feedback and how they may or may not utilize AI feedback in their own workflows. We situate our work in a foundational college Economics class, which has frequent short essay assignments. We developed an LLM-powered feedback engine that generates feedback on students' essays based on grading rubrics used by the teaching assistants (TAs). To ensure that TAs could meaningfully critique and engage with the AI feedback, we had them complete their regular grading jobs. For a randomly selected set of essays that they had graded, we used our feedback engine to generate feedback and displayed the feedback as in-text comments in a Word document. We then performed think-aloud studies with 5 TAs over 20 one-hour sessions to have them evaluate the AI feedback, contrast the AI feedback with their handwritten feedback, and share how they envision using the AI feedback if it were offered as suggestions. The study highlights the importance of providing detailed rubrics for AI to generate high-quality feedback for knowledge-intensive essays. TAs considered that using AI feedback as suggestions during their grading could expedite grading, enhance consistency, and improve overall feedback quality. We discuss the importance of decomposing the feedback generation task into steps and presenting intermediate results, in order for TAs to use the AI feedback.
中文: 该项目探讨了利用AI生成反馈辅助助教评分,通过评估其质量与工作流程整合,发现基于详细评分标准时能加快评分速度并提高一致性。
English: This project explores using AI-generated feedback to assist teaching assistants in grading by evaluating its quality and integration into their workflows, finding it can speed up grading and improve consistency when guided by detailed rubrics.

Authors:Lu Li, Cunhang Fan, Hongyu Zhang, Jingjing Zhang, Xiaoke Yang, Jian Zhou, Zhao Lv
Title: MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection
Abstract:
Auditory attention detection (AAD), which aims to detect the target speaker in a multi-talker environment from brain signals such as electroencephalography (EEG), has made great progress. However, most AAD methods solely utilize attention mechanisms sequentially and overlook valuable multi-scale contextual information within EEG signals, limiting their ability to capture long- and short-range spatiotemporal dependencies simultaneously. To address these issues, this paper proposes a multi-scale hybrid attention network (MHANet) for AAD, which consists of the multi-scale hybrid attention (MHA) module and the spatiotemporal convolution (STC) module. Specifically, MHA combines channel attention and multi-scale temporal and global attention mechanisms. This effectively extracts multi-scale temporal patterns within EEG signals and captures long- and short-range spatiotemporal dependencies simultaneously. To further improve the performance of AAD, STC utilizes temporal and spatial convolutions to aggregate expressive spatiotemporal representations. Experimental results show that the proposed MHANet achieves state-of-the-art performance across three datasets with approximately three times fewer trainable parameters than the most advanced existing model. Code is available at: https://github.com/fchest/MHANet.
中文摘要:本文提出MHANet多尺度混合注意力网络,通过结合通道注意力和多尺度时空注意机制,有效提取脑电信号中的多尺度时间模式并同时捕获长短程时空依赖关系,以更少的可训练参数在三个数据集上实现了最优性能。
English Summary: This paper introduces MHANet, a multi-scale hybrid attention network that enhances auditory attention detection by effectively capturing long-short range spatiotemporal dependencies in EEG signals through combined attention mechanisms and spatiotemporal convolutions, achieving state-of-the-art performance with significantly fewer parameters.

Authors:Yilin Ye, Junchao Huang, Xingchen Zeng, Jiazhi Xia, Wei Zeng
Title: AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings
Abstract:
Cross-modal embeddings form the foundation for multi-modal models. However, visualization methods for interpreting cross-modal embeddings have been primarily confined to traditional dimensionality reduction (DR) techniques like PCA and t-SNE. These DR methods primarily focus on feature distributions within a single modality, whilst failing to incorporate metrics (e.g., CLIPScore) across multiple modalities. This paper introduces AKRMap, a new DR technique designed to visualize cross-modal embedding metrics with enhanced accuracy by learning kernel regression of the metric landscape in the projection space. Specifically, AKRMap constructs a supervised projection network guided by a post-projection kernel regression loss, and employs adaptive generalized kernels that can be jointly optimized with the projection. This approach enables AKRMap to efficiently generate visualizations that capture complex metric distributions, while also supporting interactive features such as zoom and overlay for deeper exploration. Quantitative experiments demonstrate that AKRMap outperforms existing DR methods in generating more accurate and trustworthy visualizations. We further showcase the effectiveness of AKRMap in visualizing and comparing cross-modal embeddings for text-to-image models. Code and demo are available at https://github.com/yilinye/AKRMap.
Chinese Summary: 本文提出AKRMap这一新型降维技术,通过学习投影空间中度量景观的核回归,能够更准确地可视化跨模态嵌入,在定量实验中超越了PCA和t-SNE等传统方法。
English Summary: This paper introduces AKRMap, a novel dimensionality reduction technique that visualizes cross-modal embeddings more accurately by learning kernel regression of the metric landscape, outperforming traditional methods like PCA and t-SNE.
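The core mechanism, regressing a metric landscape over the 2D projection space, can be sketched with plain Nadaraya-Watson kernel regression and a fixed Gaussian bandwidth. AKRMap's adaptive, jointly-optimized kernels and supervised projection network are not reproduced here, and the points and scores are toy values.

```python
# Nadaraya-Watson kernel regression over projected points: estimate the
# metric at a query location as a Gaussian-weighted average of known
# metric values. Fixed bandwidth; AKRMap instead learns adaptive kernels.
import math

def kernel_regress(points, values, query, bandwidth=0.5):
    """Kernel-weighted average of `values` attached to 2D `points`."""
    weights = []
    for (x, y) in points:
        d2 = (x - query[0])**2 + (y - query[1])**2
        weights.append(math.exp(-d2 / (2 * bandwidth**2)))
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Toy projected embeddings with a CLIPScore-like metric attached.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
scores = [0.9, 0.3, 0.3]
estimate = kernel_regress(pts, scores, query=(0.1, 0.1))
```

Queries near the high-scoring point recover a high estimate, which is what lets the tool paint a smooth metric landscape over the whole projection.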

Authors:Yu Ying Chiu, Zhilin Wang, Sharan Maiya, Yejin Choi, Kyle Fish, Sydney Levine, Evan Hubinger
Title: Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas
Abstract:
Detecting AI risks becomes more challenging as stronger models emerge and find novel methods such as Alignment Faking to circumvent these detection attempts. Inspired by how risky behaviors in humans (e.g., illegal activities that may hurt others) are sometimes guided by strongly-held values, we believe that identifying values within AI models can be an early warning system for AI's risky behaviors. We create LitmusValues, an evaluation pipeline to reveal AI models' priorities on a range of AI value classes. Then, we collect AIRiskDilemmas, a diverse collection of dilemmas that pit values against one another in scenarios relevant to AI safety risks such as Power Seeking. By measuring an AI model's value prioritization using its aggregate choices, we obtain a self-consistent set of predicted value priorities that uncover potential risks. We show that values in LitmusValues (including seemingly innocuous ones like Care) can predict both seen risky behaviors in AIRiskDilemmas and unseen risky behaviors in HarmBench.
中文: 随着AI模型通过“对齐伪装”等新方法规避检测,识别风险愈发困难,因此我们开发了LitmusValues评估流程,通过分析AI在价值困境中的优先选择来预测其潜在危险行为。
English: As AI models evolve with tactics like Alignment Faking, detecting risks grows more difficult, prompting the development of LitmusValues, an evaluation pipeline that identifies AI value priorities to predict risky behaviors through dilemmas and real-world benchmarks.
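Measuring value prioritization from aggregate choices can be sketched as a simple win-rate tally over dilemmas that pit two values against each other. The record format and value names below are hypothetical, not the paper's actual pipeline.

```python
# Illustrative tally: each dilemma pits two values and records which one
# the model's choice favored; ranking by win rate yields a value
# priority ordering. Records and value names are made up.
from collections import Counter

def value_priorities(dilemma_outcomes):
    """Rank values by how often they win when pitted against another."""
    wins = Counter()
    appearances = Counter()
    for value_a, value_b, winner in dilemma_outcomes:
        appearances[value_a] += 1
        appearances[value_b] += 1
        wins[winner] += 1
    # Sort by win rate, highest first.
    return sorted(appearances, key=lambda v: wins[v] / appearances[v],
                  reverse=True)

outcomes = [
    ("Care", "Honesty", "Care"),
    ("Care", "Power", "Care"),
    ("Honesty", "Power", "Honesty"),
]
ranking = value_priorities(outcomes)
```

A self-consistent ranking of this kind is what lets seemingly innocuous values act as predictors of downstream risky behavior.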

Authors:Yuka Iwanaga, Masayoshi Tsuchinaga, Kosei Tanada, Yuji Nakamura, Takemitsu Mori, Takashi Yamamoto
Title: Sketch Interface for Teleoperation of Mobile Manipulator to Enable Intuitive and Intended Operation: A Proof of Concept
Abstract:
Recent advancements in robotics have underscored the need for effective collaboration between humans and robots. Traditional interfaces often struggle to balance robot autonomy with human oversight, limiting their practical application in complex tasks like mobile manipulation. This study aims to develop an intuitive interface that enables a mobile manipulator to autonomously interpret user-provided sketches, enhancing user experience while minimizing burden. We implemented a web-based application utilizing machine learning algorithms to process sketches, making the interface accessible on mobile devices for use anytime, anywhere, by anyone. In the first validation, we examined natural sketches drawn by users for 27 selected manipulation and navigation tasks, gaining insights into trends related to sketch instructions. The second validation involved comparative experiments with five grasping tasks, showing that the sketch interface reduces workload and enhances intuitiveness compared to conventional axis control interfaces. These findings suggest that the proposed sketch interface improves the efficiency of mobile manipulators and opens new avenues for integrating intuitive human-robot collaboration in various applications.

Authors:Han Meng, Yancan Chen, Yunan Li, Yitian Yang, Jungup Lee, Renwen Zhang, Yi-Chieh Lee
Title: What is Stigma Attributed to? A Theory-Grounded, Expert-Annotated Interview Corpus for Demystifying Mental-Health Stigma
Abstract:
Mental-health stigma remains a pervasive social problem that hampers treatment-seeking and recovery. Existing resources for training neural models to finely classify such stigma are limited, relying primarily on social-media or synthetic data without theoretical underpinnings. To remedy this gap, we present an expert-annotated, theory-informed corpus of human-chatbot interviews, comprising 4,141 snippets from 684 participants with documented socio-cultural backgrounds. Our experiments benchmark state-of-the-art neural models and empirically unpack the challenges of stigma detection. This dataset can facilitate research on computationally detecting, neutralizing, and counteracting mental-health stigma. Our corpus is openly available at https://github.com/HanMeng2004/Mental-Health-Stigma-Interview-Corpus.
中文摘要:本研究提供了一个专家标注的人机对话访谈语料库,旨在解决心理健康污名检测中缺乏理论指导数据的问题,通过基准测试评估了先进神经模型并揭示了检测难点。
English Summary: This study introduces an expert-annotated corpus of human-chatbot interviews to address the lack of theory-informed data for training neural models in detecting mental-health stigma, benchmarking state-of-the-art models and highlighting detection challenges.

Authors:Botao Amber Hu, Rem Rungu Lin, Yilan Elan Tao, Samuli Laato, Yue Li
Title: Towards Immersive Mixed Reality Street Play: Understanding Co-located Bodily Play with See-through Head-mounted Displays in Public Spaces
Abstract:
As see-through Mixed Reality Head-Mounted Displays (MRHMDs) proliferate, their usage is gradually shifting from controlled, private settings to spontaneous, public contexts. While location-based augmented reality mobile games such as Pokemon GO have been successful, the embodied interaction afforded by MRHMDs moves play beyond phone-based screen-tapping toward co-located, bodily, movement-based play. In anticipation of widespread MRHMD adoption, major technology companies have teased concept videos envisioning urban streets as vast mixed reality playgrounds (imagine Harry Potter-style wizard duels in city streets), which we term Immersive Mixed Reality Street Play (IMRSP). However, few real-world studies examine such scenarios. Through empirical, in-the-wild studies of our research-through-design game probe, Multiplayer Omnipresent Fighting Arena (MOFA), deployed across diverse public venues, we offer initial insights into the social implications, challenges, opportunities, and design recommendations of IMRSP. The MOFA framework, which includes three gameplay modes-"The Training," "The Duel," and "The Dragon"-is open-sourced at https://github.com/realitydeslab/mofa.
中文摘要:混合现实头戴设备正从私人场景转向公共应用,通过MOFA游戏在真实环境中的实证研究,揭示了沉浸式街头游戏的社会影响与设计挑战,并开源了游戏框架。
English Summary: Mixed Reality Head-Mounted Displays are transitioning from private to public use, enabling immersive street play like wizard duels, with the MOFA game study providing initial insights into its social impacts and design considerations.

Authors:Hung Nguyen, Alireza Rahimi, Veronica Whitford, Hélène Fournier, Irina Kondratova, René Richard, Hung Cao
Title: Heart2Mind: Human-Centered Contestable Psychiatric Disorder Diagnosis System using Wearable ECG Monitors
Abstract:
Psychiatric disorders affect millions globally, yet their diagnosis faces significant challenges in clinical practice due to subjective assessments and accessibility concerns, leading to potential delays in treatment. To help address this issue, we present Heart2Mind, a human-centered contestable psychiatric disorder diagnosis system using wearable electrocardiogram (ECG) monitors. Our approach leverages cardiac biomarkers, particularly heart rate variability (HRV) and R-R intervals (RRI) time series, as objective indicators of autonomic dysfunction in psychiatric conditions. The system comprises three key components: (1) a Cardiac Monitoring Interface (CMI) for real-time data acquisition from Polar H9/H10 devices; (2) a Multi-Scale Temporal-Frequency Transformer (MSTFT) that processes RRI time series through integrated time-frequency domain analysis; (3) a Contestable Diagnosis Interface (CDI) combining Self-Adversarial Explanations (SAEs) with contestable Large Language Models (LLMs). Our MSTFT achieves 91.7% accuracy on the HRV-ACC dataset using leave-one-out cross-validation, outperforming state-of-the-art methods. SAEs successfully detect inconsistencies in model predictions by comparing attention-based and gradient-based explanations, while LLMs enable clinicians to validate correct predictions and contest erroneous ones. This work demonstrates the feasibility of combining wearable technology with Explainable Artificial Intelligence (XAI) and contestable LLMs to create a transparent, contestable system for psychiatric diagnosis that maintains clinical oversight while leveraging advanced AI capabilities. Our implementation is publicly available at: https://github.com/Analytics-Everywhere-Lab/heart2mind.
中文: Heart2Mind系统通过可穿戴心电监测设备分析心脏生物标志物,结合可解释人工智能和可争议大语言模型,构建了一个临床可监督的透明精神疾病诊断平台,在保持高准确率的同时允许医生对诊断结果进行验证和争议。
English: Heart2Mind is a contestable psychiatric diagnosis system that uses wearable ECG monitors to analyze cardiac biomarkers through AI, achieving high accuracy while enabling clinician validation through explainable AI and contestable large language models.
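The cardiac biomarkers Heart2Mind relies on include standard time-domain HRV statistics computed from R-R intervals. SDNN and RMSSD below follow their conventional definitions; the interval values are made up, and this is a feature-extraction sketch, not the paper's MSTFT model.

```python
# Conventional time-domain HRV features from R-R intervals (ms):
# SDNN (standard deviation of intervals) and RMSSD (root mean square of
# successive differences). Interval values are invented for illustration.
import math

def hrv_features(rri_ms):
    """SDNN and RMSSD from a list of R-R intervals in milliseconds."""
    n = len(rri_ms)
    mean = sum(rri_ms) / n
    sdnn = math.sqrt(sum((x - mean)**2 for x in rri_ms) / (n - 1))
    diffs = [b - a for a, b in zip(rri_ms, rri_ms[1:])]
    rmssd = math.sqrt(sum(d**2 for d in diffs) / len(diffs))
    return {"sdnn": sdnn, "rmssd": rmssd}

feats = hrv_features([812, 790, 805, 830, 798])
```

Features like these, computed over streaming RRI windows from the wearable monitor, are the objective autonomic indicators the downstream classifier consumes.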

Authors:Peizhen Li, Longbing Cao, Xiao-Ming Wu, Runze Yang, Xiaohan Yu
Title: X2C: A Dataset Featuring Nuanced Facial Expressions for Realistic Humanoid Imitation
Abstract:
The ability to imitate realistic facial expressions is essential for humanoid robots engaged in affective human-robot communication. However, the lack of datasets containing diverse humanoid facial expressions with proper annotations hinders progress in realistic humanoid facial expression imitation. To address these challenges, we introduce X2C (Anything to Control), a dataset featuring nuanced facial expressions for realistic humanoid imitation. With X2C, we contribute: 1) a high-quality, high-diversity, large-scale dataset comprising 100,000 (image, control value) pairs. Each image depicts a humanoid robot displaying a diverse range of facial expressions, annotated with 30 control values representing the ground-truth expression configuration; 2) X2CNet, a novel human-to-humanoid facial expression imitation framework that learns the correspondence between nuanced humanoid expressions and their underlying control values from X2C. It enables facial expression imitation in the wild for different human performers, providing a baseline for the imitation task, showcasing the potential value of our dataset; 3) real-world demonstrations on a physical humanoid robot, highlighting its capability to advance realistic humanoid facial expression imitation. Code and Data: https://lipzh5.github.io/X2CNet/

Authors:Cunhang Fan, Xiaoke Yang, Hongyu Zhang, Ying Chen, Lu Li, Jian Zhou, Zhao Lv
Title: ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection
Abstract:
Auditory attention detection (AAD) aims to identify the direction of the attended speaker in multi-speaker environments from brain signals such as electroencephalography (EEG). However, existing EEG-based AAD methods overlook the spatio-temporal dependencies of EEG signals, limiting their decoding and generalization abilities. To address these issues, this paper proposes a Lightweight Spatio-Temporal Enhancement Nested Network (ListenNet) for AAD. ListenNet has three key components: Spatio-temporal Dependency Encoder (STDE), Multi-scale Temporal Enhancement (MSTE), and Cross-Nested Attention (CNA). The STDE reconstructs dependencies between consecutive time windows across channels, improving the robustness of dynamic pattern extraction. The MSTE captures temporal features at multiple scales to represent both fine-grained and long-range temporal patterns. In addition, the CNA integrates hierarchical features more effectively through novel dynamic attention mechanisms to capture deep spatio-temporal correlations. Experimental results on three public datasets demonstrate the superiority of ListenNet over state-of-the-art methods in both subject-dependent and challenging subject-independent settings, while using approximately 7 times fewer trainable parameters. Code is available at: https://github.com/fchest/ListenNet.
中文: 本文提出ListenNet轻量网络,通过三个创新组件捕捉脑电信号的时空依赖性来提升听觉注意力检测性能,在显著减少参数的同时实现了更优的识别效果。
English: This paper introduces ListenNet, a lightweight network that enhances auditory attention detection by capturing spatio-temporal dependencies in EEG signals through three novel components, achieving superior performance with significantly fewer parameters.

Authors:Ziyi Xuan, Yiwen Wu, Xuhai Xu, Vinod Namboodiri, Mooi Choo Chuah, Yu Yang
Title: Design and Evaluation of Generative Agent-based Platform for Human-Assistant Interaction Research: A Tale of 10 User Studies
Abstract:
Designing and evaluating personalized and proactive assistant agents remains challenging due to the time, cost, and ethical concerns associated with human-in-the-loop experimentation. Existing Human-Computer Interaction (HCI) methods often require extensive physical setup and human participation, which introduces privacy concerns and limits scalability. Simulated environments offer a partial solution but are typically constrained by rule-based scenarios and still depend heavily on human input to guide interactions and interpret results. Recent advances in large language models (LLMs) have introduced the possibility of generative agents that can simulate realistic human behavior, reasoning, and social dynamics. However, their effectiveness in modeling human-assistant interactions remains largely unexplored. To address this gap, we present a generative agent-based simulation platform designed to simulate human-assistant interactions. We identify ten prior studies on assistant agents that span different aspects of interaction design and replicate these studies using our simulation platform. Our results show that fully simulated experiments using generative agents can approximate key aspects of human-assistant interactions. Based on these simulations, we are able to replicate the core conclusions of the original studies. Our work provides a scalable and cost-effective approach for studying assistant agent design without requiring live human subjects. We will open source both the platform and collected results from the experiments on our website: https://dash-gidea.github.io/.

Authors:Haoran Ye, Jing Jin, Yuhang Xie, Xin Zhang, Guojie Song
Title: Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement
Abstract:
The advancement of large language models (LLMs) has outpaced traditional evaluation methodologies. This progress presents novel challenges, such as measuring human-like psychological constructs, moving beyond static and task-specific benchmarks, and establishing human-centered evaluation. These challenges intersect with psychometrics, the science of quantifying the intangible aspects of human psychology, such as personality, values, and intelligence. This review paper introduces and synthesizes the emerging interdisciplinary field of LLM Psychometrics, which leverages psychometric instruments, theories, and principles to evaluate, understand, and enhance LLMs. The reviewed literature systematically shapes benchmarking principles, broadens evaluation scopes, refines methodologies, validates results, and advances LLM capabilities. Diverse perspectives are integrated to provide a structured framework for researchers across disciplines, enabling a more comprehensive understanding of this nascent field. Ultimately, the review provides actionable insights for developing future evaluation paradigms that align with human-level AI and promote the advancement of human-centered AI systems for societal benefit. A curated repository of LLM psychometric resources is available at https://github.com/valuebyte-ai/Awesome-LLM-Psychometrics.
中文摘要:本文综述了新兴的LLM心理测量学领域,该领域运用心理测量工具与理论来应对大语言模型评估中的挑战,致力于建立以人为中心的人工智能系统并推动其发展。
English Summary: This review introduces LLM Psychometrics, an interdisciplinary field using psychometric principles to address the challenges of evaluating large language models beyond traditional benchmarks, aiming to advance human-centered AI systems.

Authors:Donghao Ren, Fred Hohman, Halden Lin, Dominik Moritz
Title: Embedding Atlas: Low-Friction, Interactive Embedding Visualization
Abstract:
Embedding projections are popular for visualizing large datasets and models. However, people often encounter "friction" when using embedding visualization tools: (1) barriers to adoption, e.g., tedious data wrangling and loading, scalability limits, and no integration of results into existing workflows, and (2) limits on the analyses possible, since the tools do not connect to external tools that could show coordinated views of metadata. In this paper, we present Embedding Atlas, a scalable, interactive visualization tool designed to make interacting with large embeddings as easy as possible. Embedding Atlas uses modern web technologies and advanced algorithms, including density-based clustering and automated labeling, to provide a fast and rich data analysis experience at scale. We evaluate Embedding Atlas with a competitive analysis against other popular embedding tools, showing that Embedding Atlas's feature set specifically helps reduce friction, and report a benchmark on its real-time rendering performance with millions of points. Embedding Atlas is available as open source to support future work in embedding-based analysis.

Authors:Wei Xiong, Junming Lin, Jiangtong Li, Jie Li, Changjun Jiang
Title: ALFEE: Adaptive Large Foundation Model for EEG Representation
Abstract:
While foundation models excel in text, image, and video domains, critical biological signals, particularly electroencephalography (EEG), remain underexplored. EEG benefits neurological research with its high temporal resolution, operational practicality, and safety profile. However, low signal-to-noise ratio, inter-subject variability, and cross-paradigm differences hinder the generalization of current models. Existing methods often employ simplified strategies, such as a single loss function or a channel-temporal joint representation module, and suffer from a domain gap between pretraining and evaluation tasks that compromises efficiency and adaptability. To address these limitations, we propose the Adaptive Large Foundation model for EEG signal representation (ALFEE) framework, a novel hybrid transformer architecture with two learning stages for robust EEG representation learning. ALFEE employs a hybrid attention that separates channel-wise feature aggregation from temporal dynamics modeling, enabling robust EEG representation with variable channel configurations. A channel encoder adaptively compresses variable channel information, a temporal encoder captures task-guided evolution, and a hybrid decoder reconstructs signals in both temporal and frequency domains. During pretraining, ALFEE optimizes task prediction, channel and temporal mask reconstruction, and temporal forecasting to enhance multi-scale and multi-channel representation. During fine-tuning, a full-model adaptation with a task-specific token dictionary and a cross-attention layer boosts performance across multiple tasks. After 25,000 hours of pretraining, extensive experimental results on six downstream EEG tasks demonstrate the superior performance of ALFEE over existing models. Our ALFEE framework establishes a scalable foundation for biological signal analysis with implementation at https://github.com/xw1216/ALFEE.
Chinese: ALFEE框架采用混合Transformer架构,通过多阶段学习和混合注意力机制解决脑电信号的关键难题,在六项下游任务中经过大规模预训练后展现出卓越性能。
English: The ALFEE framework introduces a hybrid transformer architecture that overcomes EEG signal challenges through multi-stage learning and hybrid attention, achieving superior performance across six tasks after extensive pretraining.

Authors:Gabriel Gagné, Anisha Azad, Thomas Labbé, Evan Campbell, Xavier Isabel, Erik Scheme, Ulysse Côté-Allard, Benoit Gosselin
Title: Context Informed Incremental Learning Improves Myoelectric Control Performance in Virtual Reality Object Manipulation Tasks
Abstract:
Electromyography (EMG)-based gesture recognition is a promising approach for designing intuitive human-computer interfaces. However, while these systems typically perform well in controlled laboratory settings, their usability in real-world applications is compromised by declining performance during real-time control. This decline is largely due to goal-directed behaviors that are not captured in static, offline scenarios. To address this issue, we use Context Informed Incremental Learning (CIIL), marking its first deployment in an object-manipulation scenario, to continuously adapt the classifier using contextual cues. Nine participants without upper limb differences completed a functional task in a virtual reality (VR) environment involving transporting objects with life-like grips. We compared two scenarios: one where the classifier was adapted in real-time using contextual information, and the other using a traditional open-loop approach without adaptation. The CIIL-based approach not only enhanced task success rates and efficiency, but also reduced the perceived workload by 7.1%, despite causing a 5.8% reduction in offline classification accuracy. This study highlights the potential of real-time contextualized adaptation to enhance user experience and usability of EMG-based systems for practical, goal-oriented applications, crucial elements towards their long-term adoption. The source code for this study is available at: https://github.com/BiomedicalITS/ciil-emg-vr.
中文: 该研究在虚拟现实物体操控任务中首次应用了上下文感知增量学习(CIIL),证明实时调整肌电手势识别系统能提升任务完成效率并降低用户负荷,尽管离线分类准确率略有下降。
English: The study introduces Context Informed Incremental Learning (CIIL) in a VR object-manipulation task, showing that real-time adaptation of EMG-based gesture recognition enhances task success and reduces user workload, despite a slight drop in offline accuracy.
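The adaptation idea in the abstract, using contextual cues from the task as labels to update the classifier online, can be illustrated with a toy nearest-prototype classifier. The prototype update rule, the `lr` parameter, and the class names below are illustrative assumptions, not the authors' CIIL implementation.

```python
class PrototypeClassifier:
    """Toy incremental classifier: one prototype vector per gesture class.

    When contextual cues (e.g. the grip the manipulated object actually
    affords) confirm a label, the matching prototype is nudged toward
    the new sample, adapting the classifier during use.
    """

    def __init__(self, prototypes):
        self.prototypes = {c: list(p) for c, p in prototypes.items()}

    def predict(self, x):
        # Nearest prototype by squared Euclidean distance.
        return min(
            self.prototypes,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.prototypes[c])),
        )

    def update(self, x, context_label, lr=0.5):
        # Context-informed incremental step: move the confirmed class's
        # prototype a fraction `lr` of the way toward the sample.
        p = self.prototypes[context_label]
        self.prototypes[context_label] = [a + lr * (b - a) for a, b in zip(p, x)]
```

For example, a "close" sample initially misclassified near the decision boundary is classified correctly after a few context-confirmed updates shift the prototype.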

Authors:Jai Prakash Veerla, Partha Sai Guttikonda, Helen H. Shang, Mohammad Sadegh Nasr, Cesar Torres, Jacob M. Luber
Title: Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow
Abstract:
Pathologists rely on gigapixel whole-slide images (WSIs) to diagnose diseases like cancer, yet current digital pathology tools hinder diagnosis. The immense scale of WSIs, often exceeding 100,000 × 100,000 pixels, clashes with the limited views traditional monitors offer. This mismatch forces constant panning and zooming, increasing pathologist cognitive load, causing diagnostic fatigue, and slowing pathologists' adoption of digital methods. PathVis, our mixed-reality visualization platform for Apple Vision Pro, addresses these challenges. It transforms the pathologist's interaction with data, replacing cumbersome mouse-and-monitor navigation with intuitive exploration using natural hand gestures, eye gaze, and voice commands in an immersive workspace. PathVis integrates AI to enhance diagnosis. An AI-driven search function instantly retrieves and displays the top five similar patient cases side-by-side, improving diagnostic precision and efficiency through rapid comparison. Additionally, a multimodal conversational AI assistant offers real-time image interpretation support and aids collaboration among pathologists across multiple Apple devices. By merging the directness of traditional pathology with advanced mixed-reality visualization and AI, PathVis improves diagnostic workflows, reduces cognitive strain, and makes pathology practice more effective and engaging. The PathVis source code and a demo video are publicly available at: https://github.com/jaiprakash1824/Path_Vis
中文: PathVis是一个针对Apple Vision Pro的混合现实平台,通过手势导航和集成AI进行病例比对与实时辅助,优化了数字病理学工作流程,减轻了认知负担并提升了诊断效率。
English: PathVis is a mixed-reality platform for Apple Vision Pro that enhances digital pathology by enabling intuitive navigation through hand gestures and integrating AI for case comparison and real-time assistance, reducing cognitive load and improving diagnostic efficiency.
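The AI-driven search the abstract describes, retrieving the top five most similar patient cases, amounts to a nearest-neighbor query over precomputed case embeddings. The sketch below is a generic Euclidean top-k search under that assumption; it says nothing about PathVis's actual feature extractor or index.

```python
import heapq


def top_similar_cases(query_emb, case_embs, k=5):
    """Return the ids of the k stored cases closest to the query embedding.

    `case_embs` maps case id -> embedding vector; distances are plain
    Euclidean. A production system would use an ANN index instead.
    """
    def dist(emb):
        return sum((a - b) ** 2 for a, b in zip(query_emb, emb)) ** 0.5

    ranked = heapq.nsmallest(k, ((dist(emb), cid) for cid, emb in case_embs.items()))
    return [cid for _, cid in ranked]
```

The returned ids would then drive the side-by-side display of similar cases.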

Authors:Wolfgang Gritz, Hewi Salih, Anett Hoppe, Ralph Ewerth
Title: From Formulas to Figures: How Visual Elements Impact User Interactions in Educational Videos
Abstract:
Educational videos have become increasingly relevant in today's learning environments. While prior research in laboratory studies has provided valuable insights, analyzing real-world interaction data can enhance our understanding of authentic user behavior. Previous studies have investigated technical aspects, such as the influence of cuts on pausing behavior, but the impact of visual complexity remains understudied. In this paper, we address this gap and propose a novel approach centered on visual complexity, defined as the number of visually distinguishable and meaningful elements in a video frame, such as mathematical equations, chemical formulas, or graphical representations. Our study introduces a fine-grained taxonomy of visual objects in educational videos, expanding on previous classifications. Applying this taxonomy to 25 videos from physics and chemistry, we examine the relationship between visual complexity and user behavior, including pauses, in-video navigation, and session dropouts. The results indicate that increased visual complexity, especially of textual elements, correlates with more frequent pauses, rewinds, and dropouts. The results offer a deeper understanding of how video design affects user behavior in real-world scenarios. Our work has implications for optimizing educational videos, particularly in STEM fields. We make our code publicly available (https://github.com/TIBHannover/from_formulas_to_figures).
中文摘要:本研究探讨了教育视频中视觉复杂度对用户行为的影响,发现复杂度增加,尤其是文本元素,会导致更多暂停、回放和退出行为,为优化视频设计提供了参考。
English Summary: This study investigates how visual complexity in educational videos, particularly in STEM subjects, influences user behavior by correlating increased complexity with more frequent pauses, rewinds, and dropouts, offering insights for optimizing video design.

Authors:Zhonghao Li, Kunpeng Zhang, Jinghuai Ou, Shuliang Liu, Xuming Hu
Title: TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering
Abstract:
Retrieval-augmented generation (RAG) systems face significant challenges in multi-hop question answering (MHQA), where complex queries require synthesizing information across multiple document chunks. Existing approaches typically rely on iterative LLM-based query rewriting and routing, resulting in high computational costs due to repeated LLM invocations and multi-stage processes. To address these limitations, we propose TreeHop, an embedding-level framework without the need for LLMs in query refinement. TreeHop dynamically updates query embeddings by fusing semantic information from prior queries and retrieved documents, enabling iterative retrieval through embedding-space operations alone. This method replaces the traditional "Retrieve-Rewrite-Vectorize-Retrieve" cycle with a streamlined "Retrieve-Embed-Retrieve" loop, significantly reducing computational overhead. Moreover, a rule-based stop criterion is introduced to further prune redundant retrievals, balancing efficiency and recall rate. Experimental results show that TreeHop rivals advanced RAG methods across three open-domain MHQA datasets, achieving comparable performance with only 5%-0.4% of the model parameter size and reducing the query latency by approximately 99% compared to concurrent approaches. This makes TreeHop a faster and more cost-effective solution for deployment in a range of knowledge-intensive applications. For reproducibility purposes, codes and data are available here: https://github.com/allen-li1231/TreeHop-RAG.
中文: TreeHop是一种高效的嵌入级框架,通过语义融合动态更新查询嵌入,大幅降低计算成本和延迟,同时在多跳问答任务中保持优异性能。
English: TreeHop is an efficient embedding-level framework that streamlines multi-hop question answering by dynamically updating query embeddings through semantic fusion, significantly reducing computational costs and latency while maintaining competitive performance.
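The "Retrieve-Embed-Retrieve" loop can be sketched with plain cosine retrieval. The convex-combination fusion and the similarity-threshold stop rule below are illustrative stand-ins for TreeHop's learned embedding update and its rule-based criterion, not the paper's actual method.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fuse(query, doc_emb, alpha=0.5):
    # Illustrative embedding-level update: blend the current query with the
    # retrieved chunk instead of rewriting the query text with an LLM.
    return [alpha * q + (1 - alpha) * d for q, d in zip(query, doc_emb)]


def retrieve_embed_retrieve(query, corpus, max_hops=3, stop=0.2):
    """Iterate retrieval purely in embedding space.

    `corpus` maps chunk id -> embedding. Each hop retrieves the best
    unseen chunk, fuses it into the query, and repeats; a weak best
    match (below `stop`) ends the loop early.
    """
    hops = []
    for _ in range(max_hops):
        candidates = {d: e for d, e in corpus.items() if d not in hops}
        if not candidates:
            break
        doc_id = max(candidates, key=lambda d: cosine(query, candidates[d]))
        if cosine(query, candidates[doc_id]) < stop:  # rule-based stop criterion
            break
        hops.append(doc_id)
        query = fuse(query, corpus[doc_id])  # "Embed" step: no LLM call
    return hops
```

The loop never invokes a language model, which is where the latency reduction the abstract reports comes from.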

Authors:Guangyi Liu, Pengxiang Zhao, Liang Liu, Yaxuan Guo, Han Xiao, Weifeng Lin, Yuxiang Chai, Yue Han, Shuai Ren, Hao Wang, Xiaoyu Liang, Wenhao Wang, Tianze Wu, Linghao Li, Hao Wang, Guanjing Xiong, Yong Liu, Hongsheng Li
Title: LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
Abstract:
With the rapid rise of large language models (LLMs), phone automation has undergone transformative changes. This paper systematically reviews LLM-driven phone GUI agents, highlighting their evolution from script-based automation to intelligent, adaptive systems. We first contextualize key challenges: (i) limited generality, (ii) high maintenance overhead, and (iii) weak intent comprehension, and show how LLMs address these issues through advanced language understanding, multimodal perception, and robust decision-making. We then propose a taxonomy covering fundamental agent frameworks (single-agent, multi-agent, plan-then-act), modeling approaches (prompt engineering, training-based), and essential datasets and benchmarks. Furthermore, we detail task-specific architectures, supervised fine-tuning, and reinforcement learning strategies that bridge user intent and GUI operations. Finally, we discuss open challenges such as dataset diversity, on-device deployment efficiency, user-centric adaptation, and security concerns, offering forward-looking insights into this rapidly evolving field. By providing a structured overview and identifying pressing research gaps, this paper serves as a definitive reference for researchers and practitioners seeking to harness LLMs in designing scalable, user-friendly phone GUI agents.
中文摘要:本文系统综述了大型语言模型驱动的手机图形界面代理从脚本自动化向智能自适应系统的演进,通过先进语言理解解决核心挑战,提出涵盖框架、方法和数据集的分类体系,并指出未来研究方向。
English Summary: This paper systematically reviews the evolution of LLM-driven phone GUI agents from script-based systems to intelligent adaptive solutions, addressing key challenges through advanced language understanding and proposing comprehensive frameworks while highlighting future research directions.

Authors:Tao Wu, Kexue Fu, Qiang Hua, Xinxin Liu, Muhammad Ali Imran, Bo Liu
Title: LEAM: A Prompt-only Large Language Model-enabled Antenna Modeling Method
Abstract:
Antenna modeling is a time-consuming and complex process, decreasing the speed of antenna analysis and design. In this paper, a large language model (LLM)-enabled antenna modeling method, called LEAM, is presented to address this challenge. LEAM enables automatic antenna model generation based on language descriptions via prompt input, images, descriptions from academic papers, patents, and technical reports (either one or multiple). The effectiveness of LEAM is demonstrated by three examples: a Vivaldi antenna generated from a complete user description, a slotted patch antenna generated from an incomplete user description and the operating frequency, and a monopole slotted antenna generated from images and descriptions scanned from the literature. For all the examples, correct antenna models are generated in a few minutes. The code can be accessed via https://github.com/TaoWu974/LEAM.
中文:LEAM是一种基于大语言模型的天线建模方法,能够通过文本描述、图像等多种输入自动生成精确的天线设计,将建模时间大幅缩短至几分钟。
English: LEAM is an innovative antenna modeling method that utilizes large language models to automatically generate accurate antenna designs from various inputs like text descriptions and images, significantly speeding up the process to just minutes.

Authors:Zilin Huang, Zihao Sheng, Zhengyang Wan, Yansong Qu, Yuhao Luo, Boyue Wang, Pei Li, Yen-Jung Chen, Jiancong Chen, Keke Long, Jiayi Meng, Yue Leng, Sikai Chen
Title: Sky-Drive: A Distributed Multi-Agent Simulation Platform for Human-AI Collaborative and Socially-Aware Future Transportation
Abstract:
Recent advances in autonomous system simulation platforms have significantly enhanced the safe and scalable testing of driving policies. However, existing simulators do not yet fully meet the needs of future transportation research, particularly in enabling effective human-AI collaboration and modeling socially-aware driving agents. This paper introduces Sky-Drive, a novel distributed multi-agent simulation platform that addresses these limitations through four key innovations: (a) a distributed architecture for synchronized simulation across multiple terminals; (b) a multi-modal human-in-the-loop framework integrating diverse sensors to collect rich behavioral data; (c) a human-AI collaboration mechanism supporting continuous and adaptive knowledge exchange; and (d) a digital twin framework for constructing high-fidelity virtual replicas of real-world transportation environments. Sky-Drive supports diverse applications such as autonomous vehicle-human road users interaction modeling, human-in-the-loop training, socially-aware reinforcement learning, personalized driving development, and customized scenario generation. Future extensions will incorporate foundation models for context-aware decision support and hardware-in-the-loop testing for real-world validation. By bridging scenario generation, data collection, algorithm training, and hardware integration, Sky-Drive has the potential to become a foundational platform for the next generation of human-centered and socially-aware autonomous transportation systems research. The demo video and code are available at: https://sky-lab-uw.github.io/Sky-Drive-website/

Authors:Kazi Shahrukh Omar, Shuaijie Wang, Ridhuparan Kungumaraju, Tanvi Bhatt, Fabio Miranda
Title: VIGMA: An Open-Access Framework for Visual Gait and Motion Analytics
Abstract:
Gait disorders are commonly observed in older adults, who frequently experience various issues related to walking. Additionally, researchers and clinicians extensively investigate mobility related to gait in typically and atypically developing children, athletes, and individuals with orthopedic and neurological disorders. Effective gait analysis enables the understanding of the causal mechanisms of mobility and balance control of patients, the development of tailored treatment plans to improve mobility, the reduction of fall risk, and the tracking of rehabilitation progress. However, analyzing gait data is a complex task due to the multivariate nature of the data, the large volume of information to be interpreted, and the technical skills required. Existing tools for gait analysis are often limited to specific patient groups (e.g., cerebral palsy), only handle a specific subset of tasks in the entire workflow, and are not openly accessible. To address these shortcomings, we conducted a requirements assessment with gait practitioners (e.g., researchers, clinicians) via surveys and identified key components of the workflow, including (1) data processing and (2) data analysis and visualization. Based on the findings, we designed VIGMA, an open-access visual analytics framework integrated with computational notebooks and a Python library, to meet the identified requirements. Notably, the framework supports analytical capabilities for assessing disease progression and for comparing multiple patient groups. We validated the framework through usage scenarios with experts specializing in gait and mobility rehabilitation. VIGMA is available at https://github.com/komar41/VIGMA.
中文摘要:步态分析对于理解不同人群的行走问题至关重要,但现有工具存在局限;为此,开发了开源可视化分析框架VIGMA来弥补不足,并已通过专家验证。
English Summary: Gait analysis is crucial for understanding mobility issues across diverse populations, but existing tools are limited; thus, VIGMA, an open-access visual analytics framework, was developed to address these gaps and validated by experts.

Authors:Aniketh Garikaparthi, Manasi Patwardhan, Lovekesh Vig, Arman Cohan
Title: IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery
Abstract:
The rapid advancement in capabilities of large language models (LLMs) raises a pivotal question: How can LLMs accelerate scientific discovery? This work tackles the crucial first stage of research, generating novel hypotheses. While recent work on automated hypothesis generation focuses on multi-agent frameworks and extending test-time compute, none of these approaches effectively incorporates transparency and steerability through a synergistic Human-in-the-loop (HITL) approach. To address this gap, we introduce IRIS: Interactive Research Ideation System, an open-source platform designed for researchers to leverage LLM-assisted scientific ideation. IRIS incorporates innovative features to enhance ideation, including adaptive test-time compute expansion via Monte Carlo Tree Search (MCTS), a fine-grained feedback mechanism, and query-based literature synthesis, designed to empower researchers with greater control and insight throughout the ideation process. We additionally conduct a user study with researchers across diverse disciplines, validating the effectiveness of our system in enhancing ideation. We open-source our code at https://github.com/Anikethh/IRIS-Interactive-Research-Ideation-System
中文: 本文提出开源平台IRIS,通过自适应计算和文献整合等功能将人类反馈与大型语言模型相结合,有效提升科学假说生成能力,并经过跨学科用户研究验证。
English: This paper introduces IRIS, an open-source platform that enhances scientific hypothesis generation by integrating human feedback with LLMs through features like adaptive computation and literature synthesis, validated by a cross-disciplinary user study.

Authors:Chang Zong, Bin Li, Shoujun Zhou, Jian Wan, Lei Zhang
Title: Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions
Abstract:
Locating specific segments within an instructional video is an efficient way to acquire guiding knowledge. Generally, the task of obtaining video segments for both verbal explanations and visual demonstrations is known as visual answer localization (VAL). However, users often need multiple interactions to obtain answers that align with their expectations when using the system. During these interactions, humans deepen their understanding of the video content by asking themselves questions, thereby accurately identifying the location. Therefore, we propose a new task, named In-VAL, to simulate the multiple interactions between humans and videos in the process of obtaining visual answers. The In-VAL task requires interactively addressing several semantic gap issues, including 1) the ambiguity of user intent in the input questions, 2) the incompleteness of language in video subtitles, and 3) the fragmentation of content in video segments. To address these issues, we propose Ask2Loc, a framework for resolving In-VAL by asking questions. It includes three key modules: 1) a chatting module to refine initial questions and uncover clear intentions, 2) a rewriting module to generate fluent language and create complete descriptions, and 3) a searching module to broaden local context and provide integrated content. We conduct extensive experiments on three reconstructed In-VAL datasets. Compared to traditional end-to-end and two-stage methods, our proposed Ask2Loc can improve performance by up to 14.91 (mIoU) on the In-VAL task. Our code and datasets can be accessed at https://github.com/changzong/Ask2Loc.
中文: 本文提出In-VAL任务,模拟人与视频的交互过程,通过Ask2Loc框架解决语义鸿沟问题来定位教学片段,相比传统方法性能提升高达14.91 mIoU。
English: The paper introduces In-VAL, a task simulating human-video interactions to locate instructional segments by resolving semantic gaps through the Ask2Loc framework, which enhances performance by up to 14.91 mIoU over traditional methods.
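The mIoU figure the abstract reports measures the overlap between predicted and ground-truth time segments, averaged over a dataset. A minimal version of the metric (standard temporal IoU, not code from the Ask2Loc repository):

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) time segments, e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds, gts):
    # mIoU: average segment IoU over predicted / ground-truth pairs.
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(preds)
```

Segments (0, 10) and (5, 15) overlap for 5 of 15 total seconds, giving an IoU of 1/3; a perfect localization scores 1.0.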

Authors:Yike Zhang, Eduardo Davalos, Jack Noble
Title: Vision6D: 3D-to-2D Interactive Visualization and Annotation Tool for 6D Pose Estimation
Abstract:
Accurate 6D pose estimation has gained more attention over the years for robotics-assisted tasks that require precise interaction with physical objects. This paper presents an interactive 3D-to-2D visualization and annotation tool to support the 6D pose estimation research community. To the best of our knowledge, the proposed work is the first tool that allows users to visualize and manipulate 3D objects interactively on a 2D real-world scene, along with a comprehensive user study. This system supports robust 6D camera pose annotation by providing both visual cues and spatial relationships to determine object position and orientation in various environments. The annotation feature in Vision6D is particularly helpful in scenarios where the transformation matrix between the camera and world objects is unknown, as it enables accurate annotation of these objects' poses using only the camera intrinsic matrix. This capability serves as a foundational step in developing and training advanced pose estimation models across various domains. We evaluate Vision6D's effectiveness by utilizing widely-used open-source pose estimation datasets Linemod and HANDAL through comparisons between the default ground-truth camera poses with manual annotations. A user study was performed to show that Vision6D generates accurate pose annotations via visual cues in an intuitive 3D user interface. This approach aims to bridge the gap between 2D scene projections and 3D scenes, offering an effective way for researchers and developers to solve 6D pose annotation related problems. The software is open-source and publicly available at https://github.com/InteractiveGL/vision6D.
中文: 本文介绍了Vision6D这一交互式3D到2D可视化标注工具,它通过让用户在真实场景中操控3D物体来实现精确的6D姿态估计,并附有用户研究和开源发布。
English: This paper introduces Vision6D, an interactive 3D-to-2D visualization and annotation tool that enables precise 6D pose estimation by allowing users to manipulate 3D objects in real-world scenes, supported by a user study and open-source availability.
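Annotating a pose "using only the camera intrinsic matrix", as the abstract describes, rests on the standard pinhole model: an object-frame point is moved into the camera frame by a candidate pose (R, t) and projected to pixels with K. The sketch below is that textbook projection, not Vision6D's actual code.

```python
def apply_pose(point_obj, R, t):
    # Transform an object-frame 3D point into the camera frame using
    # rotation R (3x3, row-major) and translation t: p_cam = R @ p + t.
    return [sum(R[i][j] * point_obj[j] for j in range(3)) + t[i] for i in range(3)]


def project(point_cam, K):
    # Pinhole projection onto the image plane:
    #   u = fx * x / z + cx,  v = fy * y / z + cy
    x, y, z = point_cam
    return (K[0][0] * x / z + K[0][2], K[1][1] * y / z + K[1][2])
```

Adjusting (R, t) until the projected model overlays the object in the 2D image is exactly the visual-cue alignment an annotator performs in the tool.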

Authors:Ziwen Xu, Shuxun Wang, Kewei Xu, Haoming Xu, Mengru Wang, Xinle Deng, Yunzhi Yao, Guozhou Zheng, Huajun Chen, Ningyu Zhang
Title: EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models
Abstract:
In this paper, we introduce EasyEdit2, a framework designed to enable plug-and-play adjustability for controlling Large Language Model (LLM) behaviors. EasyEdit2 supports a wide range of test-time interventions, including safety, sentiment, personality, reasoning patterns, factuality, and language features. Unlike its predecessor, EasyEdit2 features a new architecture specifically designed for seamless model steering. It comprises key modules such as the steering vector generator and the steering vector applier, which enable automatic generation and application of steering vectors to influence the model's behavior without modifying its parameters. One of the main advantages of EasyEdit2 is its ease of use: users do not need extensive technical knowledge. With just a single example, they can effectively guide and adjust the model's responses, making precise control both accessible and efficient. Empirically, we report model steering performance across different LLMs, demonstrating the effectiveness of these techniques. We have released the source code on GitHub at https://github.com/zjunlp/EasyEdit along with a demonstration notebook. In addition, we provide a demo video at https://www.youtube.com/watch?v=AkfoiPfp5rQ for a quick introduction.
中文:EasyEdit2是一个即插即用的框架,通过测试时干预让用户无需深厚技术知识即可轻松控制大型语言模型的行为,仅需一个示例即可精确调整模型响应。
English: EasyEdit2 is a plug-and-play framework that enables users to easily control Large Language Model behaviors through test-time interventions, requiring minimal technical knowledge and allowing precise adjustments with just a single example.
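The core mechanism, generating a steering vector and applying it at test time without touching model weights, can be sketched as an activation-difference vector. The single-pair construction below is a common simplification of activation steering, not EasyEdit2's exact generator or API.

```python
def steering_vector(pos_acts, neg_acts):
    # Direction in hidden-state space pointing from the activations of an
    # undesired behavior toward those of the desired behavior, built here
    # from a single contrastive example pair.
    return [p - n for p, n in zip(pos_acts, neg_acts)]


def apply_steering(hidden_state, vector, strength=1.0):
    # Test-time intervention: shift a layer's hidden state along the
    # steering direction. Model parameters are never modified.
    return [h + strength * v for h, v in zip(hidden_state, vector)]
```

In practice the vector would be computed from real layer activations and added during the forward pass via a hook; `strength` trades off steering effect against fluency.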

Authors:Yiqian Yang
Title: NeuGaze: Reshaping the future BCI
Abstract:
Traditional brain-computer interfaces (BCIs), reliant on costly electroencephalography or invasive implants, struggle with complex human-computer interactions due to setup complexity and limited precision. We present NeuGaze, a novel webcam-based system that leverages eye gaze, head movements, and facial expressions to enable intuitive, real-time control using only a standard 30 Hz webcam, often pre-installed in laptops. Requiring minimal calibration, NeuGaze achieves performance comparable to conventional inputs, supporting precise cursor navigation, key triggering via an efficient skill wheel, and dynamic gaming interactions, such as defeating formidable opponents in first-person games. By harnessing preserved neck-up functionalities in motor-impaired individuals, NeuGaze eliminates the need for specialized hardware, offering a low-cost, accessible alternative to BCIs. This paradigm empowers diverse applications, from assistive technology to entertainment, redefining human-computer interaction for motor-impaired users. Project is at https://github.com/NeuSpeech/NeuGaze.
中文: NeuGaze 提出了一种基于网络摄像头的系统,利用眼球注视、头部动作和面部表情实现直观的实时控制,为传统脑机接口提供了无需专用硬件的低成本替代方案。
English: NeuGaze introduces a webcam-based system using eye gaze, head movements, and facial expressions for intuitive, real-time control, offering a low-cost alternative to traditional BCIs without specialized hardware.

Authors:Chaoyun Zhang, He Huang, Chiming Ni, Jian Mu, Si Qin, Shilin He, Lu Wang, Fangkai Yang, Pu Zhao, Chao Du, Liqun Li, Yu Kang, Zhao Jiang, Suzhen Zheng, Rujia Wang, Jiaxu Qian, Minghua Ma, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
Title: UFO2: The Desktop AgentOS
Abstract:
Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language. However, most existing CUAs remain conceptual prototypes, hindered by shallow OS integration, fragile screenshot-based interaction, and disruptive execution. We present UFO2, a multiagent AgentOS for Windows desktops that elevates CUAs into practical, system-level automation. UFO2 features a centralized HostAgent for task decomposition and coordination, alongside a collection of application-specialized AppAgents equipped with native APIs, domain-specific knowledge, and a unified GUI-API action layer. This architecture enables robust task execution while preserving modularity and extensibility. A hybrid control detection pipeline fuses Windows UI Automation (UIA) with vision-based parsing to support diverse interface styles. Runtime efficiency is further enhanced through speculative multi-action planning, reducing per-step LLM overhead. Finally, a Picture-in-Picture (PiP) interface enables automation within an isolated virtual desktop, allowing agents and users to operate concurrently without interference. We evaluate UFO2 across over 20 real-world Windows applications, demonstrating substantial improvements in robustness and execution accuracy over prior CUAs. Our results show that deep OS integration unlocks a scalable path toward reliable, user-aligned desktop automation.
中文:UFO2提出了一种面向Windows的多智能体操作系统,通过深度系统集成、专用代理和混合控制检测技术,显著提升了桌面自动化的鲁棒性和执行精度,优于现有系统。
English: UFO2 introduces a multiagent AgentOS for Windows that enhances desktop automation through deep OS integration, specialized agents, and hybrid control detection, significantly improving robustness and execution accuracy over previous systems.

Authors:Jiwei Li, Bi Zhang, Xiaowei Tan, Wanxin Chen, Zhaoyuan Liu, Juanjuan Zhang, Weiguang Huo, Jian Huang, Lianqing Liu, Xingang Zhao
Title: K2MUSE: A human lower limb multimodal dataset under diverse conditions for facilitating rehabilitation robotics
Abstract:
The natural interaction and control performance of lower limb rehabilitation robots are closely linked to biomechanical information from various human locomotion activities. Multidimensional human motion data significantly deepen the understanding of the complex mechanisms governing neuromuscular alterations, thereby facilitating the development and application of rehabilitation robots in multifaceted real-world environments. However, currently available lower limb datasets are inadequate for supplying the essential multimodal data and large-scale gait samples necessary for effective data-driven approaches, and they neglect the significant effects of acquisition interference in real applications. To fill this gap, we present the K2MUSE dataset, which includes a comprehensive collection of multimodal data, comprising kinematic, kinetic, amplitude-mode ultrasound (AUS), and surface electromyography (sEMG) measurements. The proposed dataset includes lower limb multimodal data from 30 able-bodied participants walking under different inclines (0°, ±5°, and ±10°), various speeds (0.5 m/s, 1.0 m/s, and 1.5 m/s), and different nonideal acquisition conditions (muscle fatigue, electrode shifts, and inter-day differences). The kinematic and ground reaction force data were collected via a Vicon motion capture system and an instrumented treadmill with embedded force plates, whereas the sEMG and AUS data were synchronously recorded for thirteen muscles on the bilateral lower limbs. This dataset offers a new resource for designing control frameworks for rehabilitation robots and conducting biomechanical analyses of lower limb locomotion. The dataset is available at https://k2muse.github.io/.

Authors:Dezhao Luo, Bohan Tang, Kang Li, Georgios Papoudakis, Jifei Song, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao
Title: ViMo: A Generative Visual GUI World Model for App Agents
Abstract:
App agents, which autonomously operate mobile Apps through Graphical User Interfaces (GUIs), have gained significant interest in real-world applications. Yet, they often struggle with long-horizon planning, failing to find the optimal actions for complex tasks with longer steps. To address this, world models are used to predict the next GUI observation based on user actions, enabling more effective agent planning. However, existing world models primarily focus on generating only textual descriptions, lacking essential visual details. To fill this gap, we propose ViMo, the first visual world model designed to generate future App observations as images. For the challenge of generating text in image patches, where even minor pixel errors can distort readability, we decompose GUI generation into graphic and text content generation. We propose a novel data representation, the Symbolic Text Representation (STR), to overlay text content with symbolic placeholders while preserving graphics. With this design, ViMo employs an STR Predictor to predict future GUIs' graphics and a GUI-text Predictor for generating the corresponding text. Moreover, we deploy ViMo to enhance agent-focused tasks by predicting the outcome of different action options. Experiments show ViMo's ability to generate visually plausible and functionally effective GUIs that enable App agents to make more informed decisions.
中文摘要:ViMo是一种创新的视觉世界模型,通过分离图形与文本生成未来应用界面图像,使应用代理能通过更优规划做出更明智决策。
English Summary: ViMo is a novel visual world model that generates future app interface images by separating graphics and text, enabling app agents to make better decisions through improved planning.
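The Symbolic Text Representation idea described above, overlaying text content with symbolic placeholders while preserving graphics, can be illustrated in a few lines. This is a hedged sketch, not ViMo's implementation; the element structure and the `<TXT_i>` token format are invented here:

```python
# Illustrative sketch of the Symbolic Text Representation (STR) idea: swap each
# GUI text element for a symbolic placeholder so a graphics model never has to
# render readable glyphs, then restore the text afterwards. Element fields and
# the placeholder format are hypothetical, not taken from the paper.

def to_str(elements):
    """Replace each element's text with an indexed placeholder token."""
    mapping, symbolic = {}, []
    for i, elem in enumerate(elements):
        token = f"<TXT_{i}>"
        mapping[token] = elem["text"]
        symbolic.append({**elem, "text": token})
    return symbolic, mapping

def restore(elements, mapping):
    """Overlay the predicted text back onto the symbolic GUI."""
    return [{**e, "text": mapping.get(e["text"], e["text"])} for e in elements]

gui = [{"bbox": (0, 0, 100, 20), "text": "Sign in"},
       {"bbox": (0, 30, 100, 50), "text": "Forgot password?"}]
sym, m = to_str(gui)
assert sym[0]["text"] == "<TXT_0>"
assert restore(sym, m) == gui
```

The round trip makes the design point concrete: the graphics predictor only ever sees stable placeholder tokens, so small pixel errors cannot corrupt readable text.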

Authors:Kunihiko Fujiwara, Ryuta Tsurumi, Tomoki Kiyono, Zicheng Fan, Xiucheng Liang, Binyu Lei, Winston Yap, Koichi Ito, Filip Biljecki
Title: VoxCity: A Seamless Framework for Open Geospatial Data Integration, Grid-Based Semantic 3D City Model Generation, and Urban Environment Simulation
Abstract:
Three-dimensional urban environment simulation is a powerful tool for informed urban planning. However, the intensive manual effort required to prepare input 3D city models has hindered its widespread adoption. To address this challenge, we present VoxCity, an open-source Python package that provides a one-stop solution for grid-based 3D city model generation and urban environment simulation for cities worldwide. VoxCity's `generator' subpackage automatically downloads building heights, tree canopy heights, land cover, and terrain elevation within a specified target area, and voxelizes buildings, trees, land cover, and terrain to generate an integrated voxel city model. The `simulator' subpackage enables users to conduct environmental simulations, including solar radiation and view index analyses. Users can export the generated models using several file formats compatible with external software, such as ENVI-met (INX), Blender, and Rhino (OBJ). We generated 3D city models for eight global cities, and demonstrated the calculation of solar irradiance, sky view index, and green view index. We also showcased microclimate simulation and 3D rendering visualization through ENVI-met and Rhino, respectively, through the file export function. Additionally, we reviewed openly available geospatial data to create guidelines to help users choose appropriate data sources depending on their target areas and purposes. VoxCity can significantly reduce the effort and time required for 3D city model preparation and promote the utilization of urban environment simulations. This contributes to more informed urban and architectural design that considers environmental impacts, and in turn, fosters sustainable and livable cities. VoxCity is released openly at https://github.com/kunifujiwara/VoxCity.
中文: VoxCity是一个开源Python工具包,能自动生成三维体素城市模型并进行环境模拟,如太阳辐射分析,大幅降低了城市规划中的人工成本。
English: VoxCity is an open-source Python package that automates the generation of 3D voxel city models and enables environmental simulations like solar radiation analysis, significantly reducing manual effort in urban planning.
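The voxelization step the abstract describes, turning per-cell building heights into an integrated voxel model, reduces in its simplest form to filling a 3D occupancy grid. A minimal sketch assuming a rasterized height map; the grid, heights, and 5 m voxel size are invented, and VoxCity's actual `generator` subpackage also handles trees, land cover, and terrain:

```python
# Hedged illustration (not VoxCity's implementation) of voxelizing a building
# height map: each 2D cell becomes a column of voxels filled from the ground
# up to that cell's height.

def voxelize(height_map, voxel_size=5.0):
    """height_map[y][x] is a building height in metres; returns a 3D bool grid."""
    max_h = max(h for row in height_map for h in row)
    levels = int(max_h // voxel_size) + (1 if max_h % voxel_size else 0)
    return [[[z * voxel_size < h for z in range(levels)]
             for h in row] for row in height_map]

heights = [[0.0, 12.0],
           [25.0, 0.0]]   # two buildings on a 2x2 grid (metres, invented)
vox = voxelize(heights)
assert len(vox[0][0]) == 5                  # tallest building -> 5 voxel levels
assert vox[0][1] == [True, True, True, False, False]   # 12 m fills 3 levels
assert vox[1][0] == [True] * 5
```

A real pipeline would merge several such grids (buildings, canopy, terrain) into one labeled voxel array before running simulations on it.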

Authors:Guangyi Liu, Pengxiang Zhao, Liang Liu, Zhiming Chen, Yuxiang Chai, Shuai Ren, Hao Wang, Shibo He, Wenchao Meng
Title: LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark
Abstract:
Mobile GUI agents show promise in automating tasks but face generalization challenges in diverse real-world scenarios. Traditional approaches using pre-training or fine-tuning with massive datasets struggle with the diversity of mobile applications and user-specific tasks. We propose enhancing mobile GUI agent capabilities through human demonstrations, focusing on improving performance in unseen scenarios rather than pursuing universal generalization through larger datasets. To realize this paradigm, we introduce LearnGUI, the first comprehensive dataset specifically designed for studying demonstration-based learning in mobile GUI agents, comprising 2,252 offline tasks and 101 online tasks with high-quality human demonstrations. We further develop LearnAct, a sophisticated multi-agent framework that automatically extracts knowledge from demonstrations to enhance task completion. This framework integrates three specialized agents: DemoParser for knowledge extraction, KnowSeeker for relevant knowledge retrieval, and ActExecutor for demonstration-enhanced task execution. Our experimental results show significant performance gains in both offline and online evaluations. In offline assessments, a single demonstration improves model performance, increasing Gemini-1.5-Pro's accuracy from 19.3% to 51.7%. In online evaluations, our framework enhances UI-TARS-7B-SFT's task success rate from 18.1% to 32.8%. LearnAct framework and LearnGUI benchmark establish demonstration-based learning as a promising direction for more adaptable, personalized, and deployable mobile GUI agents.

Authors:Xia Deng, Shen Chen, Jiale Zhou, Lei Li
Title: Mind2Matter: Creating 3D Models from EEG Signals
Abstract:
The reconstruction of 3D objects from brain signals has gained significant attention in brain-computer interface (BCI) research. Current research predominantly utilizes functional magnetic resonance imaging (fMRI) for 3D reconstruction tasks due to its excellent spatial resolution. Nevertheless, the clinical utility of fMRI is limited by its prohibitive costs and inability to support real-time operations. In comparison, electroencephalography (EEG) presents distinct advantages as an affordable, non-invasive, and mobile solution for real-time brain-computer interaction systems. While recent advances in deep learning have enabled remarkable progress in image generation from neural data, decoding EEG signals into structured 3D representations remains largely unexplored. In this paper, we propose a novel framework that translates EEG recordings into 3D object reconstructions by leveraging neural decoding techniques and generative models. Our approach involves training an EEG encoder to extract spatiotemporal visual features, fine-tuning a large language model to interpret these features into descriptive multimodal outputs, and leveraging generative 3D Gaussians with layout-guided control to synthesize the final 3D structures. Experiments demonstrate that our model captures salient geometric and semantic features, paving the way for applications in brain-computer interfaces (BCIs), virtual reality, and neuroprosthetics. Our code is available at https://github.com/sddwwww/Mind2Matter.
中文: 本研究提出了一种创新框架,通过神经解码和生成模型将脑电图信号转化为三维物体重建,为脑机接口和虚拟现实应用开辟了新途径。
English: This study introduces a novel framework that translates EEG signals into 3D object reconstructions using neural decoding and generative models, demonstrating potential for brain-computer interfaces and virtual reality applications.

Authors:Xinyi Liu, Xiaoyi Zhang, Ziyun Zhang, Yan Lu
Title: UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis
Abstract:
Recent advancements in Large Vision-Language Models are accelerating the development of Graphical User Interface (GUI) agents that utilize human-like vision perception capabilities to enhance productivity on digital devices. Compared to approaches predicated on GUI metadata, which are platform-dependent and vulnerable to implementation variations, vision-based approaches offer broader applicability. In this vision-based paradigm, GUI instruction grounding, which maps a user instruction to the location of the corresponding element on a given screenshot, remains a critical challenge, particularly due to limited public training datasets and resource-intensive manual instruction annotation. In this paper, we delve into unexplored challenges in this task including element-to-screen ratio, unbalanced element type, and implicit instruction. To address these challenges, we introduce a large-scale data synthesis pipeline UI-E2I-Synth for generating varying complex instruction datasets using GPT-4o instead of human annotators. Furthermore, we propose a new GUI instruction grounding benchmark UI-I2E-Bench, which is designed to address the limitations of existing benchmarks by incorporating diverse annotation aspects. Our model, trained on the synthesized data, achieves superior performance in GUI instruction grounding, demonstrating the advancements of the proposed data synthesis pipeline. The proposed benchmark, accompanied by extensive analyses, provides practical insights for future research in GUI grounding. We will release corresponding artifacts at https://microsoft.github.io/FIVE-UI-Evol/.

Authors:Md Rakibul Hasan, Md Zakir Hossain, Aneesh Krishna, Shafin Rahman, Tom Gedeon
Title: TFMPathy: Tabular Foundation Model for Privacy-Aware, Generalisable Empathy Detection from Videos
Abstract:
Detecting empathy from video interactions is an emerging area of research, particularly in healthcare and social robotics. However, privacy and ethical concerns often prevent the release of raw video data, with many datasets instead shared as pre-extracted tabular features. Previous work on such datasets has established classical tree-based models as the state of the art. Motivated by recent successes of large-scale foundation models for text, we investigate the potential of tabular foundation models (TFMs) for empathy detection from video-derived tabular data. Our proposed system, TFMPathy, is demonstrated with two recent TFMs (TabPFN v2 and TabICL) under both in-context learning and fine-tuning paradigms. On a public human-robot interaction benchmark, TFMPathy significantly improves empathy detection accuracy reported in the literature. While the established evaluation protocol in the literature does not ensure cross-subject generalisation, our evaluation scheme also captures such generalisation. We show that TFMPathy under a fine-tuning setup has better cross-subject generalisation capacity over baseline methods (accuracy: $0.590 \rightarrow 0.730$; AUC: $0.564 \rightarrow 0.669$). Given the ongoing privacy and ethical constraints around raw video sharing, the proposed TFMPathy system provides a practical and scalable path toward building AI systems dependent on human-centred video datasets. Our code is publicly available at https://github.com/hasan-rakibul/TFMPathy (will be made available upon acceptance of this paper).
中文: 提出的TFMPathy系统利用表格基础模型,在解决隐私限制的同时,显著提升了基于视频衍生数据的共情检测准确率和跨被试泛化能力。
English: The proposed TFMPathy system leverages tabular foundation models to significantly improve empathy detection accuracy and cross-subject generalization from video-derived data while addressing privacy constraints.

Authors:Vikranth Udandarao, Noel Abraham Tiju, Muthuraj Vairamuthu, Harsh Mistry, Dhruv Kumar
Title: Roamify: Designing and Evaluating an LLM Based Google Chrome Extension for Personalised Itinerary Planning
Abstract:
In this paper, we present Roamify, an Artificial Intelligence-powered travel assistant that aims to ease the process of travel planning. We have tested and used multiple Large Language Models like Llama and T5 to generate personalised itineraries tailored to user preferences. Results from user surveys highlight the preference for AI-powered mediums over existing methods to help in travel planning across all user age groups. These results firmly validate the need for such a travel assistant. We highlight the two primary design considerations for travel assistance: D1) incorporating a web-scraping method to gather up-to-date news articles about destinations from various blog sources, which significantly improves our itinerary suggestions, and D2) utilising user preferences to create customised travel experiences along with a recommendation system which changes the itinerary according to the user's needs. Our findings suggest that Roamify has the potential to improve and simplify how users across multiple age groups plan their travel experiences.
中文: Roamify是一款人工智能驱动的旅行助手,它利用大型语言模型和网络爬取数据,根据用户偏好生成个性化行程,研究显示各年龄段用户均倾向于使用此类AI工具来简化旅行规划。
English: Roamify is an AI-powered travel assistant that uses large language models to create personalized itineraries based on user preferences and real-time web-scraped data, demonstrating strong user preference across all age groups for simplifying travel planning.

Authors:Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, James Zou
Title: Can LLM feedback enhance review quality? A randomized study of 20K reviews at ICLR 2025
Abstract:
Peer review at AI conferences is stressed by rapidly rising submission volumes, leading to deteriorating review quality and increased author dissatisfaction. To address these issues, we developed Review Feedback Agent, a system leveraging multiple large language models (LLMs) to improve review clarity and actionability by providing automated feedback on vague comments, content misunderstandings, and unprofessional remarks to reviewers. Implemented at ICLR 2025 as a large randomized controlled study, our system provided optional feedback to more than 20,000 randomly selected reviews. To ensure high-quality feedback for reviewers at this scale, we also developed a suite of automated reliability tests powered by LLMs that acted as guardrails to ensure feedback quality, with feedback only being sent to reviewers if it passed all the tests. The results show that 27% of reviewers who received feedback updated their reviews, and over 12,000 feedback suggestions from the agent were incorporated by those reviewers. This suggests that many reviewers found the AI-generated feedback sufficiently helpful to merit updating their reviews. Incorporating AI feedback led to significantly longer reviews (an average increase of 80 words among those who updated after receiving feedback) and more informative reviews, as evaluated by blinded researchers. Moreover, reviewers who were selected to receive AI feedback were also more engaged during paper rebuttals, as seen in longer author-reviewer discussions. This work demonstrates that carefully designed LLM-generated review feedback can enhance peer review quality by making reviews more specific and actionable while increasing engagement between reviewers and authors. The Review Feedback Agent is publicly available at https://github.com/zou-group/review_feedback_agent.
中文: 该评审反馈代理系统利用大语言模型为同行评审提供自动反馈,通过在ICLR 2025的大规模实验证明,能显著提升评审质量、增加评审长度并促进审稿人参与度。
English: The Review Feedback Agent uses large language models to provide automated feedback on peer reviews, significantly improving review quality, length, and reviewer engagement as demonstrated in a large-scale ICLR 2025 study.
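The guardrail pattern described above, delivering LLM output only if it passes every automated reliability test, is simple to sketch. The concrete checks below are hypothetical stand-ins, not the paper's actual tests:

```python
# Hedged sketch of a guardrail pipeline: generated feedback reaches the
# reviewer only if all checks pass. The checks here are invented examples.

def no_empty(feedback):
    return bool(feedback.strip())

def no_unprofessional(feedback):
    banned = {"nonsense", "lazy"}          # toy word list, purely illustrative
    return not any(w in feedback.lower() for w in banned)

def within_length(feedback, max_words=200):
    return len(feedback.split()) <= max_words

GUARDRAILS = [no_empty, no_unprofessional, within_length]

def deliver(feedback):
    """Return the feedback only if every guardrail test passes, else None."""
    if all(check(feedback) for check in GUARDRAILS):
        return feedback
    return None

assert deliver("Consider citing the ablation in Sec. 4.") is not None
assert deliver("This review is nonsense.") is None
```

The design choice to run checks as a flat list makes it cheap to add or remove tests as failure modes are discovered at scale.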

Authors:Jiahao Qiu, Yinghui He, Xinzhe Juan, Yimin Wang, Yuhan Liu, Zixin Yao, Yue Wu, Xun Jiang, Ling Yang, Mengdi Wang
Title: EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
Abstract:
The rise of LLM-driven AI characters raises safety concerns, particularly for vulnerable human users with psychological disorders. To address these risks, we propose EmoAgent, a multi-agent AI framework designed to evaluate and mitigate mental health hazards in human-AI interactions. EmoAgent comprises two components: EmoEval simulates virtual users, including those portraying mentally vulnerable individuals, to assess mental health changes before and after interactions with AI characters. It uses clinically proven psychological and psychiatric assessment tools (PHQ-9, PDI, PANSS) to evaluate mental risks induced by LLMs. EmoGuard serves as an intermediary, monitoring users' mental status, predicting potential harm, and providing corrective feedback to mitigate risks. Experiments conducted in popular character-based chatbots show that emotionally engaging dialogues can lead to psychological deterioration in vulnerable users, with mental state deterioration in more than 34.4% of the simulations. EmoGuard significantly reduces these deterioration rates, underscoring its role in ensuring safer AI-human interactions. Our code is available at: https://github.com/1akaman/EmoAgent
中文摘要:EmoAgent框架通过EmoEval组件评估AI交互导致的心理健康风险,并利用EmoGuard实时监测干预,实验证明能显著降低脆弱用户群体34.4%以上的心理状态恶化率。
English Summary: The EmoAgent framework addresses mental health risks in human-AI interactions by using EmoEval to assess psychological deterioration through clinical tools and EmoGuard to monitor and mitigate harm, significantly reducing deterioration rates in vulnerable users.
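The assessment loop EmoEval is described as running, scoring a simulated user with a clinical questionnaire before and after a chat and flagging deterioration, can be sketched minimally. PHQ-9 really is the sum of nine items scored 0-3; the flagging threshold below is an illustrative assumption, not the paper's criterion:

```python
# Minimal sketch of before/after questionnaire scoring. PHQ-9 scoring (sum of
# nine 0-3 items) is standard; the min_delta flagging rule is invented here.

def phq9_score(items):
    assert len(items) == 9 and all(0 <= x <= 3 for x in items)
    return sum(items)

def deteriorated(before, after, min_delta=1):
    """Flag a simulation whose post-chat score rose by at least min_delta."""
    return phq9_score(after) - phq9_score(before) >= min_delta

pre  = [1, 0, 1, 0, 1, 0, 0, 1, 0]   # score 4 (invented responses)
post = [2, 1, 1, 1, 1, 0, 1, 1, 0]   # score 8
assert deteriorated(pre, post)
```

Aggregating this flag over many simulated dialogues yields the kind of deterioration rate the abstract reports.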

Authors:Iason Chaimalas, Arnas Vyšniauskas, Gabriel Brostow
Title: Explorer: Robust Collection of Interactable GUI Elements
Abstract:
Automation of existing Graphical User Interfaces (GUIs) is important but hard to achieve. Upstream of making the GUI user-accessible or somehow scriptable, even the data-collection to understand the original interface poses significant challenges. For example, large quantities of general UI data seem helpful for training general machine learning (ML) models, but accessibility for each person can hinge on the ML's precision on a specific app. We therefore take the perspective that a given user needs confidence, that the relevant UI elements are being detected correctly throughout one app or digital environment. We mostly assume that the target application is known in advance, so that data collection and ML-training can be personalized for the test-time target domain. The proposed Explorer system focuses on detecting on-screen buttons and text-entry fields, i.e. interactables, where the training process has access to a live version of the application. The live application can run on almost any popular platform except iOS phones, and the collection is especially streamlined for Android phones or for desktop Chrome browsers. Explorer also enables the recording of interactive user sessions, and subsequent mapping of how such sessions overlap and sometimes loop back to similar states. We show how having such a map enables a kind of path planning through the GUI, letting a user issue audio commands to get to their destination. Critically, we are releasing our code for Explorer openly at https://github.com/varnelis/Explorer.
Chinese: Explorer系统通过专注于检测按钮和文本字段等交互元素,利用实时应用数据训练机器学习模型,实现个性化的图形用户界面自动化,从而提供精确的用户特定可访问性,并通过语音命令进行路径规划。
English: The Explorer system enables personalized automation of graphical user interfaces by focusing on detecting interactive elements like buttons and text fields, using live application data to train machine learning models for precise, user-specific accessibility and path planning through audio commands.
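The session map the abstract describes can be read as a directed graph over GUI states, with recorded interactions as edges; path planning to a destination is then a shortest-path query. A minimal BFS sketch with invented state and action names:

```python
# Hedged sketch of path planning over a recorded GUI session graph:
# graph[state] maps an action name to the state it leads to, and BFS
# returns the shortest action sequence to the goal state.
from collections import deque

def plan(graph, start, goal):
    """Return the shortest action sequence from start to goal, or None."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, nxt in graph.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None

sessions = {                      # invented session map for illustration
    "home":     {"tap_menu": "menu", "tap_search": "search"},
    "menu":     {"tap_settings": "settings", "tap_back": "home"},
    "settings": {"tap_privacy": "privacy"},
}
assert plan(sessions, "home", "privacy") == ["tap_menu", "tap_settings", "tap_privacy"]
```

An audio command like "go to privacy settings" would resolve to a goal state, and the planned action sequence could then be replayed against the live app.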

Authors:Xijin Ge
Title: DataMap: A Portable Application for Visualizing High-Dimensional Data
Abstract:
Motivation: The visualization and analysis of high-dimensional data are essential in biomedical research. There is a need for secure, scalable, and reproducible tools to facilitate data exploration and interpretation. Results: We introduce DataMap, a browser-based application for visualization of high-dimensional data using heatmaps, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE). DataMap runs in the web browser, ensuring data privacy while eliminating the need for installation or a server. The application has an intuitive user interface for data transformation, annotation, and generation of reproducible R code. Availability and Implementation: Freely available as a GitHub page https://gexijin.github.io/datamap/. The source code can be found at https://github.com/gexijin/datamap, and can also be installed as an R package. Contact: Xijin.Ge@sdstate.ed
中文:DataMap是一款基于浏览器的安全工具,可通过热图、PCA和t-SNE可视化高维生物医学数据,无需安装即可保障数据隐私并生成可复现的R代码。
English: DataMap is a secure, browser-based tool for visualizing high-dimensional biomedical data through heatmaps, PCA, and t-SNE, offering data privacy and reproducible R code without installation.
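DataMap itself runs in the browser and emits reproducible R code; purely to illustrate the PCA step it exposes, here is a dependency-free sketch that projects rows onto the first principal component via power iteration on the covariance matrix (a simplification; real PCA implementations use full eigendecomposition or SVD):

```python
# Illustrative first-principal-component PCA by power iteration, not
# DataMap's implementation. Works on plain lists of numeric rows.
import math, random

def pca_first_component(rows, iters=200):
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # sample covariance matrix (d x d)
    cov = [[sum(centered[i][a] * centered[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    v = [random.random() for _ in range(d)]
    for _ in range(iters):                 # power iteration
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

random.seed(0)                             # reproducible demo
axis, scores = pca_first_component([[x, 0.1 * x + 0.01] for x in range(8)])
assert abs(axis[0]) > 0.9                  # variance is dominated by column 0
```

Note the usual sign ambiguity of eigenvectors: only the axis direction is determined, which is why the check uses the absolute value.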

Authors:Kaixin Li, Ziyang Meng, Hongzhan Lin, Ziyang Luo, Yuchen Tian, Jing Ma, Zhiyong Huang, Tat-Seng Chua
Title: ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use
Abstract:
Recent advancements in Multi-modal Large Language Models (MLLMs) have led to significant progress in developing GUI agents for general tasks such as web browsing and mobile phone use. However, their application in professional domains remains under-explored. These specialized workflows introduce unique challenges for GUI perception models, including high-resolution displays, smaller target sizes, and complex environments. In this paper, we introduce ScreenSpot-Pro, a new benchmark designed to rigorously evaluate the grounding capabilities of MLLMs in high-resolution professional settings. The benchmark comprises authentic high-resolution images from a variety of professional domains with expert annotations. It spans 23 applications across five industries and three operating systems. Existing GUI grounding models perform poorly on this dataset, with the best model achieving only 18.9%. Our experiments reveal that strategically reducing the search area enhances accuracy. Based on this insight, we propose ScreenSeekeR, a visual search method that utilizes the GUI knowledge of a strong planner to guide a cascaded search, achieving state-of-the-art performance with 48.1% without any additional training. We hope that our benchmark and findings will advance the development of GUI agents for professional applications. Code, data and leaderboard can be found at https://gui-agent.github.io/grounding-leaderboard.
中文摘要:ScreenSpot-Pro是一个针对专业GUI环境评估多模态大语言模型的新基准,现有模型在此表现不佳,而提出的ScreenSeekeR方法无需额外训练即可显著提升性能。
English Summary: ScreenSpot-Pro is a new benchmark for evaluating Multi-modal Large Language Models (MLLMs) in professional GUI environments, where existing models struggle, and the proposed ScreenSeekeR method significantly improves performance without additional training.
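The abstract's key insight, that grounding accuracy improves when the search area is strategically reduced, can be sketched as a cascaded narrowing loop. The quadrant-based planner below is purely hypothetical and stands in for the paper's GUI-knowledge planner:

```python
# Hedged sketch of cascaded search-area reduction: a planner repeatedly picks
# a sub-region of the screenshot until the region is small enough to ground
# precisely. Region sizes and the quadrant scheme are invented.

def quadrants(box):
    x0, y0, x1, y1 = box
    mx, my = (x0 + x1) // 2, (y0 + y1) // 2
    return [(x0, y0, mx, my), (mx, y0, x1, my),
            (x0, my, mx, y1), (mx, my, x1, y1)]

def cascaded_search(box, pick_quadrant, min_side=256):
    """Shrink the search box until its shorter side is at most min_side."""
    while min(box[2] - box[0], box[3] - box[1]) > min_side:
        box = pick_quadrant(quadrants(box))
    return box

# Toy planner: the target element lives near the top-left of a 4K screen.
target = (120, 90)
def toward_target(cands):
    return next(q for q in cands
                if q[0] <= target[0] < q[2] and q[1] <= target[1] < q[3])

final = cascaded_search((0, 0, 3840, 2160), toward_target)
assert final[0] <= target[0] < final[2] and final[1] <= target[1] < final[3]
```

On a 3840x2160 screen this converges to a 240x135 crop in four steps, which is the kind of scale reduction that lets a grounding model resolve small professional UI targets.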

Authors:Ben Cheng, Yize Chen
Title: Open Datasets for Grid Modeling and Visualization: An Alberta Power Network Case
Abstract:
In the power and energy industry, multiple entities in grid operational logs are frequently recorded and updated. Thanks to recent advances in IT facilities and smart metering services, a variety of datasets such as system load, generation mix, and grid connection are often publicly available. While these resources are valuable in evaluating the power grid's operational conditions and system resilience, the lack of fine-grained, accurate locational information constrains the usage of current data, which further hinders the development of smart grid and renewables integration. For instance, electricity end users are not aware of nodal generation mix or carbon emissions, while the general public has limited understanding of the effect of demand response or renewables integration if only the whole system's demands and generations are available. In this work, we focus on recovering power grid topology and line flow directions from open public datasets. Taking the Alberta grid as a working example, we start by mapping multi-modal power system datasets to the grid topology integrated with geographical information. By designing a novel optimization-based scheme to recover line flow directions, we are able to analyze and visualize the interactions between generations and demand vectors in an efficient manner. The proposed research is fully open-sourced and highly generalizable, which can help model and visualize grid information, create synthetic datasets, and facilitate analytics and decision-making frameworks for the clean energy transition.
中文摘要:本研究开发了一种基于公开数据的开源方法,通过创新优化方案重建电网拓扑与线路潮流方向,可有效支持电网建模分析并推动清洁能源转型决策。
English Summary: This study develops an open-source method to reconstruct power grid topology and line flow directions from public datasets, enabling enhanced grid modeling and clean energy transition analytics through a novel optimization approach.
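The paper recovers line flow directions with an optimization over the mapped topology of a general grid. As a much simplified illustration of the underlying physics, on a purely radial (tree) network the direction of each line follows directly from conservation: the flow on a line toward the root equals the downstream net injection (generation minus load). Node names and megawatt numbers below are invented:

```python
# Hedged sketch for the radial special case only (the paper handles general
# topologies via optimization): a DFS accumulates downstream net injections
# to determine each line's flow magnitude and direction.

def line_flows(tree, root, net_injection):
    """Return ({(parent, child): flow}, slack); positive flow runs toward the root."""
    flows = {}
    def downstream(node):
        total = net_injection[node]
        for child in tree.get(node, []):
            sub = downstream(child)
            flows[(node, child)] = sub      # child -> parent if positive
            total += sub
        return total
    slack = downstream(root)                # net power absorbed at the root bus
    return flows, slack

tree = {"root": ["a", "b"], "a": ["c"]}
inj = {"root": 0.0, "a": -30.0, "b": 50.0, "c": -10.0}   # MW, invented
flows, slack = line_flows(tree, "root", inj)
assert flows[("a", "c")] == -10.0   # c is a load: the line carries power to c
assert flows[("root", "b")] == 50.0 # b is a generator: power flows root-ward
assert slack == 10.0
```

Meshed networks break this uniqueness, which is exactly why the paper needs an optimization-based scheme rather than a single conservation pass.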

Authors:Donghao Ren, Fred Hohman, Dominik Moritz
Title: A Scalable Approach to Clustering Embedding Projections
Abstract:
Interactive visualization of embedding projections is a useful technique for understanding data and evaluating machine learning models. Labeling data within these visualizations is critical for interpretation, as labels provide an overview of the projection and guide user navigation. However, most methods for producing labels require clustering the points, which can be computationally expensive as the number of points grows. In this paper, we describe an efficient clustering approach that applies kernel density estimation in the projected 2D space rather than clustering the points directly. This algorithm can produce high-quality cluster regions from a 2D density map in a few hundred milliseconds, orders of magnitude faster than current approaches. We contribute the design of the algorithm, benchmarks, and applications that demonstrate the utility of the algorithm, including labeling and summarization.
Chinese: 本文提出了一种在二维投影空间中使用核密度估计的高效聚类方法,能够快速生成高质量聚类区域用于交互式可视化标注,其速度显著优于现有方法。
English: This paper introduces an efficient clustering method using kernel density estimation in 2D projection space, enabling rapid generation of high-quality cluster regions for interactive visualization labeling and significantly outperforming existing approaches in speed.
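The approach above, clustering a density map instead of raw points, can be sketched end to end with a coarse-grid kernel density estimate, a threshold, and a flood fill over high-density cells. Grid size, bandwidth, and threshold below are illustrative choices, not the paper's parameters:

```python
# Dependency-free sketch: estimate a Gaussian kernel density on a coarse 2D
# grid, then read off cluster regions as connected components of cells whose
# density exceeds a threshold.
import math

def density_grid(points, size=32, bandwidth=0.08):
    grid = [[0.0] * size for _ in range(size)]
    for px, py in points:                      # points assumed in [0, 1]^2
        for gy in range(size):
            for gx in range(size):
                cx, cy = (gx + 0.5) / size, (gy + 0.5) / size
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                grid[gy][gx] += math.exp(-d2 / (2 * bandwidth ** 2))
    return grid

def cluster_regions(grid, threshold):
    """Flood-fill connected components of cells with density above threshold."""
    size, seen, regions = len(grid), set(), []
    for y in range(size):
        for x in range(size):
            if grid[y][x] > threshold and (x, y) not in seen:
                stack, region = [(x, y)], []
                while stack:
                    cx, cy = stack.pop()
                    if (cx, cy) in seen or not (0 <= cx < size and 0 <= cy < size):
                        continue
                    if grid[cy][cx] <= threshold:
                        continue
                    seen.add((cx, cy))
                    region.append((cx, cy))
                    stack += [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
                regions.append(region)
    return regions

pts = [(0.2, 0.2), (0.22, 0.18), (0.8, 0.8), (0.78, 0.82)]
regions = cluster_regions(density_grid(pts), threshold=0.5)
assert len(regions) == 2
```

The cost is governed by the grid resolution rather than the point count, which is the source of the speedup the abstract claims (a production version would also rasterize points into the grid instead of looping over every cell per point).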

Authors:Alexandre Banks, Richard Cook, Septimiu E. Salcudean
Title: Setup-Invariant Augmented Reality for Teaching by Demonstration with Surgical Robots
Abstract:
Augmented reality (AR) is an effective tool in robotic surgery education as it combines exploratory learning with three-dimensional guidance. However, existing AR systems require expert supervision and do not account for differences in the mentor and mentee robot configurations. To enable novices to train outside the operating room while receiving expert-informed guidance, we present dV-STEAR: an open-source system that plays back task-aligned expert demonstrations without assuming identical setup joint positions between expert and novice. Pose estimation was rigorously quantified, showing a registration error of 3.86 (SD=2.01)mm. In a user study (N=24), dV-STEAR significantly improved novice performance on tasks from the Fundamentals of Laparoscopic Surgery. In a single-handed ring-over-wire task, dV-STEAR increased completion speed (p=0.03) and reduced collision time (p=0.01) compared to dry-lab training alone. During a pick-and-place task, it improved success rates (p=0.004). Across both tasks, participants using dV-STEAR exhibited significantly more balanced hand use and reported lower frustration levels. This work presents a novel educational tool implemented on the da Vinci Research Kit, demonstrates its effectiveness in teaching novices, and builds the foundation for further AR integration into robot-assisted surgery.
中文:dV-STEAR系统通过增强现实技术让新手外科医生能在专家指导下进行训练,无需相同设备配置即可显著提升手术任务表现并降低操作挫败感。
English: The dV-STEAR system enables novice surgeons to train with expert guidance in augmented reality, significantly improving performance and reducing frustration in robotic surgery tasks without requiring identical equipment setups.

Authors:Georg Ahnert, Elena Wurth, Markus Strohmaier, Jutta Mata
Title: Simulating Persuasive Dialogues on Meat Reduction with Generative Agents
Abstract:
Meat reduction benefits human and planetary health, but social norms keep meat central in shared meals. To date, the development of communication strategies that promote meat reduction while minimizing social costs has required the costly involvement of human participants at each stage of the process. We present work in progress on simulating multi-round dialogues on meat reduction between Generative Agents based on large language models (LLMs). We measure our main outcome using established psychological questionnaires based on the Theory of Planned Behavior and additionally investigate Social Costs. We find evidence that our preliminary simulations produce outcomes that are (i) consistent with theoretical expectations; and (ii) valid when compared to data from previous studies with human participants. Generative agent-based models are a promising tool for identifying novel communication strategies on meat reduction-tailored to highly specific participant groups-to then be tested in subsequent studies with human participants.
中文: 本研究利用基于大语言模型的生成智能体模拟关于减少肉类消费的对话,初步结果显示其与理论预期及前人研究数据一致,为开发针对性沟通策略提供了高效方法。
English: This research explores using Generative Agents based on large language models to simulate dialogues on meat reduction, showing promising results that align with theoretical expectations and previous human studies, offering a cost-effective method to develop tailored communication strategies.

Authors:Jose Alberto Baeza Guerra
Title: Geospatial and Symbolic Hypothesis for the Foundation of Tenochtitlan Based on Digital Elevation Analysis of the Valley of Mexico
Abstract:
This paper proposes a novel hypothesis about the foundation of Tenochtitlan by combining digital elevation modeling with historical and symbolic analysis. Using geospatial data from EarthExplorer, we simulate various historical water levels in the Valley of Mexico. The resulting lake configurations reveal possible locations for ancient settlements near now-vanished shorelines, suggesting a dynamic transformation of sacred geography that aligns with key Mexica myths. We identify Santa María Aztahuacan as a strong candidate for the historical Aztlan and propose a reinterpretation of foundational codices in light of geomythical correlations.
中文:本研究结合地理空间建模与历史分析,提出墨西哥谷地变化的湖岸线影响了墨西加人的定居模式,认定圣玛丽亚·阿兹塔瓦坎可能是阿兹特兰遗址,并通过地质神话关联重新解读了古籍抄本。
English: This study combines geospatial modeling with historical analysis to propose that shifting shorelines in the Valley of Mexico influenced Mexica settlement patterns, identifying Santa María Aztahuacan as a potential site for Aztlan and reinterpreting codices through geomythical correlations.

Authors:Kaustubh Shivshankar Shejole, Pushpak Bhattacharyya
Title: StereoDetect: Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings
Abstract:
Stereotypes are known to have very harmful effects, making their detection critically important. However, current research predominantly focuses on detecting and evaluating stereotypical biases, thereby leaving the study of stereotypes in its early stages. Our study revealed that many works have failed to clearly distinguish between stereotypes and stereotypical biases, which has significantly slowed progress in advancing research in this area. Stereotype and anti-stereotype detection is a problem that requires social knowledge; hence, it is one of the most difficult areas in Responsible AI. This work investigates this task, where we propose a five-tuple definition and provide precise terminologies disentangling stereotypes, anti-stereotypes, stereotypical bias, and general bias. We provide a conceptual framework grounded in social psychology for reliable detection. We identify key shortcomings in existing benchmarks for this task of stereotype and anti-stereotype detection. To address these gaps, we developed StereoDetect, a well-curated, definition-aligned benchmark dataset designed for this task. We show that sub-10B language models and GPT-4o frequently misclassify anti-stereotypes and fail to recognize neutral overgeneralizations. We demonstrate StereoDetect's effectiveness through multiple qualitative and quantitative comparisons with existing benchmarks and models fine-tuned on them. The dataset and code are available at https://github.com/KaustubhShejole/StereoDetect.
English Summary: This study addresses the critical need to distinguish stereotypes from stereotypical biases in AI by proposing a clear five-tuple definition and introducing StereoDetect, a carefully curated benchmark that reveals significant classification failures in current language models.

Authors:Gaurav Verma, Jiawei Zhou, Mohit Chandra, Srijan Kumar, Munmun De Choudhury
Title: A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models
Abstract:
Large artificial intelligence (AI) models have garnered significant attention for their remarkable, often "superhuman", performance on standardized benchmarks. However, when these models are deployed in high-stakes verticals such as healthcare, education, and law, they often reveal notable limitations. For instance, they exhibit brittleness to minor variations in input data, present contextually uninformed decisions in critical settings, and undermine user trust by confidently producing or reproducing inaccuracies. These challenges in applying large models necessitate cross-disciplinary innovations to align the models' capabilities with the needs of real-world applications. We introduce a framework that addresses this gap through a layer-wise abstraction of innovations aimed at meeting users' requirements with large models. Through multiple case studies, we illustrate how researchers and practitioners across various fields can operationalize this framework. Beyond modularizing the pipeline of transforming large models into useful "vertical systems", we also highlight the dynamism that exists within different layers of the framework. Finally, we discuss how our framework can guide researchers and practitioners to (i) optimally situate their innovations (e.g., when vertical-specific insights can empower broadly impactful vertical-agnostic innovations), (ii) uncover overlooked opportunities (e.g., spotting recurring problems across verticals to develop practically useful foundation models instead of chasing benchmarks), and (iii) facilitate cross-disciplinary communication of critical challenges (e.g., enabling a shared vocabulary for AI developers, domain experts, and human-computer interaction scholars). Project webpage: https://gaurav22verma.github.io/vertical-systems-with-large-ai-models/

Authors:Georgios Hadjiantonis, Sarah Gillet, Marynel Vázquez, Iolanda Leite, Fethiye Irmak Dogan
Title: Let's move on: Topic Change in Robot-Facilitated Group Discussions
Abstract:
Robot-moderated group discussions have the potential to facilitate engaging and productive interactions among human participants. Previous work on topic management in conversational agents has predominantly focused on human engagement and topic personalization, with the agent playing an active role in the discussion. Studies have also shown the usefulness of including robots in groups, yet further exploration is needed into how robots can learn when to change the topic while facilitating discussions. Accordingly, our work investigates the suitability of machine-learning models and audiovisual non-verbal features for predicting appropriate topic changes. We utilized interactions between a robot moderator and human participants, which we annotated and used to extract acoustic and body-language-related features. We provide a detailed analysis of the performance of machine learning approaches using sequential and non-sequential data with different sets of features. The results indicate promising performance in classifying inappropriate topic changes, outperforming rule-based approaches. Additionally, acoustic features exhibited performance and robustness comparable to the complete set of multimodal features. Our annotated data is publicly available at https://github.com/ghadj/topic-change-robot-discussions-data-2024.
Chinese Summary: 本研究探索了利用机器学习模型结合视听特征来预测机器人主持的群体讨论中的最佳话题转换时机,结果表明其性能优于基于规则的方法,并强调了声学特征具有相当的预测有效性。
English Summary: This study explores the use of machine learning models with audiovisual features to predict optimal topic transitions in robot-moderated group discussions, demonstrating superior performance over rule-based methods while highlighting acoustic features' comparable effectiveness.
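The classification task described above can be sketched with a minimal nearest-centroid classifier over non-verbal feature vectors. This is an illustrative toy, not the paper's model: the feature names (pitch variation, energy, pause duration), values, and labels are all hypothetical.

```python
import math

# Toy acoustic feature vectors for candidate topic-change points:
# (mean pitch variation, speech energy, pause duration in seconds).
# Values and labels are illustrative, not from the paper's dataset.
TRAIN = [
    ((0.2, 0.3, 1.8), "appropriate"),    # long pause, low energy
    ((0.3, 0.2, 2.1), "appropriate"),
    ((0.9, 0.8, 0.2), "inappropriate"),  # mid-discussion, high engagement
    ((0.8, 0.9, 0.3), "inappropriate"),
]

def centroid(vectors):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def classify(features):
    """Assign a candidate topic change to the nearest class centroid."""
    by_label = {}
    for vec, label in TRAIN:
        by_label.setdefault(label, []).append(vec)
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}
    return min(centroids, key=lambda lbl: math.dist(features, centroids[lbl]))

print(classify((0.25, 0.25, 2.0)))   # long pause: likely a good moment
print(classify((0.85, 0.85, 0.25)))  # active exchange: likely a bad moment
```

The paper evaluates far richer sequential and non-sequential models; this sketch only shows the shape of the feature-to-decision mapping.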

Authors:Xian-Xian Liu, Yuanyuan Wei, Mingkun Xu, Yongze Guo, Hongwei Zhang, Huicong Dong, Qun Song, Qi Zhao, Wei Luo, Feng Tien, Juntao Gao, Simon Fong
Title: An Integrated AI-Enabled System Using One Class Twin Cross Learning (OCT-X) for Early Gastric Cancer Detection
Abstract:
Early detection of gastric cancer, a leading cause of cancer-related mortality worldwide, remains hampered by the limitations of current diagnostic technologies, leading to high rates of misdiagnosis and missed diagnoses. To address these challenges, we propose an integrated system that synergizes advanced hardware and software technologies to balance speed and accuracy. Our study introduces the One Class Twin Cross Learning (OCT-X) algorithm. Leveraging a novel fast double-threshold grid search strategy (FDT-GS) and a patch-based deep fully convolutional network, OCT-X maximizes diagnostic accuracy through real-time data processing and seamless lesion surveillance. The hardware component includes an all-in-one point-of-care testing (POCT) device with high-resolution imaging sensors, real-time data processing, and wireless connectivity, facilitated by the NI CompactDAQ and LabVIEW software. Our integrated system achieved an unprecedented diagnostic accuracy of 99.70%, significantly outperforming existing models by up to 4.47%, and demonstrated a 10% improvement in multirate adaptability. These findings underscore the potential of OCT-X and the integrated system in clinical diagnostics, offering a path toward more accurate, efficient, and less invasive early gastric cancer detection. Future research will explore broader applications, further advancing oncological diagnostics. Code is available at https://github.com/liu37972/Multirate-Location-on-OCT-X-Learning.git.
中文: 该研究提出了一种结合OCT-X算法与先进硬件的集成系统,实现了99.70%的胃癌早期诊断准确率,显著优于现有方法。
English: The study introduces an integrated system combining the OCT-X algorithm with advanced hardware to achieve 99.70% diagnostic accuracy for early gastric cancer detection, significantly outperforming existing methods.
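The abstract names a "fast double-threshold grid search" (FDT-GS) without specifying it, so the following is only a hedged, generic coarse-to-fine interpretation: search for an intensity band [lo, hi] that best separates toy "lesion" values from background, first on a coarse grid, then on a fine grid around the coarse winner. The data, scoring function, and step sizes are all illustrative assumptions.

```python
# Toy (intensity, label) pixels; label 1 marks a hypothetical lesion value.
PIXELS = [(0.15, 0), (0.35, 1), (0.40, 1), (0.55, 1), (0.70, 0), (0.90, 0)]

def f1(lo, hi):
    """F1 score when values inside [lo, hi] are predicted as lesion."""
    tp = sum(1 for x, y in PIXELS if lo <= x <= hi and y == 1)
    fp = sum(1 for x, y in PIXELS if lo <= x <= hi and y == 0)
    fn = sum(1 for x, y in PIXELS if not (lo <= x <= hi) and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def grid(lo0, hi0, step):
    """All (lo, hi) pairs with lo <= hi on a regular grid."""
    pairs, lo = [], lo0
    while lo <= hi0:
        hi = lo
        while hi <= hi0:
            pairs.append((lo, hi))
            hi = round(hi + step, 4)
        lo = round(lo + step, 4)
    return pairs

def double_threshold_search():
    # Coarse pass over the full range, then a fine pass around the winner.
    lo, hi = max(grid(0.0, 1.0, 0.2), key=lambda p: f1(*p))
    fine = grid(max(0.0, lo - 0.2), min(1.0, hi + 0.2), 0.05)
    return max(fine, key=lambda p: f1(*p))

best_lo, best_hi = double_threshold_search()
print(best_lo, best_hi, round(f1(best_lo, best_hi), 3))
```

The two-stage search visits far fewer threshold pairs than an exhaustive fine grid over the full range, which is the generic speed argument for this kind of strategy.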

Authors:Joshua Rodriguez, Om Sanan, Guillermo Vizarreta-Luna, Steven A. Conrad
Title: Text Chunking for Document Classification for Urban System Management using Large Language Models
Abstract:
Urban systems are managed using complex textual documentation that needs coding and analysis to set requirements and evaluate built environment performance. This paper contributes to the study of applying large language models (LLMs) to qualitative coding activities to reduce resource requirements while maintaining reliability comparable to humans. Qualitative coding and assessment face challenges such as resource limitations and bias, accuracy, and consistency between human evaluators. Here we report the application of LLMs to deductively code 10 case documents on the presence of 17 digital twin characteristics for the management of urban systems. We utilize two prompting methods to compare the semantic processing of LLMs with human coding efforts: whole-text analysis and text chunk analysis, using OpenAI's GPT-4o, GPT-4o-mini, and o1-mini models. We found similar trends of internal variability between methods, and results indicate that LLMs may perform on par with human coders when initialized with specific deductive coding contexts. GPT-4o, o1-mini, and GPT-4o-mini showed significant agreement with human raters when employed using a chunking method. The application of both GPT-4o and GPT-4o-mini as an additional rater alongside three manual raters showed statistically significant agreement across all raters, indicating that the analysis of textual documents benefits from LLMs. Our findings reveal nuanced sub-themes of LLM application, suggesting LLMs follow human memory coding processes, where whole-text analysis may introduce multiple meanings. The novel contributions of this paper lie in assessing the performance of OpenAI GPT models and in introducing the chunk-based prompting approach, which addresses context-aggregation biases by preserving localized context.
中文: 本研究证明,当采用适当的提示方法(特别是通过保留局部语境的块分析)时,大型语言模型能够以与人类编码者相当的可靠性完成城市规划文档的定性编码。
English: This study demonstrates that large language models can perform qualitative coding of urban planning documents with reliability comparable to human coders when using appropriate prompting methods, particularly through chunk-based analysis that preserves local context.
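The chunk-based prompting idea above can be sketched as: split a document into overlapping chunks, code each chunk independently for a characteristic, then aggregate. The keyword matcher below is only a placeholder for an LLM deductive-coding prompt; it is not the paper's prompt, model, or codebook.

```python
def chunk(text, size=8, overlap=2):
    """Split text into word chunks of `size` words, adjacent chunks sharing `overlap` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(1, len(words) - overlap), step)]

def code_chunk(chunk_text, keyword):
    # Placeholder for an LLM call such as:
    # "Does this passage indicate the characteristic '<keyword>'? Answer yes/no."
    return keyword in chunk_text.lower()

def code_document(text, keyword):
    """Deductive coding: the characteristic is present if any chunk is coded positive."""
    return any(code_chunk(c, keyword) for c in chunk(text))

doc = ("The city maintains a real-time digital model of its water network "
       "that is updated continuously from sensor feeds across districts")
print(chunk(doc)[0])
print(code_document(doc, "sensor"))      # True
print(code_document(doc, "blockchain"))  # False
```

The overlap keeps a localized context window around each coding decision, which is the mechanism the paper credits for avoiding context-aggregation bias in whole-text analysis.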

Authors:Amr Alshatnawi, Remi Sampaleanu, David Liebovitz
Title: MediTools -- Medical Education Powered by LLMs
Abstract:
Artificial Intelligence (AI) has been advancing rapidly and with the advent of large language models (LLMs) in late 2022, numerous opportunities have emerged for adopting this technology across various domains, including medicine. These innovations hold immense potential to revolutionize and modernize medical education. Our research project leverages large language models to enhance medical education and address workflow challenges through the development of MediTools - AI Medical Education. This prototype application focuses on developing interactive tools that simulate real-life clinical scenarios, provide access to medical literature, and keep users updated with the latest medical news. Our first tool is a dermatology case simulation tool that uses real patient images depicting various dermatological conditions and enables interaction with LLMs acting as virtual patients. This platform allows users to practice their diagnostic skills and enhance their clinical decision-making abilities. The application also features two additional tools: an AI-enhanced PubMed tool for engaging with LLMs to gain deeper insights into research papers, and a Google News tool that offers LLM generated summaries of articles for various medical specialties. A comprehensive survey has been conducted among medical professionals and students to gather initial feedback on the effectiveness and user satisfaction of MediTools, providing insights for further development and refinement of the application. This research demonstrates the potential of AI-driven tools in transforming and revolutionizing medical education, offering a scalable and interactive platform for continuous learning and skill development.
中文: 本研究利用大语言模型开发了MediTools人工智能医学教育平台,通过临床模拟工具、智能文献检索和医学新闻摘要三大功能,为医学教育提供可扩展的交互式学习方案,展现了AI技术推动医学教育变革的巨大潜力。
English: This research utilizes large language models to develop MediTools, an AI-powered medical education platform featuring interactive clinical simulations, enhanced literature access, and real-time news summaries, demonstrating AI's transformative potential in modernizing medical training through scalable learning tools.

Authors:Hojung Choi, Jun En Low, Tae Myung Huh, Gabriela A. Uribe, Seongheon Hong, Kenneth A. W. Hoffman, Julia Di, Tony G. Chen, Andrew A. Stanley, Mark R. Cutkosky
Title: CoinFT: A Coin-Sized, Capacitive 6-Axis Force Torque Sensor for Robotic Applications
Abstract:
We introduce CoinFT, a capacitive 6-axis force/torque (F/T) sensor that is compact, light, low-cost, and robust, with an average mean-squared error of 0.11 N for force and 0.84 mNm for moment when the input ranges from 0-10 N and 0-4 N in the normal and shear directions, respectively. CoinFT is a stack of two rigid PCBs with comb-shaped electrodes connected by an array of silicone rubber pillars. The microcontroller interrogates the electrodes in different subsets in order to enhance sensitivity for measuring 6-axis F/T. The combination of desirable features of CoinFT enables various contact-rich robot interactions at scale across different embodiment domains, including drones, robot end-effectors, and wearable haptic devices. We demonstrate the utility of CoinFT on drones by performing attitude-based force control to perform tasks that require careful contact force modulation. The design, fabrication, and firmware of CoinFT are open-sourced at https://hojung-choi.github.io/coinft.github.io/.

Authors:Changlun Li, Yao Shi, Yuyu Luo, Nan Tang
Title: Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective
Abstract:
Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, but their effectiveness in financial decision-making remains inadequately evaluated. Current benchmarks primarily assess LLMs' understanding of financial documents rather than their ability to manage assets or uncover trading opportunities in dynamic market conditions. Despite the release of new benchmarks for evaluating diversified tasks in the financial domain, we identified four major problems in these benchmarks: data leakage, navel-gazing, over-intervention, and maintenance-hard. To bridge this research gap, we introduce DeepFund, a comprehensive arena platform for evaluating LLM-based trading strategies in a live environment. Our approach implements a multi-agent framework in which LLMs serve in multiple key roles that realize real-world investment decision processes. Moreover, we provide a web interface that visualizes LLMs' performance with fund-investment metrics across different market conditions, enabling detailed comparative analysis. Through DeepFund, we aim to provide a more realistic and fair assessment of LLMs' capabilities in fund investment, offering diversified insights and revealing their potential applications in real-world financial markets. Our code is publicly available at https://github.com/HKUSTDial/DeepFund.
中文摘要:大语言模型在金融决策中的有效性评估不足,为此我们推出DeepFund——一个基于多智能体框架的实时评估平台,通过模拟真实投资流程和可视化绩效指标来全面检验LLM在基金投资中的实际应用能力。
English Summary: Large Language Models (LLMs) lack robust evaluation in financial decision-making, prompting the introduction of DeepFund, a live multi-agent platform that assesses LLM-based trading strategies through real-world simulations and performance visualization.

Authors:Ayberk Acar, Jumanh Atoum, Peter S. Connor, Clifford Pierre, Carisa N. Lynch, Nicholas L. Kavoussi, Jie Ying Wu
Title: NAVIUS: Navigated Augmented Reality Visualization for Ureteroscopic Surgery
Abstract:
Ureteroscopy is the standard of care for diagnosing and treating kidney stones and tumors. However, current ureteroscopes have a limited field of view, requiring significant experience to adequately navigate the renal collecting system. This is evidenced by the fact that inexperienced surgeons have higher rates of missed stones. One-third of patients with residual stones require re-operation within 20 months. In order to aid surgeons to fully explore the kidney, this study presents the Navigated Augmented Reality Visualization for Ureteroscopic Surgery (NAVIUS) system. NAVIUS assists surgeons by providing 3D maps of the target anatomy, real-time scope positions, and preoperative imaging overlays. To enable real-time navigation and visualization, we integrate an electromagnetic tracker-based navigation pipeline with augmented reality visualizations. NAVIUS connects to 3D Slicer and Unity with OpenIGTLink, and uses HoloLens 2 as a holographic interface. We evaluate NAVIUS through a user study where surgeons conducted ureteroscopy on kidney phantoms with and without visual guidance. With our proposed system, we observed that surgeons explored more areas within the collecting system with NAVIUS (average 23.73% increase), and NASA-TLX metrics were improved (up to 27.27%). NAVIUS acts as a step towards better surgical outcomes and surgeons' experience. The codebase for the system will be available at: https://github.com/vu-maple-lab/NAVIUS.
Chinese: NAVIUS系统通过提供3D解剖图谱和实时导航,显著提升了输尿管镜手术的探查范围并减轻了医生的操作负担。
English: The NAVIUS system enhances ureteroscopic surgery by providing 3D anatomical maps and real-time navigation, significantly improving surgical exploration and reducing cognitive load for surgeons.

Authors:Rishabh Vishwakarma, Caroline Brophy, Catherine Hurley
Title: PieGlyph: An R package for creating axis invariant pie-glyphs for 2d plots
Abstract:
Effective visualisation of multidimensional data is crucial for generating insights. Glyph-based visualisations, which encode data dimensions onto multiple visual channels such as colour, shape, and size, provide an effective means of representing complex datasets. Pie-chart glyphs (pie-glyphs) are one such approach, where multiple data attributes are mapped to slices within a pie chart. This paper introduces the PieGlyph R package, which enables users to overlay any 2D plot with axis-invariant pie-glyphs, offering a compact and intuitive representation of multidimensional data. Unlike existing R packages such as scatterpie or ggforce, PieGlyph generates pie-glyphs independently of the plot axes by employing a nested coordinate system, ensuring they remain circular regardless of changes to the underlying coordinate system. This enhances interpretability, particularly when visualising spatial data, as users can select the most appropriate map projection without distorting the glyphs' shape. Pie-glyphs are also particularly well-suited for visualising compositional data, where there is a natural sum-to-one constraint on the data attributes. PieGlyph is developed under the Grammar of Graphics paradigm using the ggplot2 framework and supports the generation of interactive pie-glyphs through the ggiraph package. Designed to integrate seamlessly with all features and extensions offered by ggplot2 and ggiraph, PieGlyph provides users with full flexibility in customising every aspect of the visualisation. This paper outlines the conceptual framework of PieGlyph, compares it with existing alternatives, and demonstrates its applications through example visualisations.
中文:PieGlyph R 包引入了轴不变饼图符号用于多维数据可视化,确保圆形符号在不同坐标系中保持无失真,并提升空间和成分数据的可解释性。
English: The PieGlyph R package introduces axis-invariant pie-glyphs for multidimensional data visualization, ensuring circular glyphs remain undistorted across coordinate systems and enhancing interpretability for spatial and compositional data.

Authors:Marc R. Schlichting, Vale Rasmussen, Heba Alazzeh, Houjun Liu, Kiana Jafari, Amelia F. Hardy, Dylan M. Asmar, Mykel J. Kochenderfer
Title: LeRAAT: LLM-Enabled Real-Time Aviation Advisory Tool
Abstract:
In aviation emergencies, high-stakes decisions must be made in an instant. Pilots rely on quick access to precise, context-specific information -- an area where emerging tools like large language models (LLMs) show promise in providing critical support. This paper introduces LeRAAT, a framework that integrates LLMs with the X-Plane flight simulator to deliver real-time, context-aware pilot assistance. The system uses live flight data, weather conditions, and aircraft documentation to generate recommendations aligned with aviation best practices and tailored to the particular situation. It employs a Retrieval-Augmented Generation (RAG) pipeline that extracts and synthesizes information from aircraft type-specific manuals, including performance specifications and emergency procedures, as well as aviation regulatory materials, such as FAA directives and standard operating procedures. We showcase the framework in both a virtual reality and traditional on-screen simulation, supporting a wide range of research applications such as pilot training, human factors research, and operational decision support.
Chinese: 本文介绍了LeRAAT框架,它将大型语言模型与X-Plane飞行模拟器相结合,通过检索增强生成技术整合实时飞行数据、天气条件和航空文档,为飞行员提供实时、情境感知的辅助支持。
English: This paper presents LeRAAT, a framework that integrates large language models with the X-Plane flight simulator to provide real-time, context-aware pilot assistance by synthesizing live flight data, weather conditions, and aviation documentation through a Retrieval-Augmented Generation pipeline.

Authors:Pengzhou Cheng, Zheng Wu, Zongru Wu, Aston Zhang, Zhuosheng Zhang, Gongshen Liu
Title: OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents
Abstract:
Autonomous graphical user interface (GUI) agents powered by multimodal large language models have shown great promise. However, a critical yet underexplored issue persists: over-execution, where the agent executes tasks in a fully autonomous way without adequately assessing its action confidence, which compromises adaptive human-agent collaboration. This poses substantial risks in complex scenarios, such as those involving ambiguous user instructions, unexpected interruptions, and environmental hijacks. To address the issue, we introduce OS-Kairos, an adaptive GUI agent capable of predicting confidence levels at each interaction step and efficiently deciding whether to act autonomously or seek human intervention. OS-Kairos is developed through two key mechanisms: (i) collaborative probing that annotates confidence scores at each interaction step; (ii) confidence-driven interaction that leverages these confidence scores to elicit adaptive interaction. Experimental results show that OS-Kairos substantially outperforms existing models on our curated dataset featuring complex scenarios, as well as on established benchmarks such as AITZ and Meta-GUI, with 24.59%–87.29% improvements in task success rate. OS-Kairos facilitates adaptive human-agent collaboration, prioritizing effectiveness, generality, scalability, and efficiency for real-world GUI interaction. The dataset and codes are available at https://github.com/Wuzheng02/OS-Kairos.
中文摘要:OS-Kairos是一种自适应图形界面代理,通过预测每个交互步骤的置信度来自主决定执行操作或请求人工干预,在复杂场景中显著优于现有模型。
English Summary: OS-Kairos is an adaptive GUI agent that predicts confidence levels at each interaction step to determine when to act autonomously or seek human intervention, significantly outperforming existing models in complex scenarios.
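The confidence-driven interaction loop can be sketched as follows. This is an illustrative gate only: OS-Kairos predicts confidence with a learned model at each step, whereas the threshold, action names, and scores below are hypothetical.

```python
# Threshold below which the agent escalates to the human (an assumed value).
THRESHOLD = 0.7

def run_episode(steps):
    """steps: list of (action_name, predicted_confidence) pairs.

    Returns the per-step decision: execute autonomously or ask the human.
    """
    log = []
    for action, conf in steps:
        if conf >= THRESHOLD:
            log.append((action, "execute"))
        else:
            log.append((action, "ask_human"))  # pause and request intervention
    return log

episode = [
    ("open_settings", 0.95),
    ("tap_ambiguous_button", 0.41),  # e.g. an unexpected pop-up appeared
    ("confirm_payment", 0.62),       # high-stakes step with low confidence
    ("scroll_down", 0.88),
]
for action, decision in run_episode(episode):
    print(f"{action}: {decision}")
```

The point of the gate is exactly the over-execution failure mode in the abstract: low-confidence, high-stakes steps are routed to the human instead of being executed blindly.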

Authors:Haidong Wang, Qia Shan, JianHua Zhang, PengFei Xiao, Ao Liu
Title: An Audio-Visual Fusion Emotion Generation Model Based on Neuroanatomical Alignment
Abstract:
In the field of affective computing, traditional methods for generating emotions predominantly rely on deep learning techniques and large-scale emotion datasets. However, deep learning techniques are often complex and difficult to interpret, and standardized large-scale emotion datasets are difficult and costly to establish. To tackle these challenges, we introduce a novel framework named Audio-Visual Fusion for Brain-like Emotion Learning (AVF-BEL). In contrast to conventional brain-inspired emotion learning methods, this approach improves the audio-visual emotion fusion and generation model through the integration of modular components, thereby enabling more lightweight and interpretable emotion learning and generation processes. The framework simulates the integration of the visual, auditory, and emotional pathways of the brain, optimizes the fusion of emotional features across visual and auditory modalities, and improves upon the traditional Brain Emotional Learning (BEL) model. The experimental results indicate a significant improvement in the similarity of the audio-visual fusion emotion learning and generation model compared to single-modality visual and auditory emotion learning and generation models. Ultimately, this aligns with the fundamental phenomenon of heightened emotion generation facilitated by the integrated impact of visual and auditory stimuli. This contribution not only enhances the interpretability and efficiency of affective intelligence but also provides new insights and pathways for advancing affective computing technology. Our source code can be accessed at https://github.com/OpenHUTB/emotion.
中文摘要:提出的视听融合类脑情感学习框架通过优化多模态情感特征融合,克服了传统深度学习方法的复杂性,实现了更轻量化、可解释的情感生成系统。
English Summary: The proposed Audio-Visual Fusion for Brain-like Emotion Learning (AVF-BEL) framework overcomes limitations of complex deep learning methods by creating a more interpretable and lightweight system that significantly enhances emotion generation through optimized multimodal fusion.

Authors:Steven-Shine Chen, Jimin Lee, Paul Pu Liang
Title: Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving
Abstract:
Humans have long relied on visual aids like sketches and diagrams to support reasoning and problem-solving. Visual tools, like auxiliary lines in geometry or graphs in calculus, are essential for understanding complex ideas. However, many tutoring systems remain text-based, providing feedback only through natural language. Leveraging recent advances in Large Multimodal Models (LMMs), this paper introduces Interactive Sketchpad, a tutoring system that combines language-based explanations with interactive visualizations to enhance learning. Built on a pre-trained LMM, Interactive Sketchpad is fine-tuned to provide step-by-step guidance in both text and visuals, enabling natural multimodal interaction with the student. Accurate and robust diagrams are generated by incorporating code execution into the reasoning process. User studies conducted on math problems such as geometry, calculus, and trigonometry demonstrate that Interactive Sketchpad leads to improved task comprehension, problem-solving accuracy, and engagement levels, highlighting its potential for transforming educational technologies. All code is available at: https://stevenshinechen.github.io/interactivesketchpad/.

Authors:Karthik Mahadevan, Blaine Lewis, Jiannan Li, Bilge Mutlu, Anthony Tang, Tovi Grossman
Title: ImageInThat: Manipulating Images to Convey User Instructions to Robots
Abstract:
Foundation models are rapidly improving the capability of robots to perform everyday tasks autonomously, such as meal preparation, yet robots will still need to be instructed by humans due to model performance, the difficulty of capturing user preferences, and the need for user agency. Robots can be instructed using various methods: natural language conveys immediate instructions but can be abstract or ambiguous, whereas end-user programming supports longer-horizon tasks but its interfaces face difficulties in capturing user intent. In this work, we propose direct manipulation of images as an alternative paradigm for instructing robots, and introduce a specific instantiation called ImageInThat, which allows users to directly manipulate images in a timeline-style interface to generate robot instructions. Through a user study, we demonstrate the efficacy of ImageInThat for instructing robots in kitchen manipulation tasks, comparing it to a text-based natural language instruction method. The results show that participants were faster with ImageInThat and preferred to use it over the text-based method. Supplementary material, including code, can be found at: https://image-in-that.github.io/.

Authors:Sebastian Zhao, Alan Zhu, Hussein Mozannar, David Sontag, Ameet Talwalkar, Valerie Chen
Title: CodingGenie: A Proactive LLM-Powered Programming Assistant
Abstract:
While developers increasingly adopt tools powered by large language models (LLMs) in day-to-day workflows, these tools still require explicit user invocation. To seamlessly integrate LLM capabilities to a developer's workflow, we introduce CodingGenie, a proactive assistant integrated into the code editor. CodingGenie autonomously provides suggestions, ranging from bug fixing to unit testing, based on the current code context and allows users to customize suggestions by providing a task description and selecting what suggestions are shown. We demonstrate multiple use cases to show how proactive suggestions from CodingGenie can improve developer experience, and also analyze the cost of adding proactivity. We believe this open-source tool will enable further research into proactive assistants. CodingGenie is open-sourced at https://github.com/sebzhao/CodingGenie/ and video demos are available at https://sebzhao.github.io/CodingGenie/.
中文: CodingGenie是一款集成在代码编辑器中的主动式编程助手,能根据当前代码上下文自主提供从错误修复到单元测试等建议,并通过开源方式推动主动辅助工具的深入研究。
English: CodingGenie is a proactive coding assistant that autonomously provides contextual suggestions like bug fixes and unit tests within code editors, enhancing developer workflows through customizable, open-source integration.

Authors:Jia Xu, Tianyi Wei, Bojian Hou, Patryk Orzechowski, Shu Yang, Ruochen Jin, Rachael Paulbeck, Joost Wagenaar, George Demiris, Li Shen
Title: MentalChat16K: A Benchmark Dataset for Conversational Mental Health Assistance
Abstract:
We introduce MentalChat16K, an English benchmark dataset combining a synthetic mental health counseling dataset and a dataset of anonymized transcripts from interventions between Behavioral Health Coaches and Caregivers of patients in palliative or hospice care. Covering a diverse range of conditions like depression, anxiety, and grief, this curated dataset is designed to facilitate the development and evaluation of large language models for conversational mental health assistance. By providing a high-quality resource tailored to this critical domain, MentalChat16K aims to advance research on empathetic, personalized AI solutions to improve access to mental health support services. The dataset prioritizes patient privacy, ethical considerations, and responsible data usage. MentalChat16K presents a valuable opportunity for the research community to innovate AI technologies that can positively impact mental well-being. The dataset is available at https://huggingface.co/datasets/ShenLab/MentalChat16K and the code and documentation are hosted on GitHub at https://github.com/ChiaPatricia/MentalChat16K.
中文: MentalChat16K是一个结合合成与匿名心理健康咨询记录的英文数据集,旨在推动共情式对话助手的AI技术发展,同时严格保障数据隐私与伦理规范。
English: MentalChat16K is a specialized English dataset combining synthetic and anonymized mental health counseling transcripts, designed to advance AI development for empathetic conversational assistance while ensuring privacy and ethical data use.

Authors:Daniel Syomichev, Padmini Gopinath, Guang-Lin Wei, Eric Chang, Ian Gordon, Amanuel Seifu, Rahul Pemmaraju, Neehar Peri, James Purtilo
Title: QuickDraw: Fast Visualization, Analysis and Active Learning for Medical Image Segmentation
Abstract:
Analyzing CT scans, MRIs and X-rays is pivotal in diagnosing and treating diseases. However, detecting and identifying abnormalities from such medical images is a time-intensive process that requires expert analysis and is prone to interobserver variability. To mitigate such issues, machine learning-based models have been introduced to automate and significantly reduce the cost of image segmentation. Despite significant advances in medical image analysis in recent years, many of the latest models are never applied in clinical settings because state-of-the-art models do not easily interface with existing medical image viewers. To address these limitations, we propose QuickDraw, an open-source framework for medical image visualization and analysis that allows users to upload DICOM images and run off-the-shelf models to generate 3D segmentation masks. In addition, our tool allows users to edit, export, and evaluate segmentation masks to iteratively improve state-of-the-art models through active learning. In this paper, we detail the design of our tool and present survey results that highlight the usability of our software. Notably, we find that QuickDraw reduces the time to manually segment a CT scan from four hours to six minutes and reduces machine learning-assisted segmentation time by 10% compared to prior work. Our code and documentation are available at https://github.com/qd-seg/quickdraw
中文:QuickDraw 是一个开源框架,支持医学影像可视化和分析,用户可上传 DICOM 图像运行模型生成 3D 分割掩码,并通过主动学习迭代优化结果,将 CT 扫描分割时间从四小时缩短至六分钟。
English: QuickDraw is an open-source framework that enables medical image visualization and analysis by allowing users to upload DICOM images, run models for 3D segmentation, and iteratively improve results through active learning, reducing CT scan segmentation time from four hours to six minutes.
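The active-learning step in such a workflow can be sketched as uncertainty sampling: rank predicted masks by an uncertainty proxy (here, mean per-pixel binary entropy) and route the most uncertain ones to a human for editing. This is a generic sketch under stated assumptions; QuickDraw's actual selection criteria and mask format are not specified in the abstract.

```python
import math

def mean_entropy(prob_mask):
    """Mean binary entropy over a 2D grid of per-pixel foreground probabilities."""
    def h(p):
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    flat = [p for row in prob_mask for p in row]
    return sum(h(p) for p in flat) / len(flat)

def select_for_review(masks, k=1):
    """Return the ids of the k most uncertain predicted masks."""
    ranked = sorted(masks, key=lambda m: mean_entropy(m[1]), reverse=True)
    return [mask_id for mask_id, _ in ranked[:k]]

# Two tiny hypothetical probability masks (real masks are full 3D volumes).
masks = [
    ("scan_a", [[0.99, 0.98], [0.01, 0.02]]),  # confident prediction
    ("scan_b", [[0.55, 0.48], [0.51, 0.45]]),  # near-chance everywhere
]
print(select_for_review(masks))
```

After the expert edits the selected masks, the corrected labels go back into training, which is the iterative-improvement loop the abstract describes.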

Authors:Krzysztof Adamkiewicz, Paweł W. Woźniak, Julia Dominiak, Andrzej Romanowski, Jakob Karolus, Stanislav Frolov
Title: PromptMap: An Alternative Interaction Style for AI-Based Image Generation
Abstract:
Recent technological advances popularized the use of image generation among the general public. Crafting effective prompts can, however, be difficult for novice users. To tackle this challenge, we developed PromptMap, a new interaction style for text-to-image AI that allows users to freely explore a vast collection of synthetic prompts through a map-like view with semantic zoom. PromptMap groups images visually by their semantic similarity, allowing users to discover relevant examples. We evaluated PromptMap in a between-subject online study ($n=60$) and a qualitative within-subject study ($n=12$). We found that PromptMap supported users in crafting prompts by providing them with examples. We also demonstrated the feasibility of using LLMs to create vast example collections. Our work contributes a new interaction style that supports users unfamiliar with prompting in achieving a satisfactory image output.
Chinese: 近期技术进步使图像生成技术普及化,但新手用户常难以编写有效提示,为此我们开发了PromptMap,这是一种新型交互方式,通过语义地图视图帮助用户自由探索大量合成提示,从而提升生成满意图像的能力。
English: Recent technological advances have made image generation accessible to the public, but novice users often struggle with crafting effective prompts, leading to the development of PromptMap, a new interaction style that helps users explore and create prompts through a semantic map view, improving their ability to generate satisfactory images.
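PromptMap groups prompts by semantic similarity for its map view. The paper does not publish the grouping algorithm; as an illustrative stand-in, here is a greedy cosine-similarity grouping over prompt embeddings (the embeddings below are hand-made stand-ins, not real model outputs).

```python
import numpy as np

def group_by_similarity(embeddings, threshold=0.8):
    """Greedy single-pass grouping: each item joins the first group whose
    founding member it matches above `threshold` cosine similarity."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroids = []  # one representative vector per group
    labels = []
    for v in unit:
        sims = [float(v @ c) for c in centroids]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            centroids.append(v)
            labels.append(len(centroids) - 1)
    return labels

# Stand-in "embeddings": two near-duplicate prompts and one unrelated prompt.
emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
print(group_by_similarity(emb))  # → [0, 0, 1]
```

A real system would embed prompts with a sentence encoder and project groups to 2D for the semantic-zoom map; both steps are omitted here.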

Authors:Jiale Wei, Xiang Ying, Tao Gao, Fangyi Bao, Felix Tao, Jingbo Shang
Title: AI-native Memory 2.0: Second Me
Abstract:
Human interaction with the external world fundamentally involves the exchange of personal memory, whether with other individuals, websites, applications, or, in the future, AI agents. A significant portion of this interaction is redundant, requiring users to repeatedly provide the same information across different contexts. Existing solutions, such as browser-stored credentials, autofill mechanisms, and unified authentication systems, have aimed to mitigate this redundancy by serving as intermediaries that store and retrieve commonly used user data. The advent of large language models (LLMs) presents an opportunity to redefine memory management through an AI-native paradigm: SECOND ME. SECOND ME acts as an intelligent, persistent memory offload system that retains, organizes, and dynamically utilizes user-specific knowledge. By serving as an intermediary in user interactions, it can autonomously generate context-aware responses, prefill required information, and facilitate seamless communication with external systems, significantly reducing cognitive load and interaction friction. Unlike traditional memory storage solutions, SECOND ME extends beyond static data retention by leveraging LLM-based memory parameterization. This enables structured organization, contextual reasoning, and adaptive knowledge retrieval, facilitating a more systematic and intelligent approach to memory management. As AI-driven personal agents like SECOND ME become increasingly integrated into digital ecosystems, SECOND ME further represents a critical step toward augmenting human-world interaction with persistent, contextually aware, and self-optimizing memory systems. We have open-sourced the fully localizable deployment system at GitHub: https://github.com/Mindverse/Second-Me.
中文摘要:SECOND ME系统利用大语言模型实现AI原生的记忆管理,通过智能存储、组织和动态运用个人信息,有效减少人类与外部世界交互中的重复操作,提升交互效率。
English Summary: The SECOND ME system leverages large language models to create an AI-native memory management solution that reduces redundancy in human-world interactions by intelligently storing, organizing, and dynamically utilizing personal information across various contexts.

Authors:DongHeun Han, Byungmin Kim, RoUn Lee, KyeongMin Kim, Hyoseok Hwang, HyeongYeop Kang
Title: ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation
Abstract:
Realistic hand manipulation is a key component of immersive virtual reality (VR), yet existing methods often rely on kinematic approaches or motion-capture datasets that omit crucial physical attributes such as contact forces and finger torques. Consequently, these approaches prioritize tight, one-size-fits-all grips rather than reflecting users' intended force levels. We present ForceGrip, a deep learning agent that synthesizes realistic hand manipulation motions, faithfully reflecting the user's grip force intention. Instead of mimicking predefined motion datasets, ForceGrip uses generated training scenarios (randomizing object shapes, wrist movements, and trigger input flows) to challenge the agent with a broad spectrum of physical interactions. To effectively learn from these complex tasks, we employ a three-phase curriculum learning framework comprising Finger Positioning, Intention Adaptation, and Dynamic Stabilization. This progressive strategy ensures stable hand-object contact, adaptive force control based on user inputs, and robust handling under dynamic conditions. Additionally, a proximity reward function enhances natural finger motions and accelerates training convergence. Quantitative and qualitative evaluations reveal ForceGrip's superior force controllability and plausibility compared to state-of-the-art methods. Demo videos are available as supplementary material and the code is provided at https://han-dongheun.github.io/ForceGrip.

Authors:Zhao-Heng Yin, Changhao Wang, Luis Pineda, Krishna Bodduluri, Tingfan Wu, Pieter Abbeel, Mustafa Mukadam
Title: Geometric Retargeting: A Principled, Ultrafast Neural Hand Retargeting Algorithm
Abstract:
We introduce Geometric Retargeting (GeoRT), an ultrafast and principled neural hand retargeting algorithm for teleoperation, developed as part of our recent Dexterity Gen (DexGen) system. GeoRT converts human finger keypoints to robot hand keypoints at 1KHz, achieving state-of-the-art speed and accuracy with significantly fewer hyperparameters. This high-speed capability enables flexible postprocessing, such as leveraging a foundational controller like DexGen for action correction. GeoRT is trained in an unsupervised manner, eliminating the need for manual annotation of hand pairs. The core of GeoRT lies in novel geometric objective functions that capture the essence of retargeting: preserving motion fidelity, ensuring configuration space (C-space) coverage, maintaining a uniform response through high flatness, preserving pinch correspondence, and preventing self-collisions. This approach is free from intensive test-time optimization, offering a more scalable and practical solution for real-time hand retargeting.

Authors:Xiao Wang, Lu Dong, Sahana Rangasrinivasan, Ifeoma Nwogu, Srirangaraj Setlur, Venugopal Govindaraju
Title: AutoMisty: A Multi-Agent LLM Framework for Automated Code Generation in the Misty Social Robot
Abstract:
The social robot's open API allows users to customize open-domain interactions. However, it remains inaccessible to those without programming experience. In this work, we introduce AutoMisty, the first multi-agent collaboration framework powered by large language models (LLMs), to enable the seamless generation of executable Misty robot code from natural language instructions. AutoMisty incorporates four specialized agent modules to manage task decomposition, assignment, problem-solving, and result synthesis. Each agent incorporates a two-layer optimization mechanism, with self-reflection for iterative refinement and human-in-the-loop for better alignment with user preferences. AutoMisty ensures a transparent reasoning process, allowing users to iteratively refine tasks through natural language feedback for precise execution. To evaluate AutoMisty's effectiveness, we designed a benchmark task set spanning four levels of complexity and conducted experiments in a real Misty robot environment. Extensive evaluations demonstrate that AutoMisty not only consistently generates high-quality code but also enables precise code control, significantly outperforming direct reasoning with ChatGPT-4o and ChatGPT-o1. All code, optimized APIs, and experimental videos will be publicly released through the webpage: https://wangxiaoshawn.github.io/AutoMisty.html

Authors:Zheng Hui, Yinheng Li, Dan zhao, Tianyi Chen, Colby Banbury, Kazuhito Koishida
Title: WinClick: GUI Grounding with Multimodal Large Language Models
Abstract:
Graphical User Interface (GUI) tasks are vital for automating workflows such as software testing and user interface navigation. For users, the GUI is the most intuitive platform for interacting with a computer. Previous work identified a key challenge in developing visual GUI agents: GUI grounding - the ability to accurately locate screen elements based on instructions. However, most existing GUI agents rely on structured data formats like DOM or HTML files in training or inference, which are not available across all applications, particularly in general desktop environments such as Windows OS. To address this, we introduce WinClick, a novel visual GUI agent developed for the Windows platform. WinClick leverages screenshots to detect actionable regions. To overcome the challenge of GUI grounding, we enhance WinClick with GUI grounding pre-training and propose an LLM-based method for aligning GUI grounding data. Additionally, we introduce WinSpot, the first comprehensive benchmark for GUI grounding on Windows. Our experiments demonstrate that WinClick, combined with GUI grounding pre-training, significantly outperforms existing baselines, offering a scalable solution for GUI automation in desktop environments. WinSpot is publicly available at https://github.com/zackhuiiiii/WinSpot.
中文:WinClick是一种创新的Windows视觉GUI代理,通过截图和增强的GUI基础预训练来自动化桌面任务,其性能优于现有方法,并得到WinSpot基准测试的支持。
English: WinClick is a novel visual GUI agent for Windows that uses screenshots and enhanced GUI grounding pre-training to automate desktop tasks, outperforming existing methods and supported by the WinSpot benchmark.

Authors:Yifei Huang, Jilan Xu, Baoqi Pei, Yuping He, Guo Chen, Mingfang Zhang, Lijin Yang, Zheng Nie, Jinyao Liu, Guoshun Fan, Dechen Lin, Fang Fang, Kunpeng Li, Chang Yuan, Xinyuan Chen, Yaohui Wang, Yali Wang, Yu Qiao, Limin Wang
Title: An Egocentric Vision-Language Model based Portable Real-time Smart Assistant
Abstract:
We present Vinci, a vision-language system designed to provide real-time, comprehensive AI assistance on portable devices. At its core, Vinci leverages EgoVideo-VL, a novel model that integrates an egocentric vision foundation model with a large language model (LLM), enabling advanced functionalities such as scene understanding, temporal grounding, video summarization, and future planning. To enhance its utility, Vinci incorporates a memory module for processing long video streams in real time while retaining contextual history, a generation module for producing visual action demonstrations, and a retrieval module that bridges egocentric and third-person perspectives to provide relevant how-to videos for skill acquisition. Unlike existing systems that often depend on specialized hardware, Vinci is hardware-agnostic, supporting deployment across a wide range of devices, including smartphones and wearable cameras. In our experiments, we first demonstrate the superior performance of EgoVideo-VL on multiple public benchmarks, showcasing its vision-language reasoning and contextual understanding capabilities. We then conduct a series of user studies to evaluate the real-world effectiveness of Vinci, highlighting its adaptability and usability in diverse scenarios. We hope Vinci can establish a new framework for portable, real-time egocentric AI systems, empowering users with contextual and actionable insights. All code for Vinci, including the frontend, backend, and models, is available at https://github.com/OpenGVLab/vinci.
中文: Vinci 是一款便携式视觉语言系统,通过结合 EgoVideo-VL 模型及记忆、生成和检索模块,可在多种设备上提供实时AI辅助,支持场景理解和技能学习等任务。
English: Vinci is a portable vision-language system that integrates the EgoVideo-VL model with memory, generation, and retrieval modules to deliver real-time AI assistance for tasks like scene understanding and skill acquisition across various devices.

Authors:Zhiyuan Huang, Ziming Cheng, Junting Pan, Zhaohui Hou, Mingjie Zhan
Title: SpiritSight Agent: Advanced GUI Agent with One Look
Abstract:
Graphical User Interface (GUI) agents show amazing abilities in assisting human-computer interaction, automating users' navigation on digital devices. An ideal GUI agent is expected to achieve high accuracy, low latency, and compatibility for different GUI platforms. Recent vision-based approaches have shown promise by leveraging advanced Vision Language Models (VLMs). While they generally meet the requirements of compatibility and low latency, these vision-based GUI agents tend to have low accuracy due to their limitations in element grounding. To address this issue, we propose $\textbf{SpiritSight}$, a vision-based, end-to-end GUI agent that excels in GUI navigation tasks across various GUI platforms. First, we create a multi-level, large-scale, high-quality GUI dataset called $\textbf{GUI-Lasagne}$ using scalable methods, empowering SpiritSight with robust GUI understanding and grounding capabilities. Second, we introduce the $\textbf{Universal Block Parsing (UBP)}$ method to resolve the ambiguity problem in dynamic high-resolution visual inputs, further enhancing SpiritSight's ability to ground GUI objects. Through these efforts, the SpiritSight agent outperforms other advanced methods on diverse GUI benchmarks, demonstrating its superior capability and compatibility in GUI navigation tasks. Models and datasets are available at https://hzhiyuan.github.io/SpiritSight-Agent.

Authors:Shiyuan Zhou, Bingxuan Li, Xiyuan Chen, Zhi Tu, Yifeng Wang, Yiwen Xiang, Tianyi Zhang
Title: HEPHA: A Mixed-Initiative Image Labeling Tool for Specialized Domains
Abstract:
Image labeling is an important task for training computer vision models. In specialized domains, such as healthcare, it is expensive and challenging to recruit specialists for image labeling. We propose HEPHA, a mixed-initiative image labeling tool that elicits human expertise via inductive logic learning to infer and refine labeling rules. Each rule comprises visual predicates that describe the image. HEPHA enables users to iteratively refine the rules by either direct manipulation through a visual programming interface or by labeling more images. To facilitate rule refinement, HEPHA recommends which rule to edit and which predicate to update. For users unfamiliar with visual programming, HEPHA suggests diverse and informative images to users for further labeling. We conducted a within-subjects user study with 16 participants and compared HEPHA with a variant of HEPHA and a deep learning-based approach. We found that HEPHA outperforms the two baselines in both specialized-domain and general-domain image labeling tasks. Our code is available at https://github.com/Neural-Symbolic-Image-Labeling/NSILWeb.
Chinese: HEPHA是一种混合主动的图像标注工具,通过归纳逻辑学习推断和优化标注规则,用户可通过可视化编程或标注更多图像迭代改进规则,在专业和通用领域的图像标注任务中均优于基线方法。
English: HEPHA is a mixed-initiative image labeling tool that uses inductive logic learning to infer and refine labeling rules, enabling users to iteratively improve them through visual programming or additional image labeling, and it outperforms baseline methods in both specialized and general domains.

Authors:Gabriele Sarti, Vilém Zouhar, Grzegorz Chrupała, Ana Guerberof-Arenas, Malvina Nissim, Arianna Bisazza
Title: QE4PE: Word-level Quality Estimation for Human Post-Editing
Abstract:
Word-level quality estimation (QE) methods aim to detect erroneous spans in machine translations, which can direct and facilitate human post-editing. While the accuracy of word-level QE systems has been assessed extensively, their usability and downstream influence on the speed, quality and editing choices of human post-editing remain understudied. In this study, we investigate the impact of word-level QE on machine translation (MT) post-editing in a realistic setting involving 42 professional post-editors across two translation directions. We compare four error-span highlight modalities, including supervised and uncertainty-based word-level QE methods, for identifying potential errors in the outputs of a state-of-the-art neural MT model. Post-editing effort and productivity are estimated from behavioral logs, while quality improvements are assessed by word- and segment-level human annotation. We find that domain, language and editors' speed are critical factors in determining highlights' effectiveness, with modest differences between human-made and automated QE highlights underlining a gap between accuracy and usability in professional workflows.
中文: 词级质量评估旨在识别机器翻译中的错误以辅助人工后编辑,但其对编辑效率和质量的实用影响尚待深入研究,其中领域和编辑速度等因素比高亮来源更能决定其有效性。
English: Word-level quality estimation aids in identifying machine translation errors to assist human post-editing, yet its practical impact on editing efficiency and quality remains underexplored, with factors like domain and editor speed influencing highlight effectiveness more than the source of the highlights themselves.

Authors:Siddhant Prakash, David R. Walton, Rafael K. dos Anjos, Anthony Steed, Tobias Ritschel
Title: Blind Augmentation: Calibration-free Camera Distortion Model Estimation for Real-time Mixed-reality Consistency
Abstract:
Real camera footage is subject to noise, motion blur (MB) and depth of field (DoF). In some applications these might be considered distortions to be removed, but in others it is important to model them because it would be ineffective, or interfere with an aesthetic choice, to simply remove them. In augmented reality applications where virtual content is composed into a live video feed, we can model noise, MB and DoF to make the virtual content visually consistent with the video. Existing methods for this typically suffer two main limitations. First, they require a camera calibration step to relate a known calibration target to the specific camera's response. Second, existing work requires methods that can be (differentiably) tuned to the calibration, such as slow and specialized neural networks. We propose a method which estimates parameters for noise, MB and DoF instantly, which allows using off-the-shelf real-time simulation methods from, e.g., a game engine in compositing augmented content. Our main idea is to unlock both features by showing how to use modern computer vision methods that can remove noise, MB and DoF from the video stream, essentially providing self-calibration. This allows auto-tuning any black-box real-time noise+MB+DoF method to deliver fast and high-fidelity augmentation consistency.
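The noise part of such self-calibration can be illustrated with a classical blind estimator. The sketch below uses Immerkaer's fast noise-variance method (a well-known stand-in, not the paper's actual pipeline): a Laplacian-difference mask suppresses image structure, so the mean absolute response on a frame recovers the noise standard deviation.

```python
import numpy as np

def estimate_noise_sigma(img: np.ndarray) -> float:
    """Immerkaer's estimator: apply the 3x3 mask [[1,-2,1],[-2,4,-2],[1,-2,1]]
    and rescale the mean absolute response assuming Gaussian noise."""
    img = img.astype(np.float64)
    # Mask applied via shifted slices instead of an explicit convolution.
    m = (4 * img[1:-1, 1:-1]
         - 2 * (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:])
         + img[:-2, :-2] + img[:-2, 2:] + img[2:, :-2] + img[2:, 2:])
    # Var(response) = 36*sigma^2, and E|N(0,s)| = s*sqrt(2/pi), hence:
    return float(np.sqrt(np.pi / 2.0) * np.mean(np.abs(m)) / 6.0)

rng = np.random.default_rng(0)
noisy = 128.0 + rng.normal(0.0, 10.0, size=(256, 256))  # flat frame + sigma=10 noise
print(round(estimate_noise_sigma(noisy), 1))  # close to 10
```

With the noise level recovered per frame, any black-box real-time noise synthesizer can be driven to match the camera feed; blur and DoF parameters would need analogous estimators.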

Authors:Rin Ashizawa, Yoichi Hirose, Nozomu Yoshinari, Kento Uchida, Shinichi Shirakawa
Title: Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers
Abstract:
Prompt optimization aims to search for effective prompts that enhance the performance of large language models (LLMs). Although existing prompt optimization methods have discovered effective prompts, they often differ from sophisticated prompts carefully designed by human experts. Prompt design strategies, representing best practices for improving prompt performance, can be key to improving prompt optimization. Recently, a method termed the Autonomous Prompt Engineering Toolbox (APET) has incorporated various prompt design strategies into the prompt optimization process. In APET, the LLM must implicitly select and apply the appropriate strategies because prompt design strategies can have negative effects. This implicit selection may be suboptimal due to the limited optimization capabilities of LLMs. This paper introduces Optimizing Prompts with sTrategy Selection (OPTS), which implements explicit selection mechanisms for prompt design. We propose three mechanisms, including a Thompson sampling-based approach, and integrate them into EvoPrompt, a well-known prompt optimizer. Experiments optimizing prompts for two LLMs, Llama-3-8B-Instruct and GPT-4o mini, were conducted using BIG-Bench Hard. Our results show that the selection of prompt design strategies improves the performance of EvoPrompt, and the Thompson sampling-based mechanism achieves the best overall results. Our experimental code is provided at https://github.com/shiralab/OPTS.
中文摘要:OPTS通过引入显式策略选择机制优化大语言模型的提示设计,其中基于汤普森采样的方法在提升EvoPrompt性能方面表现最佳。
English Summary: OPTS introduces explicit strategy selection mechanisms to optimize prompts for large language models, with a Thompson sampling-based approach showing the best performance in enhancing EvoPrompt's effectiveness.
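The Thompson-sampling mechanism can be sketched as a textbook Beta-Bernoulli bandit over strategies. The reward definition (binary "did the strategy improve the prompt's score") and uniform priors below are assumptions for illustration, not details from the OPTS paper.

```python
import random

class ThompsonStrategySelector:
    """Beta-Bernoulli Thompson sampling over a set of prompt design strategies."""

    def __init__(self, strategies):
        self.strategies = list(strategies)
        self.alpha = {s: 1.0 for s in self.strategies}  # 1 + observed successes
        self.beta = {s: 1.0 for s in self.strategies}   # 1 + observed failures

    def select(self):
        # Sample a plausible success rate per strategy; pick the best draw.
        draws = {s: random.betavariate(self.alpha[s], self.beta[s])
                 for s in self.strategies}
        return max(draws, key=draws.get)

    def update(self, strategy, improved):
        if improved:
            self.alpha[strategy] += 1.0
        else:
            self.beta[strategy] += 1.0

# Toy loop: strategy "B" truly helps 70% of the time, "A" only 20%.
random.seed(0)
true_rate = {"A": 0.2, "B": 0.7}
sel = ThompsonStrategySelector(["A", "B"])
for _ in range(500):
    s = sel.select()
    sel.update(s, random.random() < true_rate[s])
print(sel.alpha["B"] > sel.alpha["A"])  # the better strategy dominates the budget
```

Inside a prompt optimizer like EvoPrompt, `select()` would run before each mutation step and `update()` after scoring the mutated prompt on a validation set.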

Authors:Zongru Wu, Pengzhou Cheng, Zheng Wu, Tianjie Ju, Zhuosheng Zhang, Gongshen Liu
Title: Smoothing Grounding and Reasoning for MLLM-Powered GUI Agents with Query-Oriented Pivot Tasks
Abstract:
Perception-enhanced pre-training, particularly through grounding techniques, is widely adopted to enhance the performance of graphical user interface (GUI) agents. However, in resource-constrained scenarios, the format discrepancy between coordinate-oriented grounding and action-oriented reasoning limits the effectiveness of grounding for reasoning tasks. To address this challenge, we propose a query-oriented pivot approach called query inference, which serves as a bridge between GUI grounding and reasoning. By inferring potential user queries from a screenshot and its associated element coordinates, query inference improves the understanding of coordinates while aligning more closely with reasoning tasks. Experimental results show that query inference outperforms previous grounding techniques under the same training data scale. Notably, query inference achieves comparable or even better performance to large-scale grounding-enhanced OS-Atlas with less than 0.1% of training data. Furthermore, we explore the impact of reasoning formats and demonstrate that integrating additional semantic information into the input further boosts reasoning performance. The code is publicly available at https://github.com/ZrW00/GUIPivot.
中文摘要:提出的查询推理方法通过从截图和坐标推断用户查询,弥合了GUI基础与推理之间的差距,在极少训练数据下显著超越现有技术。
English Summary: The proposed query inference method bridges the gap between GUI grounding and reasoning by inferring user queries from screenshots and coordinates, significantly outperforming previous techniques with minimal training data.

Authors:Zhenxing Cui, Lu Chen, Yunhai Wang, Daniel Haehn, Yong Wang, Hanspeter Pfister
Title: Generalization of CNNs on Relational Reasoning with Bar Charts
Abstract:
This paper presents a systematic study of the generalization of convolutional neural networks (CNNs) and humans on relational reasoning tasks with bar charts. We first revisit previous experiments on graphical perception and update the benchmark performance of CNNs. We then test the generalization performance of CNNs on a classic relational reasoning task: estimating bar length ratios in a bar chart, by progressively perturbing the standard visualizations. We further conduct a user study to compare the performance of CNNs and humans. Our results show that CNNs outperform humans only when the training and test data have the same visual encodings. Otherwise, they may perform worse. We also find that CNNs are sensitive to perturbations in various visual encodings, regardless of their relevance to the target bars. Yet, humans are mainly influenced by bar lengths. Our study suggests that robust relational reasoning with visualizations is challenging for CNNs. Improving CNNs' generalization performance may require training them to better recognize task-related visual properties.
中文: 本研究表明,尽管卷积神经网络在视觉编码一致的条形图关系推理任务中表现优于人类,但在视觉干扰下其性能显著下降,而人类主要关注条形长度,这凸显了卷积神经网络在此类任务中实现稳健泛化的挑战。
English: This study demonstrates that while CNNs can outperform humans in relational reasoning tasks with bar charts when visual encodings remain consistent, their performance significantly deteriorates under visual perturbations, unlike humans who focus primarily on bar lengths, highlighting the challenge of achieving robust generalization in CNNs for such tasks.
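The ratio-estimation task the study perturbs can be made concrete with a toy stimulus generator: render two bars into a binary image, then recover the ground-truth label by counting foreground rows. Bar positions, widths, and image size here are illustrative, not the paper's actual stimuli.

```python
import numpy as np

def make_bar_chart(h1: int, h2: int, height: int = 32, width: int = 16) -> np.ndarray:
    """Render two vertical bars into a binary image (bars grow upward)."""
    img = np.zeros((height, width), dtype=np.uint8)
    img[height - h1:, 2:6] = 1    # left bar, 4 px wide
    img[height - h2:, 10:14] = 1  # right bar, 4 px wide
    return img

def bar_length_ratio(img: np.ndarray) -> float:
    """Ground-truth label: shorter/longer bar length from foreground rows."""
    left = int(img[:, 2:6].any(axis=1).sum())
    right = int(img[:, 10:14].any(axis=1).sum())
    return min(left, right) / max(left, right)

img = make_bar_chart(h1=24, h2=12)
print(bar_length_ratio(img))  # → 0.5
```

The study's perturbations (changed colors, added strokes, shifted encodings) would modify `make_bar_chart` while leaving the ratio label fixed, which is exactly where the CNNs' generalization breaks down.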

Authors:Zijian Kang, Yueyang Li, Shengyu Gong, Weiming Zeng, Hongjie Yan, Lingbin Bian, Zhiguo Zhang, Wai Ting Siok, Nizhuan Wang
Title: Hypergraph Multi-Modal Learning for EEG-based Emotion Recognition in Conversation
Abstract:
Emotion Recognition in Conversation (ERC) is valuable for diagnosing health conditions such as autism and depression, and for understanding the emotions of individuals who struggle to express their feelings. Current ERC methods primarily rely on semantic, audio and visual data but face significant challenges in integrating physiological signals such as Electroencephalography (EEG). This research proposes Hypergraph Multi-Modal Learning (Hyper-MML), a novel framework for identifying emotions in conversation. Hyper-MML effectively integrates EEG with audio and video information to capture complex emotional dynamics. Firstly, we introduce an Adaptive Brain Encoder with Mutual-cross Attention (ABEMA) module for processing EEG signals. This module captures emotion-relevant features across different frequency bands and adapts to subject-specific variations through hierarchical mutual-cross attention mechanisms. Secondly, we propose an Adaptive Hypergraph Fusion Module (AHFM) to actively model the higher-order relationships among multi-modal signals in ERC. Experimental results on the EAV and AFFEC datasets demonstrate that our Hyper-MML model significantly outperforms current state-of-the-art methods. The proposed Hyper-MML can serve as an effective communication tool for healthcare professionals, enabling better engagement with patients who have difficulty expressing their emotions. The official implementation codes are available at https://github.com/NZWANG/Hyper-MML.
中文: 本研究提出超图多模态学习框架,通过自适应脑电编码器和融合模块整合脑电与视听信息,显著提升了对话情绪识别的性能,在医疗辅助沟通中具有应用潜力。
English: This study introduces Hypergraph Multi-Modal Learning (Hyper-MML), a novel framework that integrates EEG with audio-visual data through adaptive modules to enhance emotion recognition in conversations, demonstrating superior performance on benchmark datasets and potential healthcare applications.

Authors:Rose Connolly, Lauren Buck, Victor Zordan, Rachel McDonnell
Title: The Impact of Navigation on Proxemics in an Immersive Virtual Environment with Conversational Agents
Abstract:
As social VR grows in popularity, understanding how to optimise interactions becomes increasingly important. Interpersonal distance (the physical space people maintain between each other) is a key aspect of user experience. Previous work in psychology has shown that breaches of personal space cause stress and discomfort. Thus, effectively managing this distance is crucial in social VR, where social interactions are frequent. Teleportation, a commonly used locomotion method in these environments, involves distinct cognitive processes and requires users to rely on their ability to estimate distance. Despite its widespread use, the effect of teleportation on proximity remains unexplored. To investigate this, we measured the interpersonal distance of 70 participants during interactions with embodied conversational agents, comparing teleportation to natural walking. Our findings revealed that participants maintained closer proximity to the agents during teleportation. Female participants kept greater distances from the agents than male participants, and natural walking was associated with higher agency and body ownership, though co-presence remained unchanged. We propose that differences in spatial perception and spatial cognitive load contribute to reduced interpersonal distance with teleportation. These findings emphasise that proximity should be a key consideration when selecting locomotion methods in social VR, highlighting the need for further research on how locomotion impacts spatial perception and social dynamics in virtual environments.

Authors:Chaoyu Li, Sid Padmanabhuni, Maryam Cheema, Hasti Seifi, Pooyan Fazli
Title: VideoA11y: Method and Dataset for Accessible Video Description
Abstract:
Video descriptions are crucial for blind and low vision (BLV) users to access visual content. However, current artificial intelligence models for generating descriptions often fall short due to limitations in the quality of human annotations within training datasets, resulting in descriptions that do not fully meet BLV users' needs. To address this gap, we introduce VideoA11y, an approach that leverages multimodal large language models (MLLMs) and video accessibility guidelines to generate descriptions tailored for BLV individuals. Using this method, we have curated VideoA11y-40K, the largest and most comprehensive dataset of 40,000 videos described for BLV users. Rigorous experiments across 15 video categories, involving 347 sighted participants, 40 BLV participants, and seven professional describers, showed that VideoA11y descriptions outperform novice human annotations and are comparable to trained human annotations in clarity, accuracy, objectivity, descriptiveness, and user satisfaction. We evaluated models on VideoA11y-40K using both standard and custom metrics, demonstrating that MLLMs fine-tuned on this dataset produce high-quality accessible descriptions. Code and dataset are available at https://people-robots.github.io/VideoA11y.

Authors:Tianyun Liu
Title: Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Abstract:
Traditional text-to-speech (TTS) methods primarily focus on establishing a mapping between phonemes and mel-spectrograms. However, during the phoneme encoding stage, there is often a lack of real mel-spectrogram auxiliary information, which results in the encoding process lacking true semantic understanding. At the same time, traditional TTS systems often struggle to balance the inference speed of the model with the quality of the synthesized speech. Methods that generate high-quality synthesized speech tend to have slower inference speeds, while faster inference methods often sacrifice speech quality. In this paper, I propose Clip-TTS, a TTS method based on the Clip architecture. This method uses the Clip framework to establish a connection between text content and real mel-spectrograms during the text encoding stage, enabling the text encoder to directly learn the true semantics of the global context, thereby ensuring the quality of the synthesized speech. In terms of model architecture, I adopt the basic structure of Transformer, which allows Clip-TTS to achieve fast inference speeds. Experimental results show that on the LJSpeech and Baker datasets, the speech generated by Clip-TTS achieves state-of-the-art MOS scores, and it also performs excellently on multi-emotion datasets. Audio samples are available at: https://ltydd1314.github.io/.
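The CLIP-style connection between text and mel-spectrograms amounts to a symmetric contrastive (InfoNCE) objective: matched (text, mel) pairs sit on the diagonal of a similarity matrix. The numpy sketch below illustrates that objective only; the encoders, batch construction, and temperature value are assumptions, and random vectors stand in for real embeddings.

```python
import numpy as np

def clip_style_loss(text_emb: np.ndarray, mel_emb: np.ndarray, temp: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch; row i of text matches row i of mel."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    m = mel_emb / np.linalg.norm(mel_emb, axis=1, keepdims=True)
    logits = t @ m.T / temp                       # (B, B) cosine similarities / temp

    def xent(lg: np.ndarray) -> float:
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -float(np.diag(logp).mean())       # true class for row i is column i

    return (xent(logits) + xent(logits.T)) / 2.0  # text->mel and mel->text directions

rng = np.random.default_rng(1)
text = rng.normal(size=(4, 8))
aligned_loss = clip_style_loss(text, text)         # perfectly matched pairs
shuffled_loss = clip_style_loss(text, text[::-1])  # mismatched pairs
print(aligned_loss < shuffled_loss)                # matched pairs yield lower loss
```

Training the text encoder against real mel embeddings this way is what lets it absorb acoustic semantics before the synthesis stage.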

Authors:Nathalie Riche, Anna Offenwanger, Frederic Gmeiner, David Brown, Hugo Romat, Michel Pahud, Nicolai Marquardt, Kori Inkpen, Ken Hinckley
Title: AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools
Abstract:
Chat-based prompts respond with verbose linear-sequential texts, making it difficult to explore and refine ambiguous intents, back up and reinterpret, or shift directions in creative AI-assisted design work. AI-Instruments instead embody "prompts" as interface objects via three key principles: (1) Reification of user-intent as reusable direct-manipulation instruments; (2) Reflection of multiple interpretations of ambiguous user-intents (Reflection-in-intent) as well as the range of AI-model responses (Reflection-in-response) to inform design "moves" towards a desired result; and (3) Grounding to instantiate an instrument from an example, result, or extrapolation directly from another instrument. Further, AI-Instruments leverage LLMs to suggest, vary, and refine new instruments, enabling a system that goes beyond hard-coded functionality by generating its own instrumental controls from content. We demonstrate four technology probes, applied to image generation, and qualitative insights from twelve participants, showing how AI-Instruments address challenges of intent formulation, steering via direct manipulation, and non-linear iterative workflows to reflect and resolve ambiguous intents.

Authors:Jiahao Tang
Title: SDA-DDA Semi-supervised Domain Adaptation with Dynamic Distribution Alignment Network For Emotion Recognition Using EEG Signals
Abstract:
In this paper, we focus on the challenge of individual variability in affective brain-computer interfaces (aBCI), which employs electroencephalogram (EEG) signals to monitor and recognize human emotional states, thereby facilitating the advancement of emotion-aware technologies. The variability in EEG data across individuals poses a significant barrier to the development of effective and widely applicable aBCI models. To tackle this issue, we propose a novel transfer learning framework called Semi-supervised Domain Adaptation with Dynamic Distribution Alignment (SDA-DDA). This approach aligns the marginal and conditional probability distribution of source and target domains using maximum mean discrepancy (MMD) and conditional maximum mean discrepancy (CMMD). We introduce a dynamic distribution alignment mechanism to adjust differences throughout training and enhance adaptation. Additionally, a pseudo-label confidence filtering module is integrated into the semi-supervised process to refine pseudo-label generation and improve the estimation of conditional distributions. Extensive experiments on EEG benchmark databases (SEED, SEED-IV and DEAP) validate the robustness and effectiveness of SDA-DDA. The results demonstrate its superiority over existing methods in emotion recognition across various scenarios, including cross-subject and cross-session conditions. This advancement enhances the generalization and accuracy of emotion recognition, potentially fostering the development of personalized aBCI applications. The source code is accessible at https://github.com/XuanSuTrum/SDA-DDA.
中文: 本文提出SDA-DDA这一新型迁移学习框架,通过动态对齐概率分布和优化伪标签生成,有效解决情感脑机接口中的个体差异问题,在多个EEG数据集上展现出卓越的跨被试情感识别性能。
English: This paper introduces SDA-DDA, a novel transfer learning framework that addresses individual variability in affective brain-computer interfaces by dynamically aligning probability distributions and refining pseudo-labels, demonstrating superior emotion recognition performance across multiple EEG datasets.
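As an illustrative sketch of the alignment objective (not the paper's code), the MMD and a class-conditional CMMD can be computed with an RBF kernel; the `gamma` bandwidth and the per-class averaging scheme are assumptions:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian kernel over pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(source, target, gamma=1.0):
    """Squared maximum mean discrepancy between two samples."""
    return (rbf_kernel(source, source, gamma).mean()
            + rbf_kernel(target, target, gamma).mean()
            - 2 * rbf_kernel(source, target, gamma).mean())

def cmmd2(source, src_labels, target, tgt_labels, gamma=1.0):
    """Conditional MMD: class-wise MMD averaged over shared labels
    (for the target domain, the labels would be filtered pseudo-labels)."""
    classes = sorted(set(src_labels.tolist()) & set(tgt_labels.tolist()))
    return float(np.mean([mmd2(source[src_labels == c],
                               target[tgt_labels == c], gamma)
                          for c in classes]))
```

Minimizing `mmd2` aligns the marginal distributions of source and target features, while `cmmd2` aligns the conditional distributions; the paper's dynamic mechanism reweights the two terms during training.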

Authors:Yuan Tian, Daniel Lee, Fei Wu, Tung Mai, Kun Qian, Siddhartha Sahai, Tianyi Zhang, Yunyao Li
Title: Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation
Abstract:
Text-to-SQL models, which parse natural language (NL) questions to executable SQL queries, are increasingly adopted in real-world applications. However, deploying such models in the real world often requires adapting them to the highly specialized database schemas used in specific applications. We find that existing text-to-SQL models experience significant performance drops when applied to new schemas, primarily due to the lack of domain-specific data for fine-tuning. This data scarcity also limits the ability to effectively evaluate model performance in new domains. Continuously obtaining high-quality text-to-SQL data for evolving schemas is prohibitively expensive in real-world scenarios. To bridge this gap, we propose SQLsynth, a human-in-the-loop text-to-SQL data annotation system. SQLsynth streamlines the creation of high-quality text-to-SQL datasets through human-LLM collaboration in a structured workflow. A within-subjects user study comparing SQLsynth with manual annotation and ChatGPT shows that SQLsynth significantly accelerates text-to-SQL data annotation, reduces cognitive load, and produces datasets that are more accurate, natural, and diverse. Our code is available at https://github.com/adobe/nl_sql_analyzer.
中文: 针对文本转SQL模型在适应专业数据库模式时因缺乏领域数据而性能下降的问题,我们提出了SQLsynth系统,通过人机协作的工作流高效生成高质量数据集,显著提升了标注速度和数据质量。
English: Text-to-SQL models struggle with performance when adapting to specialized database schemas due to limited domain-specific data, prompting the development of SQLsynth, a human-in-the-loop system that enhances data annotation efficiency and quality through collaboration between humans and large language models.

Authors:Mengqiao Liu, Tevin Wang, Cassandra A. Cohen, Sarah Li, Chenyan Xiong
Title: Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews
Abstract:
Which large language model (LLM) is better? Every evaluation tells a story, but what do users really think about current LLMs? This paper presents CLUE, an LLM-powered interviewer that conducts in-the-moment user experience interviews, right after users interact with LLMs, and automatically gathers insights about user opinions from massive interview logs. We conduct a study with thousands of users to understand user opinions on mainstream LLMs, recruiting users to first chat with a target LLM and then be interviewed by CLUE. Our experiments demonstrate that CLUE captures interesting user opinions, e.g., the bipolar views on the displayed reasoning process of DeepSeek-R1 and demands for information freshness and multi-modality. Our code and data are at https://github.com/cxcscmu/LLM-Interviewer.
中文: 本文提出CLUE,一个由大语言模型驱动的访谈系统,能在用户与模型交互后即时进行用户体验访谈,并通过海量访谈数据自动分析用户对主流模型(如DeepSeek-R1)的真实看法。
English: This paper introduces CLUE, an LLM-powered interviewer that conducts real-time user experience interviews after interactions with LLMs, automatically extracting insights from large-scale logs to reveal user opinions on mainstream models like DeepSeek-R1.

Authors:Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, Huiguang He
Title: BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM
Abstract:
Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering semantic information from fMRI signals. Although existing work uses LLMs to achieve this goal, these methods are not end-to-end and exclude the LLM from the fMRI-to-text mapping, leaving room to explore the LLM's role in auditory decoding. In this paper, we introduce a novel method, the Brain Prompt GPT (BP-GPT). By using a brain representation extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals into stimulus text. Further, we introduce a text prompt and align the fMRI prompt to it. By introducing the text prompt, our BP-GPT can extract a more robust brain prompt and promote the decoding of the pre-trained LLM. We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve significant improvements of up to 4.61 on METEOR and 2.43 on BERTScore across all subjects compared to the state-of-the-art method. The experimental results demonstrate that using brain representations as prompts to further drive an LLM for auditory neural decoding is feasible and effective. The code is available at https://github.com/1994cxy/BP-GPT.
中文: 本文提出的BP-GPT方法通过将fMRI信号转换为脑提示来驱动GPT-2进行听觉语义解码,相比现有最佳方法实现了显著性能提升。
English: This paper introduces BP-GPT, an end-to-end method that uses fMRI-derived brain prompts to drive GPT-2 for auditory semantic decoding, achieving significant improvements over state-of-the-art methods.

Authors:Xingbo Wang, Janessa Griffith, Daniel A. Adler, Joey Castillo, Tanzeem Choudhury, Fei Wang
Title: Exploring Personalized Health Support through Data-Driven, Theory-Guided LLMs: A Case Study in Sleep Health
Abstract:
Despite the prevalence of sleep-tracking devices, many individuals struggle to translate data into actionable improvements in sleep health. Current methods often provide data-driven suggestions but may not be feasible and adaptive to real-life constraints and individual contexts. We present HealthGuru, a novel large language model-powered chatbot to enhance sleep health through data-driven, theory-guided, and adaptive recommendations with conversational behavior change support. HealthGuru's multi-agent framework integrates wearable device data, contextual information, and a contextual multi-armed bandit model to suggest tailored sleep-enhancing activities. The system facilitates natural conversations while incorporating data-driven insights and theoretical behavior change techniques. Our eight-week in-the-wild deployment study with 16 participants compared HealthGuru to a baseline chatbot. Results show improved metrics like sleep duration and activity scores, higher quality responses, and increased user motivation for behavior change with HealthGuru. We also identify challenges and design considerations for personalization and user engagement in health chatbots.
中文摘要:HealthGuru是一种新型的基于大语言模型的聊天机器人,通过个性化睡眠建议和对话式支持,在实际应用中显著改善了用户睡眠指标并提升了参与度。
English Summary: HealthGuru is a novel LLM-powered chatbot that provides personalized, adaptive sleep recommendations and conversational support, demonstrating improved sleep metrics and user engagement in real-world testing.
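The contextual multi-armed bandit behind the tailored activity suggestions could, for example, take the form of a LinUCB-style learner. This is a generic sketch under assumed names; the paper does not specify its bandit formulation or features:

```python
import numpy as np

class LinUCB:
    """LinUCB contextual bandit: one ridge-regression model per arm
    (arm = candidate sleep-enhancing activity, context = user features)."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                               # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]    # X^T X + I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # X^T y per arm

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Estimated reward plus an upper-confidence exploration bonus
            ucb = theta @ context + self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

Each night's observed outcome (e.g., change in sleep score) would be fed back via `update`, so suggestions adapt to the individual over the deployment.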

Authors:Minsuk Chang, Yao Wang, Huichen Will Wang, Andreas Bulling, Cindy Xiong Bearfield
Title: Grid Labeling: Crowdsourcing Task-Specific Importance from Visualizations
Abstract:
Knowing where people look in visualizations is key to effective design. Yet, existing research primarily focuses on free-viewing-based saliency models - although visual attention is inherently task-dependent. Collecting task-relevant importance data remains a resource-intensive challenge. To address this, we introduce Grid Labeling - a novel annotation method for collecting task-specific importance data to enhance saliency prediction models. Grid Labeling dynamically segments visualizations into Adaptive Grids, enabling efficient, low-effort annotation while adapting to visualization structure. We conducted a human subject study comparing Grid Labeling with existing annotation methods, ImportAnnots and BubbleView, across multiple metrics. Results show that Grid Labeling produces the least noisy data and the highest inter-participant agreement with fewer participants while requiring less physical (e.g., clicks/mouse movements) and cognitive effort. An interactive demo is available at https://jangsus1.github.io/Grid-Labeling.

Authors:Chanjin Zheng, Zengyi Yu, Yilin Jiang, Mingzi Zhang, Xunuo Lu, Jing Jin, Liteng Gao
Title: ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities
Abstract:
Can Multimodal Large Language Models (MLLMs), with capabilities in perception, recognition, understanding, and reasoning, function as independent assistants in art evaluation dialogues? Current MLLM evaluation methods, which rely on subjective human scoring or costly interviews, lack comprehensive coverage of various scenarios. This paper proposes a process-oriented Human-Computer Interaction (HCI) space design to facilitate more accurate MLLM assessment and development. This approach aids teachers in efficient art evaluation while also recording interactions for MLLM capability assessment. We introduce ArtMentor, a comprehensive space that integrates a dataset and three systems to optimize MLLM evaluation. The dataset consists of 380 sessions conducted by five art teachers across nine critical dimensions. The modular system includes agents for entity recognition, review generation, and suggestion generation, enabling iterative upgrades. Machine learning and natural language processing techniques ensure the reliability of evaluations. The results confirm GPT-4o's effectiveness in assisting teachers in art evaluation dialogues. Our contributions are available at https://artmentor.github.io/.

Authors:Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Lars Liden, Jianfeng Gao
Title: Magma: A Foundation Model for Multimodal AI Agents
Abstract:
We present Magma, a foundation model that serves multimodal AI agentic tasks in both the digital and physical worlds. Magma is a significant extension of vision-language (VL) models in that it not only retains the VL understanding ability (verbal intelligence) of the latter, but is also equipped with the ability to plan and act in the visual-spatial world (spatial-temporal intelligence) and complete agentic tasks ranging from UI navigation to robot manipulation. To endow the agentic capabilities, Magma is pretrained on large amounts of heterogeneous datasets spanning from images, videos to robotics data, where the actionable visual objects (e.g., clickable buttons in GUI) in images are labeled by Set-of-Mark (SoM) for action grounding, and the object movements (e.g., the trace of human hands or robotic arms) in videos are labeled by Trace-of-Mark (ToM) for action planning. Extensive experiments show that SoM and ToM reach great synergy and facilitate the acquisition of spatial-temporal intelligence for our Magma model, which is fundamental to a wide range of tasks as shown in Fig.1. In particular, Magma creates new state-of-the-art results on UI navigation and robotic manipulation tasks, outperforming previous models that are specifically tailored to these tasks. On image and video-related multimodal tasks, Magma also compares favorably to popular large multimodal models that are trained on much larger datasets. We make our model and code public for reproducibility at https://microsoft.github.io/Magma.
中文: Magma是一个多模态基础模型,不仅具备视觉语言理解能力,还通过空间-时间智能在数字和物理世界中规划执行任务,在界面导航和机器人操控等任务中创造了最新最优性能。
English: Magma is a multimodal foundation model that extends vision-language capabilities by integrating spatial-temporal intelligence for planning and executing agentic tasks in both digital and physical environments, achieving state-of-the-art performance in UI navigation and robotic manipulation.

Authors:Qingwei Ben, Feiyu Jia, Jia Zeng, Junting Dong, Dahua Lin, Jiangmiao Pang
Title: HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit
Abstract:
Generalizable humanoid loco-manipulation poses significant challenges, requiring coordinated whole-body control and precise, contact-rich object manipulation. To address this, this paper introduces HOMIE, a semi-autonomous teleoperation system that combines a reinforcement learning policy for body control mapped to a pedal, an isomorphic exoskeleton arm for arm control, and motion-sensing gloves for hand control, forming a unified cockpit to freely operate humanoids and establish a data flywheel. The policy incorporates novel designs, including an upper-body pose curriculum, a height-tracking reward, and symmetry utilization. These features enable the system to perform walking and squatting to specific heights while seamlessly adapting to arbitrary upper-body poses. The exoskeleton, by eliminating the reliance on inverse dynamics, delivers faster and more precise arm control. The gloves utilize Hall sensors instead of servos, allowing even compact devices to achieve 15 or more degrees of freedom and freely adapt to any model of dexterous hands. Compared to previous teleoperation systems, HOMIE stands out for its exceptional efficiency, completing tasks in half the time; its expanded working range, allowing users to freely reach high and low areas as well as interact with any objects; and its affordability, with a price of just $500. The system is fully open-source; demos and code can be found at https://homietele.github.io/.
中文: HOMIE是一个创新的半自主遥操作系统,它结合了踏板映射强化学习策略控制身体、同构外骨骼控制手臂和运动感应手套控制双手,以高效、经济的方式扩展了仿人机器人的操作范围,并完全开源。
English: HOMIE is an innovative semi-autonomous teleoperation system that integrates a pedal-mapped reinforcement learning policy for body movement, an isomorphic exoskeleton for precise arm control, and motion-sensing gloves for hand manipulation, enabling efficient and affordable humanoid operation with expanded capabilities and full open-source availability.

Authors:Wujiang Xu, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, Yongfeng Zhang
Title: A-MEM: Agentic Memory for LLM Agents
Abstract:
While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basic principles of the Zettelkasten method, we designed our memory system to create interconnected knowledge networks through dynamic indexing and linking. When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags. The system then analyzes historical memories to identify relevant connections, establishing links where meaningful similarities exist. Additionally, this process enables memory evolution - as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding. Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing for more adaptive and context-aware memory management. Empirical experiments on six foundation models show consistent improvements over existing SOTA baselines. The source code for evaluating performance is available at https://github.com/WujiangXu/A-mem, while the source code of the agentic memory system is available at https://github.com/WujiangXu/A-mem-sys.
中文: 本文提出了一种基于Zettelkasten方法的智能记忆系统,通过动态索引和链接构建互联知识网络,使LLM代理能够实现记忆的持续演进,在实验中展现出优于现有方法的性能。
English: This paper introduces an agentic memory system for LLM agents that dynamically organizes and interconnects memories using Zettelkasten principles, enabling continuous evolution and superior performance over existing methods.
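A minimal sketch of the note-and-link idea, with cosine similarity standing in for the agent-driven relevance judgment (in the actual system, an LLM generates the attributes and decides on links; all class and threshold names here are assumptions):

```python
import numpy as np

class MemoryNote:
    """A Zettelkasten-style note: content plus structured attributes."""
    def __init__(self, text, embedding, tags=()):
        self.text = text
        self.embedding = embedding / np.linalg.norm(embedding)
        self.tags = set(tags)
        self.links = []

class AgenticMemory:
    """Store notes and link each new note to semantically similar ones."""
    def __init__(self, link_threshold=0.7):
        self.notes = []
        self.link_threshold = link_threshold

    def add(self, note):
        for other in self.notes:
            sim = float(note.embedding @ other.embedding)
            if sim >= self.link_threshold:
                # Bidirectional linking: old notes also evolve as new
                # memories arrive, mirroring the paper's memory evolution.
                note.links.append(other)
                other.links.append(note)
        self.notes.append(note)
        return note
```

The key property is that linking happens at insertion time against the whole history, so the network's structure is never fixed in advance.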

Authors:Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, Jianchang Wu, Jiangjie Zhen, Ranchen Ming, Song Yuan, Xuelin Zhang, Yu Zhou, Bingxin Li, Buyun Ma, Hongyuan Wang, Kang An, Wei Ji, Wen Li, Xuan Wen, Xiangwen Kong, Yuankai Ma, Yuanwei Liang, Yun Mou, Bahtiyar Ahmidi, Bin Wang, Bo Li, Changxin Miao, Chen Xu, Chenrun Wang, Dapeng Shi, Deshan Sun, Dingyuan Hu, Dula Sai, Enle Liu, Guanzhe Huang, Gulin Yan, Heng Wang, Haonan Jia, Haoyang Zhang, Jiahao Gong, Junjing Guo, Jiashuai Liu, Jiahong Liu, Jie Feng, Jie Wu, Jiaoren Wu, Jie Yang, Jinguo Wang, Jingyang Zhang, Junzhe Lin, Kaixiang Li, Lei Xia, Li Zhou, Liang Zhao, Longlong Gu, Mei Chen, Menglin Wu, Ming Li, Mingxiao Li, Mingliang Li, Mingyao Liang, Na Wang, Nie Hao, Qiling Wu, Qinyuan Tan, Ran Sun, Shuai Shuai, Shaoliang Pang, Shiliang Yang, Shuli Gao, Shanshan Yuan, Siqi Liu, Shihong Deng, Shilei Jiang, Sitong Liu, Tiancheng Cao, Tianyu Wang, Wenjin Deng, Wuxun Xie, Weipeng Ming, Wenqing He, Wen Sun, Xin Han, Xin Huang, Xiaomin Deng, Xiaojia Liu, Xin Wu, Xu Zhao, Yanan Wei, Yanbo Yu, Yang Cao, Yangguang Li, Yangzhen Ma, Yanming Xu, Yaoyu Wang, Yaqiang Shi, Yilei Wang, Yizhuang Zhou, Yinmin Zhong, Yang Zhang, Yaoben Wei, Yu Luo, Yuanwei Lu, Yuhe Yin, Yuchu Luo, Yuanhao Ding, Yuting Yan, Yaqi Dai, Yuxiang Yang, Zhe Xie, Zheng Ge, Zheng Sun, Zhewei Huang, Zhichao Chang, Zhisheng Guan, Zidong Yang, Zili Zhang, Binxing Jiao, Daxin Jiang, Heung-Yeung Shum, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu
Title: Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
Abstract:
Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contributions include: 1) a 130B-parameter unified speech-text multi-modal model that achieves unified understanding and generation, with the Step-Audio-Chat version open-sourced; 2) a generative speech data engine that establishes an affordable voice cloning framework and produces the open-sourced lightweight Step-Audio-TTS-3B model through distillation; 3) an instruction-driven fine control system enabling dynamic adjustments across dialects, emotions, singing, and RAP; 4) an enhanced cognitive architecture augmented with tool calling and role-playing abilities to manage complex tasks effectively. Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following. On open-source benchmarks such as LLaMA Question, Step-Audio shows a 9.3% average performance improvement, demonstrating our commitment to advancing the development of open-source multi-modal language technologies. Our code and models are available at https://github.com/stepfun-ai/Step-Audio.
中文: 本文提出首个生产就绪的开源方案Step-Audio,通过1300亿参数统一语音文本模型、生成式数据引擎、动态控制系统和增强认知架构,在人工评估中实现最优性能,显著提升开源多模态技术发展。
English: This paper introduces Step-Audio, a production-ready open-source solution featuring a unified 130B-parameter speech-text model, generative data engine, dynamic control system, and enhanced cognitive architecture that achieves state-of-the-art performance in human evaluations.

Authors:Shao Zhang, Xihuai Wang, Wenhao Zhang, Chaoran Li, Junru Song, Tingyu Li, Lin Qiu, Xuezhi Cao, Xunliang Cai, Wen Yao, Weinan Zhang, Xinbing Wang, Ying Wen
Title: Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration
Abstract:
Agents built on large language models (LLMs) have excelled in turn-by-turn human-AI collaboration but struggle with simultaneous tasks requiring real-time interaction. Latency issues and the challenge of inferring variable human strategies hinder their ability to make autonomous decisions without explicit instructions. Through experiments with current independent System 1 and System 2 methods, we validate the necessity of using Dual Process Theory (DPT) in real-time tasks. We propose DPT-Agent, a novel language agent framework that integrates System 1 and System 2 for efficient real-time simultaneous human-AI collaboration. DPT-Agent's System 1 uses a Finite-state Machine (FSM) and code-as-policy for fast, intuitive, and controllable decision-making. DPT-Agent's System 2 integrates Theory of Mind (ToM) and asynchronous reflection to infer human intentions and perform reasoning-based autonomous decisions. We demonstrate the effectiveness of DPT-Agent through further experiments with rule-based agents and human collaborators, showing significant improvements over mainstream LLM-based frameworks. DPT-Agent can effectively help LLMs convert correct slow thinking and reasoning into executable actions, thereby improving performance. To the best of our knowledge, DPT-Agent is the first language agent framework that achieves successful real-time simultaneous human-AI collaboration autonomously. Code of DPT-Agent can be found in https://github.com/sjtu-marl/DPT-Agent.
中文摘要:DPT-Agent是一种新型语言智能体框架,通过整合快速反应的系统1和基于推理的系统2,实现了自主的实时人机协作,克服了当前基于大语言模型的智能体在延迟和策略推断方面的局限。
English Summary: DPT-Agent is a novel language agent framework that integrates fast System 1 and reasoning-based System 2 processes to enable autonomous real-time human-AI collaboration, overcoming latency and strategy inference limitations of current LLM-based agents.
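The System-1 component can be pictured as a finite-state machine executing code-as-policy: deterministic and fast enough for real-time play. This is a generic sketch; the states, observations, and actions below (a cooking-style task) are hypothetical, not taken from the paper:

```python
class FSMPolicy:
    """Minimal finite-state machine as a code-as-policy sketch:
    fast, deterministic System-1 action selection. System 2 would
    asynchronously rewrite the transition table based on its reasoning."""
    def __init__(self, transitions, start, default_action="wait"):
        # transitions: {state: {observation: (next_state, action)}}
        self.transitions = transitions
        self.state = start
        self.default_action = default_action

    def step(self, observation):
        next_state, action = self.transitions.get(self.state, {}).get(
            observation, (self.state, self.default_action))
        self.state = next_state
        return action
```

Because each step is a dictionary lookup, latency is negligible, which is what lets the slow System-2 reasoning run asynchronously without stalling the interaction.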

Authors:Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen
Title: How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
Abstract:
Despite exceptional capabilities in knowledge-intensive tasks, Large Language Models (LLMs) face a critical gap in understanding how they internalize new knowledge, particularly how to structurally embed acquired knowledge in their neural computations. We address this issue through the lens of knowledge circuit evolution, identifying computational subgraphs that facilitate knowledge storage and processing. Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern. These insights not only advance our theoretical understanding of the mechanisms of new knowledge acquisition in LLMs, but also provide potential implications for improving continual pre-training strategies to enhance model performance. Code and data will be available at https://github.com/zjunlp/DynamicKnowledgeCircuits.
Chinese: 本研究通过知识回路演化的视角,揭示了大语言模型获取新知识受其与已有知识相关性影响,遵循从深层到浅层的模式,并经历形成到优化的阶段转变,为改进持续预训练策略提供了理论依据。
English: This study investigates how Large Language Models structurally embed new knowledge through knowledge circuit evolution, revealing that acquisition depends on relevance to existing knowledge, follows a deep-to-shallow pattern, and shifts from formation to optimization, offering insights to improve continual pre-training strategies.

Authors:Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang
Title: ReLearn: Unlearning via Learning for Large Language Models
Abstract:
Current unlearning methods for large language models usually rely on reverse optimization to reduce target token probabilities. However, this paradigm disrupts subsequent token prediction, degrading model performance and linguistic coherence. Moreover, existing evaluation metrics overemphasize contextual forgetting while inadequately assessing response fluency and relevance. To address these challenges, we propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning, along with a comprehensive evaluation framework. This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation, and Linguistic Score (LS) to evaluate generation quality. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output. Through mechanistic analysis, we further demonstrate how reverse optimization disrupts coherent text generation, while ReLearn preserves this essential capability. Code is available at https://github.com/zjunlp/unlearn.
中文摘要:提出的ReLearn方法通过数据增强和微调有效实现大语言模型的定向遗忘,同时保持输出质量,优于会破坏语言连贯性的反向优化方法。
English Summary: The proposed ReLearn method effectively achieves targeted unlearning in large language models through data augmentation and fine-tuning while maintaining output quality, outperforming reverse optimization approaches that compromise linguistic coherence.

Authors:Lauri Seppäläinen, Mudong Guo, Kai Puolamäki
Title: ExplainReduce: Summarising local explanations via proxies
Abstract:
Most commonly used non-linear machine learning methods are closed-box models, uninterpretable to humans. The field of explainable artificial intelligence (XAI) aims to develop tools to examine the inner workings of these closed boxes. An often-used model-agnostic approach to XAI involves using simple models as local approximations to produce so-called local explanations; examples of this approach include LIME, SHAP, and SLISEMAP. This paper shows how a large set of local explanations can be reduced to a small "proxy set" of simple models, which can act as a generative global explanation. This reduction procedure, ExplainReduce, can be formulated as an optimisation problem and approximated efficiently using greedy heuristics.
中文:ExplainReduce方法通过优化算法将大量局部解释简化为一个精简的代理简单模型集合,从而为复杂机器学习系统提供可生成的全局解释。
English: ExplainReduce is a method that compiles numerous local explanations into a concise proxy set of simple models, offering a global understanding of complex machine learning systems through efficient optimization.
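The reduction can be approximated with a standard greedy max-coverage heuristic, as the abstract indicates. The exact objective below (a proxy "covers" an item when its loss is within `epsilon` of the best local model) is an assumption for illustration, not necessarily the paper's formulation:

```python
import numpy as np

def explain_reduce(loss_matrix, k, epsilon=0.1):
    """Greedy selection of up to k proxy models from a pool of local models.

    loss_matrix[i, j]: loss of local model i on data item j.
    An item is 'covered' by a proxy if the proxy's loss is within
    epsilon of the best local model for that item."""
    n_models, n_items = loss_matrix.shape
    best = loss_matrix.min(axis=0)
    covers = loss_matrix <= best + epsilon
    chosen, covered = [], np.zeros(n_items, dtype=bool)
    for _ in range(k):
        gains = (covers & ~covered).sum(axis=1)   # newly covered items
        i = int(np.argmax(gains))
        if gains[i] == 0:                         # nothing left to gain
            break
        chosen.append(i)
        covered |= covers[i]
    return chosen
```

Greedy maximization of a coverage objective of this form carries the classic (1 - 1/e) approximation guarantee for submodular functions, which is why cheap heuristics work well here.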

Authors:Arvind Pillai, Dimitris Spathis, Subigya Nepal, Amanda C Collins, Daniel M Mackin, Michael V Heinz, Tess Z Griffin, Nicholas C Jacobson, Andrew Campbell
Title: Time2Lang: Bridging Time-Series Foundation Models and Large Language Models for Health Sensing Beyond Prompting
Abstract:
Large language models (LLMs) show promise for health applications when combined with behavioral sensing data. Traditional approaches convert sensor data into text prompts, but this process is prone to errors, computationally expensive, and requires domain expertise. These challenges are particularly acute when processing extended time series data. While time series foundation models (TFMs) have recently emerged as powerful tools for learning representations from temporal data, bridging TFMs and LLMs remains challenging. Here, we present Time2Lang, a framework that directly maps TFM outputs to LLM representations without intermediate text conversion. Our approach first trains on synthetic data using periodicity prediction as a pretext task, followed by evaluation on mental health classification tasks. We validate Time2Lang on two longitudinal wearable and mobile sensing datasets: daily depression prediction using step count data (17,251 days from 256 participants) and flourishing classification based on conversation duration (46 participants over 10 weeks). Time2Lang maintains near constant inference times regardless of input length, unlike traditional prompting methods. The generated embeddings preserve essential time-series characteristics such as auto-correlation. Our results demonstrate that TFMs and LLMs can be effectively integrated while minimizing information loss and enabling performance transfer across these distinct modeling paradigms. To our knowledge, we are the first to integrate a TFM and an LLM for health, thus establishing a foundation for future research combining general-purpose large models for complex healthcare tasks.
中文摘要:Time2Lang框架通过将时序基础模型的输出直接映射到大型语言模型表示,无需文本转换即可有效整合两种模型,在心理健康分类任务中实现恒定推理时间并保持时序特征,为医疗健康应用建立了新范式。
English Summary: The Time2Lang framework effectively bridges time series foundation models and large language models by directly mapping sensor data representations without text conversion, enabling efficient mental health classification with constant inference time and preserved temporal characteristics.
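As a much-simplified stand-in for the learned mapping (the paper trains its mapping with a periodicity-prediction pretext task; the ridge-regression adapter and all names below are assumptions), a linear map from TFM outputs into an LLM's representation space could look like:

```python
import numpy as np

def fit_linear_adapter(tfm_emb, llm_emb, reg=1e-3):
    """Ridge-regression linear map W from TFM space to LLM space:
    minimizes ||tfm_emb @ W - llm_emb||^2 + reg * ||W||^2."""
    d = tfm_emb.shape[1]
    A = tfm_emb.T @ tfm_emb + reg * np.eye(d)
    return np.linalg.solve(A, tfm_emb.T @ llm_emb)
```

Because the map is applied to fixed-size TFM embeddings rather than a token sequence, inference cost stays constant in the input length, matching the near-constant inference times the abstract reports.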

Authors:Dylan Waldner, Risto Miikkulainen
Title: The Odyssey of the Fittest: Can Agents Survive and Still Be Good?
Abstract:
As AI models grow in power and generality, understanding how agents learn and make decisions in complex environments is critical to promoting ethical behavior. This study introduces the Odyssey, a lightweight, adaptive text-based adventure game that provides a scalable framework for exploring AI ethics and safety. The Odyssey examines the ethical implications of implementing biological drives, specifically self-preservation, into three different agents: a Bayesian agent optimized with NEAT, a Bayesian agent optimized with stochastic variational inference, and a GPT-4o agent. The agents select actions in each scenario to survive, adapting to increasingly challenging scenarios. Post-simulation analysis evaluates the ethical scores of the agents' decisions, uncovering the tradeoffs they navigate to survive. Specifically, the analysis finds that as danger increases, the agents' ethical behavior becomes unpredictable. Surprisingly, the GPT-4o agent outperformed the Bayesian models in both survival and ethical consistency, challenging assumptions about traditional probabilistic methods and raising a new challenge for understanding the mechanisms of LLMs' probabilistic reasoning.
中文摘要:本研究通过引入轻量级文本冒险游戏《奥德赛》,探索了三种具有生物驱动力的智能体的伦理行为,发现在危险增加时伦理行为变得不可预测,且GPT-4o智能体在生存能力和伦理一致性上均意外优于贝叶斯模型。
English Summary: This study introduces the Odyssey, a text-based game framework, to explore AI ethics by testing three agents with biological drives, finding that ethical behavior becomes unpredictable under danger and that the GPT-4o agent surprisingly outperformed Bayesian models in both survival and ethical consistency.

Authors:Amy Smith, Barrett R. Anderson, Jasmine Tan Otto, Isaac Karth, Yuqian Sun, John Joon Young Chung, Melissa Roemmele, Max Kreminski
Title: Fuzzy Linkography: Automatic Graphical Summarization of Creative Activity Traces
Abstract:
Linkography -- the analysis of links between the design moves that make up an episode of creative ideation or design -- can be used for both visual and quantitative assessment of creative activity traces. Traditional linkography, however, is time-consuming, requiring a human coder to manually annotate both the design moves within an episode and the connections between them. As a result, linkography has not yet been much applied at scale. To address this limitation, we introduce fuzzy linkography: a means of automatically constructing a linkograph from a sequence of recorded design moves via a "fuzzy" computational model of semantic similarity, enabling wider deployment and new applications of linkographic techniques. We apply fuzzy linkography to three markedly different kinds of creative activity traces (text-to-image prompting journeys, LLM-supported ideation sessions, and researcher publication histories) and discuss our findings, as well as strengths, limitations, and potential future applications of our approach.
中文: 模糊链接图通过语义相似性自动分析创意设计步骤,克服了传统方法耗时且依赖人工的局限,实现了更广泛的应用。
English: Fuzzy linkography automates the analysis of creative design moves using semantic similarity, enabling scalable and diverse applications beyond traditional manual methods.
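The core construction, weighting links between design moves by semantic similarity rather than binary human judgments, can be sketched as follows. A real implementation would use a sentence-embedding model; the bag-of-words cosine similarity here is an illustrative stand-in, and the threshold is a toy value:

```python
import math
from collections import Counter

def embed(move):
    # Stand-in for a semantic embedding model: bag-of-words counts.
    return Counter(move.lower().split())

def cosine(a, b):
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def fuzzy_linkograph(moves, threshold=0.3):
    # Link every pair of moves whose similarity clears the threshold;
    # the similarity itself becomes the (fuzzy) link weight.
    embs = [embed(m) for m in moves]
    return {(i, j): cosine(embs[i], embs[j])
            for i in range(len(moves)) for j in range(i + 1, len(moves))
            if cosine(embs[i], embs[j]) >= threshold}

moves = ["sketch a red dragon", "refine the dragon sketch", "write a haiku"]
links = fuzzy_linkograph(moves)
```

The resulting weighted link set is the linkograph; quantitative linkographic metrics (e.g., link density) follow directly from it.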

Authors:Sharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy, Raveendran Paramesran, Raphael C. -W. Phan
Title: MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling
Abstract:
Action recognition in dark, low-light (under-exposed) or noisy videos is a challenging task due to visibility degradation, which can hinder critical spatiotemporal details. This paper proposes MD-BERT, a novel multi-stream approach that integrates complementary pre-processing techniques such as gamma correction and histogram equalization alongside raw dark frames to address these challenges. We introduce the Dynamic Feature Fusion (DFF) module, extending existing attentional fusion methods to a three-stream setting, thereby capturing fine-grained and global contextual information across different brightness and contrast enhancements. The fused spatiotemporal features are then processed by a BERT-based temporal model, which leverages its bidirectional self-attention to effectively capture long-range dependencies and contextual relationships across frames. Extensive experiments on the ARID V1.0 and ARID V1.5 dark video datasets show that MD-BERT outperforms existing methods, establishing a new state-of-the-art performance. Ablation studies further highlight the individual contributions of each input stream and the effectiveness of the proposed DFF and BERT modules. The official website of this work is available at: https://github.com/HrishavBakulBarua/DarkBERT
中文摘要:本文提出MD-BERT多流框架,通过动态特征融合模块整合增强视频输入与BERT时序模型,在暗光视频行为识别任务中实现了最优性能。
English Summary: This paper introduces MD-BERT, a multi-stream framework that combines enhanced video inputs with a BERT-based temporal model to achieve state-of-the-art action recognition in dark videos through dynamic feature fusion.
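The two enhancement streams fed alongside raw dark frames, gamma correction and histogram equalization, are standard image operations. A minimal sketch on a flat list of 8-bit grayscale pixel values (a toy stand-in for a video frame):

```python
def gamma_correct(pixels, gamma=2.2):
    # gamma > 1 brightens under-exposed pixels: out = 255 * (p/255)^(1/gamma).
    return [round(255 * (p / 255) ** (1 / gamma)) for p in pixels]

def hist_equalize(pixels):
    # Classic histogram equalization: remap intensities through the CDF
    # so the output uses the full dynamic range.
    n = len(pixels)
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    cdf, running = [0] * 256, 0
    for v in range(256):
        running += hist[v]
        cdf[v] = running
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to equalize
        return pixels[:]
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * 255) for p in pixels]

dark = [5, 10, 15, 20, 25, 30]
brightened = gamma_correct(dark)
equalized = hist_equalize(dark)
```

In MD-BERT these two enhanced views and the raw frames form the three streams whose features the DFF module then fuses.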

Authors:Linghe Wang, Minhwa Lee, Ross Volkov, Luan Tuyen Chau, Dongyeop Kang
Title: ScholaWrite: A Dataset of End-to-End Scholarly Writing Process
Abstract:
Writing is a cognitively demanding task involving continuous decision-making, heavy use of working memory, and frequent switching between multiple activities. Scholarly writing is particularly complex as it requires authors to coordinate many pieces of multiform knowledge. To fully understand writers' cognitive thought processes, one must decode the end-to-end writing data (from individual ideas to final manuscript) and understand the complex cognitive mechanisms of scholarly writing. We introduce the ScholaWrite dataset, a first-of-its-kind keystroke corpus of an end-to-end scholarly writing process for complete manuscripts, with thorough annotations of the cognitive writing intentions behind each keystroke. Our dataset includes LaTeX-based keystroke data from five preprints with nearly 62K total text changes and annotations across 4 months of paper writing. ScholaWrite shows promising usability and applications (e.g., iterative self-writing), demonstrating the importance of collecting end-to-end writing data, rather than only the final manuscript, for the development of future writing assistants that support the cognitive thinking process of scientists. Our de-identified data examples and code are available on our project page.

Authors:Xiaofan Yu, Lanxiang Hu, Benjamin Reichman, Dylan Chu, Rushil Chandrupatla, Xiyuan Zhang, Larry Heck, Tajana Rosing
Title: SensorChat: Answering Qualitative and Quantitative Questions during Long-Term Multimodal Sensor Interactions
Abstract:
Natural language interaction with sensing systems is crucial for addressing users' personal concerns and providing health-related insights into their daily lives. When a user asks a question, the system automatically analyzes the full history of sensor data, extracts relevant information, and generates an appropriate response. However, existing systems are limited to short-duration (e.g., one minute) or low-frequency (e.g., daily step count) sensor data. In addition, they struggle with quantitative questions that require precise numerical answers. In this work, we introduce SensorChat, the first end-to-end QA system designed for daily life monitoring using long-duration, high-frequency time series data. Given raw sensor signals spanning multiple days and a user-defined natural language question, SensorChat generates semantically meaningful responses that directly address user concerns. SensorChat effectively handles both quantitative questions that require numerical precision and qualitative questions that require high-level reasoning to infer subjective insights. To achieve this, SensorChat uses an innovative three-stage pipeline including question decomposition, sensor data query, and answer assembly. The first and third stages leverage Large Language Models (LLMs) to interpret human queries and generate responses. The intermediate querying stage extracts relevant information from the complete sensor data history. Real-world implementations demonstrate SensorChat's capability for real-time interactions on a cloud server while also being able to run entirely on edge platforms after quantization. Comprehensive QA evaluations show that SensorChat achieves 93% higher answer accuracy than the best performing state-of-the-art systems on quantitative questions. Furthermore, a user study with eight volunteers highlights SensorChat's effectiveness in answering qualitative questions.
中文: SensorChat是首个端到端问答系统,通过利用大型语言模型的三阶段流程处理长期高频传感器数据,既能精确回答定量问题,又能推理定性问题的主观洞察。
English: SensorChat is the first end-to-end QA system that processes long-duration, high-frequency sensor data to generate precise numerical answers for quantitative questions and infer subjective insights for qualitative questions through a three-stage pipeline leveraging LLMs.
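The three-stage pipeline (question decomposition, sensor data query, answer assembly) can be sketched with rule-based stand-ins in place of the LLM stages. In the real system, stages one and three prompt an LLM; every function and field name below is hypothetical:

```python
def decompose(question):
    # Stand-in for the LLM decomposition stage: map a question to a
    # structured query spec. A real system would prompt an LLM here.
    if "total steps" in question and "last" in question:
        days = int(question.split("last")[1].split()[0])
        return {"stat": "sum", "field": "steps", "days": days}
    raise ValueError("unsupported question in this sketch")

def query(history, spec):
    # Intermediate stage: pull the relevant window from the full history
    # and compute the requested statistic with numerical precision.
    window = history[-spec["days"]:]
    values = [day[spec["field"]] for day in window]
    return sum(values) if spec["stat"] == "sum" else values

def assemble(question, result):
    # Stand-in for the LLM answer-assembly stage.
    return f"Over that period you took {result} steps."

history = [{"steps": 4000}, {"steps": 6000}, {"steps": 5000}, {"steps": 7000}]
spec = decompose("total steps over the last 2 days?")
answer = assemble("total steps over the last 2 days?", query(history, spec))
```

Keeping the numeric computation in the middle stage, outside the LLM, is what lets quantitative answers stay exact.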

Authors:Rohit Girmaji, Bhav Beri, Ramanathan Subramanian, Vineet Gandhi
Title: EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues
Abstract:
We present EditIQ, a completely automated framework for cinematically editing scenes captured via a stationary, large field-of-view and high-resolution camera. From the static camera feed, EditIQ initially generates multiple virtual feeds, emulating a team of cameramen. These virtual camera shots termed rushes are subsequently assembled using an automated editing algorithm, whose objective is to present the viewer with the most vivid scene content. To understand key scene elements and guide the editing process, we employ a two-pronged approach: (1) a large language model (LLM)-based dialogue understanding module to analyze conversational flow, coupled with (2) visual saliency prediction to identify meaningful scene elements and camera shots therefrom. We then formulate cinematic video editing as an energy minimization problem over shot selection, where cinematic constraints determine shot choices, transitions, and continuity. EditIQ synthesizes an aesthetically and visually compelling representation of the original narrative while maintaining cinematic coherence and a smooth viewing experience. Efficacy of EditIQ against competing baselines is demonstrated via a psychophysical study involving twenty participants on the BBC Old School dataset plus eleven theatre performance videos. Video samples from EditIQ can be found at https://editiq-ave.github.io/.
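The shot-selection step, framed above as energy minimization with continuity constraints, admits a compact dynamic-programming sketch: a unary term rewards on-screen saliency and a pairwise term penalizes each cut. The saliency scores and switch cost below are toy values, not the paper's learned terms:

```python
def select_shots(saliency, switch_cost):
    """Pick one virtual shot (rush) per frame, minimizing
    sum_t -saliency[t][shot_t] + switch_cost * (number of cuts)."""
    n, m = len(saliency), len(saliency[0])
    cost = [-saliency[0][s] for s in range(m)]
    back = []
    for t in range(1, n):
        new_cost, bp = [], []
        for s in range(m):
            prev = min(range(m),
                       key=lambda p: cost[p] + (switch_cost if p != s else 0.0))
            new_cost.append(cost[prev]
                            + (switch_cost if prev != s else 0.0)
                            - saliency[t][s])
            bp.append(prev)
        cost, back = new_cost, back + [bp]
    s = min(range(m), key=lambda x: cost[x])
    path = [s]
    for bp in reversed(back):
        s = bp[s]
        path.append(s)
    return path[::-1]

saliency = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
free_cut = select_shots(saliency, switch_cost=0.0)   # follows saliency freely
sticky = select_shots(saliency, switch_cost=10.0)    # continuity dominates
```

With no switch penalty the editor chases per-frame saliency; a high penalty yields a single continuous shot, which is exactly the tradeoff cinematic continuity constraints encode.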

Authors:Muhammad Zain Raza, Jiawei Xu, Terence Lim, Lily Boddy, Carlos M. Mery, Andrew Well, Ying Ding
Title: LLM-TA: An LLM-Enhanced Thematic Analysis Pipeline for Transcripts from Parents of Children with Congenital Heart Disease
Abstract:
Thematic Analysis (TA) is a fundamental method in healthcare research for analyzing transcript data, but it is resource-intensive and difficult to scale for large, complex datasets. This study investigates the potential of large language models (LLMs) to augment the inductive TA process in high-stakes healthcare settings. Focusing on interview transcripts from parents of children with Anomalous Aortic Origin of a Coronary Artery (AAOCA), a rare congenital heart disease, we propose an LLM-Enhanced Thematic Analysis (LLM-TA) pipeline. Our pipeline integrates an affordable state-of-the-art LLM (GPT-4o mini), LangChain, and prompt engineering with chunking techniques to analyze nine detailed transcripts following the inductive TA framework. We evaluate the LLM-generated themes against human-generated results using thematic similarity metrics, LLM-assisted assessments, and expert reviews. Results demonstrate that our pipeline outperforms existing LLM-assisted TA methods significantly. While the pipeline alone has not yet reached human-level quality in inductive TA, it shows great potential to improve scalability, efficiency, and accuracy while reducing analyst workload when working collaboratively with domain experts. We provide practical recommendations for incorporating LLMs into high-stakes TA workflows and emphasize the importance of close collaboration with domain experts to address challenges related to real-world applicability and dataset complexity. https://github.com/jiaweixu98/LLM-TA
中文: 本研究提出了一种LLM增强主题分析(LLM-TA)流程,通过整合先进语言模型与专家知识显著提升了医疗研究的可扩展性和效率,尽管在归纳性主题分析中尚未完全达到人类专家的水平。
English: This study introduces an LLM-Enhanced Thematic Analysis (LLM-TA) pipeline that significantly improves scalability and efficiency in healthcare research by integrating advanced language models with human expertise, though it has not yet achieved full human-level quality in inductive thematic analysis.

Authors:Zaitian Wang, Jian He, Yu Liang, Xiyuan Hu, Tianhao Peng, Kaixin Wang, Jiakai Wang, Chenlong Zhang, Weili Zhang, Shuang Niu, Xiaoyang Xie
Title: Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition
Abstract:
Emotions play a crucial role in human behavior and decision-making, making emotion recognition a key area of interest in human-computer interaction (HCI). This study addresses the challenges of emotion recognition by integrating facial expression analysis with electroencephalogram (EEG) signals, introducing a novel multimodal framework, Milmer. The proposed framework employs a transformer-based fusion approach to effectively integrate visual and physiological modalities. It consists of an EEG preprocessing module, a facial feature extraction and balancing module, and a cross-modal fusion module. To enhance visual feature extraction, we fine-tune a pre-trained Swin Transformer on emotion-related datasets. Additionally, a cross-attention mechanism is introduced to balance token representation across modalities, ensuring effective feature integration. A key innovation of this work is the adoption of a multiple instance learning (MIL) approach, which extracts meaningful information from multiple facial expression images over time, capturing critical temporal dynamics often overlooked in previous studies. Extensive experiments conducted on the DEAP dataset demonstrate the superiority of the proposed framework, achieving a classification accuracy of 96.72% in the four-class emotion recognition task. Ablation studies further validate the contributions of each module, highlighting the significance of advanced feature extraction and fusion strategies in enhancing emotion recognition performance. Our code is available at https://github.com/liangyubuaa/Milmer.
中文摘要:本研究提出名为Milmer的新型多模态框架,通过融合面部表情与脑电信号,采用基于Transformer的融合方法和多示例学习,在DEAP数据集上实现了96.72%的情感识别准确率。
English Summary: This study introduces Milmer, a novel multimodal framework that integrates facial expressions and EEG signals using transformer-based fusion and multiple instance learning to achieve 96.72% emotion recognition accuracy on the DEAP dataset.
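The multiple instance learning step, pooling a bag of per-frame facial features into one bag-level representation, is commonly realized as attention pooling. A minimal pure-Python sketch, with the scoring weights as a toy stand-in for a learned attention network:

```python
import math

def attention_mil_pool(instances, score_w):
    # Score each instance (frame feature vector), softmax the scores into
    # attention weights, and return the weighted average as the bag feature.
    scores = [sum(w * x for w, x in zip(score_w, inst)) for inst in instances]
    mx = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(attn[i] * instances[i][d] for i in range(len(instances)))
           for d in range(dim)]
    return bag, attn

# Three toy frame features; the second frame scores highest and so
# dominates the pooled bag representation.
frames = [[0.1, 0.0], [1.0, 0.0], [0.2, 0.0]]
bag, attn = attention_mil_pool(frames, score_w=[5.0, 0.0])
```

The attention weights make the temporal selection interpretable: frames the scorer deems emotionally informative contribute most to the bag feature.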

Authors:Xie Zhang, Chenxiao Li, Chenshu Wu
Title: TAPOR: 3D Hand Pose Reconstruction with Fully Passive Thermal Sensing for Around-Device Interactions
Abstract:
This paper presents the design and implementation of TAPOR, a privacy-preserving, non-contact, and fully passive sensing system for accurate and robust 3D hand pose reconstruction for around-device interaction using a single low-cost thermal array sensor. Thermal sensing with inexpensive and miniature thermal arrays offers an excellent utility-privacy balance, providing an imaging resolution significantly lower than cameras but far superior to RF signals like radar or WiFi. The design of TAPOR, however, is challenging, mainly because the captured temperature maps are low-resolution and textureless. To overcome these challenges, we investigate thermo-depth and thermo-pose properties, proposing a novel physics-inspired neural network that learns effective 3D spatial representations of potential hand poses. We then formulate the 3D pose reconstruction problem as a distinct retrieval task, enabling accurate hand pose determination from the input temperature map. To deploy TAPOR on IoT devices, we introduce an effective heterogeneous knowledge distillation method, reducing computation by 377x. TAPOR is fully implemented and tested in real-world scenarios, showing remarkable performance, supported by four gesture control and finger tracking case studies. We envision TAPOR to be a ubiquitous interface for around-device control and have open-sourced it at https://github.com/aiot-lab/TAPOR.
中文: 本文介绍了TAPOR系统,这是一种基于低成本热阵列传感器的隐私保护、非接触式全被动感知系统,通过创新的物理启发神经网络实现精确的3D手部姿态重建,并利用异构知识蒸馏方法在物联网设备上高效部署,支持设备周边的交互控制。
English: This paper introduces TAPOR, a privacy-focused, non-contact, and fully passive system that uses a single low-cost thermal array sensor for accurate 3D hand pose reconstruction, enabling around-device interaction through a novel physics-inspired neural network and efficient deployment on IoT devices.
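The retrieval formulation, matching an input temperature map's embedding against a gallery of candidate hand poses, reduces to nearest-neighbor search. A sketch with a trivial flatten-and-normalize embedding standing in for the paper's physics-inspired network, on made-up 2x2 temperature maps:

```python
import math

def embed(temp_map):
    # Stand-in embedding: flatten the temperature map and L2-normalize.
    flat = [v for row in temp_map for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]

def retrieve_pose(query_map, gallery):
    # gallery: list of (temperature_map, pose_label) pairs; return the
    # pose whose map embedding is nearest to the query embedding.
    q = embed(query_map)
    def dist(entry):
        e = embed(entry[0])
        return sum((a - b) ** 2 for a, b in zip(q, e))
    return min(gallery, key=dist)[1]

gallery = [
    ([[30.0, 31.0], [32.0, 36.0]], "open_palm"),
    ([[30.0, 36.0], [30.0, 30.0]], "pointing"),
]
pose = retrieve_pose([[30.0, 35.5], [30.5, 30.0]], gallery)
```

Framing reconstruction as retrieval keeps the output on the manifold of plausible hand poses even when the input map is low-resolution and textureless.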

Authors:Gaole He, Nilay Aishwarya, Ujwal Gadiraju
Title: Is Conversational XAI All You Need? Human-AI Decision Making With a Conversational XAI Assistant
Abstract:
Explainable artificial intelligence (XAI) methods are being proposed to help interpret and understand how AI systems reach specific predictions. Inspired by prior work on conversational user interfaces, we argue that augmenting existing XAI methods with conversational user interfaces can increase user engagement and boost user understanding of the AI system. In this paper, we explored the impact of a conversational XAI interface on users' understanding of the AI system, their trust, and reliance on the AI system. In comparison to an XAI dashboard, we found that the conversational XAI interface can bring about a better understanding of the AI system among users and higher user trust. However, users of both the XAI dashboard and conversational XAI interfaces showed clear overreliance on the AI system. Enhanced conversations powered by large language model (LLM) agents amplified over-reliance. Based on our findings, we reason that the potential cause of such overreliance is the illusion of explanatory depth that is concomitant with both XAI interfaces. Our findings have important implications for designing effective conversational XAI interfaces to facilitate appropriate reliance and improve human-AI collaboration. Code can be found at https://github.com/delftcrowd/IUI2025_ConvXAI
中文: 研究表明,相比仪表板,对话式可解释人工智能界面能提升用户对AI系统的理解和信任,但两种界面均导致用户过度依赖AI,且大语言模型增强的对话会加剧此现象,其根源在于解释深度的错觉。
English: This study demonstrates that conversational XAI interfaces improve user understanding and trust in AI systems compared to dashboards, but both interfaces lead to overreliance, which is amplified by LLM-enhanced conversations due to the illusion of explanatory depth.

Authors:Faria Huq, Zora Zhiruo Wang, Frank F. Xu, Tianyue Ou, Shuyan Zhou, Jeffrey P. Bigham, Graham Neubig
Title: CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
Abstract:
While much work on web agents emphasizes the promise of autonomously performing tasks on behalf of users, in reality, agents often fall short on complex tasks in real-world contexts and on modeling user preferences. This presents an opportunity for humans to collaborate with the agent and leverage the agent's capabilities effectively. We propose CowPilot, a framework supporting autonomous as well as human-agent collaborative web navigation, and evaluation across task success and task efficiency. CowPilot reduces the number of steps humans need to perform by allowing agents to propose next steps, while users are able to pause, reject, or take alternative actions. During execution, users can interleave their actions with the agent by overriding suggestions or resuming agent control when needed. We conducted case studies on five common websites and found that the human-agent collaborative mode achieves the highest success rate of 95% while requiring humans to perform only 15.2% of the total steps. Even with human interventions during task execution, the agent successfully drives up to half of task success on its own. CowPilot can serve as a useful tool for data collection and agent evaluation across websites, which we believe will enable research in how users and agents can work together. Video demonstrations are available at https://oaishi.github.io/cowpilot.html

Authors:Zheng Lian, Haoyu Chen, Lan Chen, Haiyang Sun, Licai Sun, Yong Ren, Zebang Cheng, Bin Liu, Rui Liu, Xiaojiang Peng, Jiangyan Yi, Jianhua Tao
Title: AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models
Abstract:
The emergence of multimodal large language models (MLLMs) advances multimodal emotion recognition (MER) to the next level, from naive discriminative tasks to complex emotion understanding with advanced video understanding abilities and natural language description. However, the current community suffers from a lack of large-scale datasets with intensive, descriptive emotion annotations, as well as a multimodal-centric framework to maximize the potential of MLLMs for emotion understanding. To address this, we establish a new benchmark for MLLM-based emotion understanding with a novel dataset (MER-Caption) and a new model (AffectGPT). Utilizing our model-based crowd-sourcing data collection strategy, we construct by far the largest descriptive emotion dataset to date, featuring over 2K fine-grained emotion categories across 115K samples. We also introduce the AffectGPT model, designed with pre-fusion operations to enhance multimodal integration. Finally, we present MER-UniBench, a unified benchmark with evaluation metrics tailored for typical MER tasks and the free-form, natural language output style of MLLMs. Extensive experimental results show AffectGPT's robust performance across various MER tasks. We have released both the code and the dataset to advance research and development in emotion understanding: https://github.com/zeroQiaoba/AffectGPT.
中文: 作者提出了AffectGPT模型和新数据集,以解决多模态情感识别中缺乏描述性标注和专用框架的问题,并在多种任务中展现出优异性能。
English: The authors introduce AffectGPT and a new dataset to advance multimodal emotion recognition by addressing the lack of descriptive annotations and specialized frameworks, achieving strong performance across various tasks.

Authors:Rong Ye, Yongxin Zhang, Yikai Zhang, Haoyu Kuang, Zhongyu Wei, Peng Sun
Title: Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game
Abstract:
Achieving Artificial General Intelligence (AGI) requires AI agents that can not only make strategic decisions but also engage in flexible and meaningful communication. Inspired by Wittgenstein's language game theory in Philosophical Investigations, we propose that language agents can learn through in-context interaction rather than traditional multi-stage frameworks that separate decision-making from language expression. Using Werewolf, a social deduction game that tests language understanding, strategic interaction, and adaptability, we develop the Multi-agent Kahneman & Tversky's Optimization (MaKTO). MaKTO engages diverse models in extensive gameplay to generate unpaired desirable and unacceptable responses, then employs KTO to refine the model's decision-making process. In 9-player Werewolf games, MaKTO achieves a 61% average win rate across various models, outperforming GPT-4o and two-stage RL agents by relative improvements of 23.0% and 10.9%, respectively. Notably, MaKTO also demonstrates human-like performance, winning 60% against expert players and showing only 49% detectability in Turing-style blind tests.
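The KTO refinement step operates on unpaired responses: each desirable or undesirable sample contributes its own loss term built from the policy-to-reference log-ratio, with no paired preference required. A simplified per-sample sketch (treating the reference point z0 as a given constant rather than the estimated KL term, and omitting the desirable/undesirable class weights used in full KTO):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(log_ratio, desirable, beta=0.1, z0=0.0):
    """Per-sample KTO-style loss. log_ratio = log pi(y|x) - log pi_ref(y|x).
    Desirable responses are pushed above the reference point z0,
    undesirable responses below it; no paired comparison is needed."""
    margin = beta * (log_ratio - z0) if desirable else beta * (z0 - log_ratio)
    return 1.0 - sigmoid(margin)

# If the policy already strongly favors a response, the loss is near zero
# when that response was desirable and near one when it was undesirable.
good = kto_loss(100.0, desirable=True)
bad = kto_loss(100.0, desirable=False)
```

Unpaired supervision fits the gameplay setting well: each logged Werewolf action can be labeled desirable or unacceptable on its own, without finding a matched alternative.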

Authors:Riqiang Gao, Mamadou Diallo, Han Liu, Anthony Magliari, Jonathan Sackett, Wilko Verbakel, Sandra Meyers, Rafe Mcbeth, Masoud Zarepisheh, Simon Arberet, Martin Kraus, Florin C. Ghesu, Ali Kamen
Title: Automating High Quality RT Planning at Scale
Abstract:
Radiotherapy (RT) planning is complex, subjective, and time-intensive. Advances with artificial intelligence (AI) promise to improve its precision and efficiency, but progress is often limited by the scarcity of large, standardized datasets. To address this, we introduce the Automated Iterative RT Planning (AIRTP) system, a scalable solution designed to generate substantial volumes of consistently high-quality treatment plans, overcoming a key obstacle to the advancement of AI-driven RT planning. Our AIRTP pipeline adheres to clinical guidelines and automates essential steps, including organ-at-risk (OAR) contouring, helper structure creation, beam setup, optimization, and plan quality improvement, using AI integrated with RT planning software such as Varian Eclipse. Furthermore, we propose a novel approach for determining optimization parameters to reproduce 3D dose distributions, i.e., a method to convert dose predictions into deliverable treatment plans constrained by machine limitations. A comparative analysis of plan quality reveals that our automated pipeline produces treatment plans of quality comparable to those generated manually, which traditionally require several hours of labor per plan. Committed to public research, the first data release of our AIRTP pipeline includes nine cohorts covering head-and-neck and lung cancer sites to support an AAPM 2025 challenge. To the best of our knowledge, this dataset features more than 10 times the number of plans in the largest existing well-curated public dataset. Repo: https://github.com/RiqiangGao/GDP-HMM_AAPMChallenge.
中文: AIRTP系统利用人工智能自动化放疗规划,通过可扩展的流程生成高质量且临床可比的治疗方案,同时发布大规模公共数据集以解决数据稀缺问题,推动研究发展。
English: The AIRTP system automates radiotherapy planning using AI to generate high-quality, clinically comparable treatment plans efficiently, addressing data scarcity with a scalable pipeline and releasing a large public dataset for research.

Authors:Wanqi Yin, Zhongang Cai, Ruisi Wang, Ailing Zeng, Chen Wei, Qingping Sun, Haiyi Mei, Yanjun Wang, Hui En Pang, Mingyuan Zhang, Lei Zhang, Chen Change Loy, Atsushi Yamashita, Lei Yang, Ziwei Liu
Title: SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation
Abstract:
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods focus on training innovative architectural designs on confined datasets. In this work, we investigate the impact of scaling up EHPS towards a family of generalist foundation models. 1) For data scaling, we perform a systematic investigation on 40 EHPS datasets, encompassing a wide range of scenarios that a model trained on any single dataset cannot handle. More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. Ultimately, we achieve diminishing returns at 10M training instances from diverse data sources. 2) For model scaling, we take advantage of vision transformers (up to ViT-Huge as the backbone) to study the scaling law of model sizes in EHPS. To exclude the influence of algorithmic design, we base our experiments on two minimalist architectures: SMPLer-X, which consists of an intermediate step for hand and face localization, and SMPLest-X, an even simpler version that reduces the network to its bare essentials and highlights significant advances in the capture of articulated hands. With big data and the large model, the foundation models exhibit strong performance across diverse test benchmarks and excellent transferability to even unseen environments. Moreover, our finetuning strategy turns the generalist into specialist models, allowing them to achieve further performance boosts. Notably, our foundation models consistently deliver state-of-the-art results on seven benchmarks such as AGORA, UBody, EgoBody, and our proposed SynHand dataset for comprehensive hand evaluation. (Code is available at: https://github.com/wqyin/SMPLest-X).
中文摘要:本研究通过整合40个数据集扩展训练数据并采用视觉变换器进行模型升级,建立了表达性人体姿态与形状估计的基础模型,在多个基准测试中实现最优性能且展现出卓越的迁移能力。
English Summary: This research scales up expressive human pose and shape estimation through data expansion across 40 datasets and model scaling using vision transformers, establishing foundation models that achieve state-of-the-art performance across multiple benchmarks while demonstrating strong transferability.

Authors:Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu, Runnan Fang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang
Title: OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Abstract:
Machine writing with large language models often relies on retrieval-augmented generation. However, these approaches remain confined within the boundaries of the model's predefined scope, limiting the generation of content with rich information. Specifically, vanilla-retrieved information tends to lack depth, novelty, and suffers from redundancy, which negatively impacts the quality of generated articles, leading to shallow, unoriginal, and repetitive outputs. To address these issues, we propose OmniThink, a slow-thinking machine writing framework that emulates the human-like process of iterative expansion and reflection. The core idea behind OmniThink is to simulate the cognitive behavior of learners as they slowly deepen their knowledge of the topics. Experimental results demonstrate that OmniThink improves the knowledge density of generated articles without compromising metrics such as coherence and depth. Human evaluations and expert feedback further highlight the potential of OmniThink to address real-world challenges in the generation of long-form articles. Code is available at https://github.com/zjunlp/OmniThink.
中文摘要:OmniThink是一种慢思考机器写作框架,通过模拟人类迭代学习过程来克服检索增强生成的局限性,能在保持连贯性和深度的同时提高生成文章的知识密度。
English Summary: OmniThink is a slow-thinking machine writing framework designed to overcome the limitations of retrieval-augmented generation by mimicking human iterative learning, resulting in articles with higher knowledge density while maintaining coherence and depth.
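The iterative expand-and-reflect loop can be sketched as outline growth with a redundancy filter: expansion proposes new subtopics, and reflection (here reduced to deduplication against what is already known, a deliberate simplification of the paper's LLM-driven reflection) keeps knowledge density up:

```python
def expand_and_reflect(seed, expand_fn, rounds=3):
    # outline: topics discovered so far; frontier: topics to expand next.
    outline, frontier = [seed], [seed]
    for _ in range(rounds):
        proposals = [sub for topic in frontier for sub in expand_fn(topic)]
        # Reflection step (simplified): discard proposals already covered,
        # so redundant retrievals never re-enter the outline.
        fresh = [p for p in proposals if p not in outline]
        outline.extend(fresh)
        frontier = fresh
        if not frontier:
            break
    return outline

# Toy expansion table standing in for an LLM's subtopic generation.
TABLE = {
    "solar power": ["photovoltaics", "solar thermal"],
    "photovoltaics": ["silicon cells", "solar power"],  # cycles back
    "solar thermal": ["heat storage"],
}
outline = expand_and_reflect("solar power", lambda t: TABLE.get(t, []))
```

Even with a cyclic expansion source, the reflection filter guarantees the outline only accumulates novel topics, mirroring how the framework deepens rather than repeats knowledge.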

Authors:Fen Wang, Bomiao Wang, Xueli Shu, Zhen Liu, Zekai Shao, Chao Liu, Siming Chen
Title: ChartInsighter: An Approach for Mitigating Hallucination in Time-series Chart Summary Generation with A Benchmark Dataset
Abstract:
An effective chart summary can significantly reduce the time and effort decision makers spend interpreting charts, enabling precise and efficient communication of data insights. Previous studies have faced challenges in generating accurate and semantically rich summaries of time-series data charts. In this paper, we identify summary elements and common hallucination types in the generation of time-series chart summaries, which serve as our guidelines for automatic generation. We introduce ChartInsighter, which automatically generates chart summaries of time-series data, effectively reducing hallucinations in chart summary generation. Specifically, we assign multiple agents to generate the initial chart summary and collaborate iteratively, during which they invoke external data analysis modules to extract insights and compile them into a coherent summary. Additionally, we implement a self-consistency test method to validate and correct our summary. We create a high-quality benchmark of charts and summaries, with hallucination types annotated on a sentence-by-sentence basis, facilitating the evaluation of the effectiveness of reducing hallucinations. Our evaluations using our benchmark show that our method surpasses state-of-the-art models, and that our summary hallucination rate is the lowest, which effectively reduces various hallucinations and improves summary quality. The benchmark is available at https://github.com/wangfen01/ChartInsighter.
中文摘要:ChartInsighter通过多智能体协作与自一致性检验方法,有效减少时序图表摘要中的幻觉现象,在新建基准测试中表现优于现有最优模型。
English Summary: ChartInsighter is an automated system that reduces hallucinations in time-series chart summaries through multi-agent collaboration and self-consistency testing, achieving state-of-the-art performance on a newly created benchmark.
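The self-consistency test, validating a generated summary's numeric claims against the underlying series, can be sketched as follows. Claims arrive pre-parsed into structured form here; the real system extracts and corrects them from free-text summaries, and the claim schema below is hypothetical:

```python
def check_claims(series, claims, tol=1e-6):
    """Verify structured numeric claims against the data; return the
    subset that do not hold (candidate hallucinations)."""
    stats = {
        "max": max(series),
        "min": min(series),
        "mean": sum(series) / len(series),
    }
    return [c for c in claims if abs(stats[c["stat"]] - c["value"]) > tol]

series = [10.0, 14.0, 12.0, 20.0, 9.0]
claims = [
    {"stat": "max", "value": 20.0},   # consistent with the data
    {"stat": "mean", "value": 15.0},  # hallucinated: the true mean is 13.0
]
flagged = check_claims(series, claims)
```

Flagged claims can then be routed back to the agents for correction, which is the validate-and-correct loop the abstract describes.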

Authors:Yiran Tao, Jehan Yang, Dan Ding, Zackory Erickson
Title: LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation
Abstract:
Teleoperating high degrees-of-freedom (DoF) robotic manipulators via low-DoF controllers like joysticks often requires frequent switching between control modes, where each mode maps controller movements to specific robot actions. Manually performing this frequent switching can make teleoperation cumbersome and inefficient. On the other hand, existing automatic mode-switching solutions, such as heuristic-based or learning-based methods, are often task-specific and lack generalizability. In this paper, we introduce LLM-Driven Automatic Mode Switching (LAMS), a novel approach that leverages Large Language Models (LLMs) to automatically switch control modes based on task context. Unlike existing methods, LAMS requires no prior task demonstrations and incrementally improves by integrating user-generated mode-switching examples. We validate LAMS through an ablation study and a user study with 10 participants on complex, long-horizon tasks, demonstrating that LAMS effectively reduces manual mode switches, is preferred over alternative methods, and improves performance over time. The project website with supplementary materials is at https://lams-assistance.github.io/.

Authors:Yin Fang, Xinle Deng, Kangwei Liu, Ningyu Zhang, Jingyang Qian, Penghui Yang, Xiaohui Fan, Huajun Chen
Title: A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following
Abstract:
Large language models excel at interpreting complex natural language instructions, enabling them to perform a wide range of tasks. In the life sciences, single-cell RNA sequencing (scRNA-seq) data serves as the "language of cellular biology", capturing intricate gene expression patterns at the single-cell level. However, interacting with this "language" through conventional tools is often inefficient and unintuitive, posing challenges for researchers. To address these limitations, we present InstructCell, a multi-modal AI copilot that leverages natural language as a medium for more direct and flexible single-cell analysis. We construct a comprehensive multi-modal instruction dataset that pairs text-based instructions with scRNA-seq profiles from diverse tissues and species. Building on this, we develop a multi-modal cell language architecture capable of simultaneously interpreting and processing both modalities. InstructCell empowers researchers to accomplish critical tasks, such as cell type annotation, conditional pseudo-cell generation, and drug sensitivity prediction, using straightforward natural language commands. Extensive evaluations demonstrate that InstructCell consistently meets or exceeds the performance of existing single-cell foundation models, while adapting to diverse experimental conditions. More importantly, InstructCell provides an accessible and intuitive tool for exploring complex single-cell data, lowering technical barriers and enabling deeper biological insights.
中文: InstructCell作为多模态AI助手,通过自然语言实现对单细胞RNA测序数据的直观灵活分析,在细胞注释和药物预测等任务中优于现有模型,同时显著降低了复杂生物数据的技术门槛。
English: InstructCell is a multimodal AI copilot that uses natural language to enable intuitive and flexible analysis of single-cell RNA sequencing data, outperforming existing models in tasks like cell annotation and drug prediction while making complex biological data more accessible.

Authors:Yan Zhang, Haoqi Li, Ramtin Tabatabaei, Wafa Johal
Title: ROSAnnotator: A Web Application for ROSBag Data Analysis in Human-Robot Interaction
Abstract:
Human-robot interaction (HRI) is an interdisciplinary field that utilises both quantitative and qualitative methods. While ROSBags, a file format within the Robot Operating System (ROS), offer an efficient means of collecting temporally synched multimodal data in empirical studies with real robots, there is a lack of tools specifically designed to integrate qualitative coding and analysis functions with ROSBags. To address this gap, we developed ROSAnnotator, a web-based application that incorporates a multimodal Large Language Model (LLM) to support both manual and automated annotation of ROSBag data. ROSAnnotator currently facilitates video, audio, and transcription annotations and provides an open interface for custom ROS messages and tools. By using ROSAnnotator, researchers can streamline the qualitative analysis process, create a more cohesive analysis pipeline, and quickly access statistical summaries of annotations, thereby enhancing the overall efficiency of HRI data analysis. https://github.com/CHRI-Lab/ROSAnnotator
中文摘要:ROSBags虽能高效收集多模态数据,但缺乏定性分析工具,为此我们开发了基于多模态大语言模型的网络应用ROSAnnotator,支持视频、音频和文本标注,可显著提升人机交互数据分析效率。
English Summary: ROSAnnotator is a web-based tool that integrates multimodal LLM capabilities with ROSBags to enable both manual and automated annotation of HRI data, streamlining qualitative analysis and enhancing research efficiency.

Authors:Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, Yexin Yang, Baosong Yang, Xian Yang, Guanrou Yang, Tianyu Zhao, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Pei Zhang, Chong Zhang, Jinren Zhou
Title: MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Abstract:
Recent advancements in large language models (LLMs) and multimodal speech-text models have laid the groundwork for seamless voice interactions, enabling real-time, natural, and human-like conversations. Previous models for voice interactions are categorized as native and aligned. Native models integrate speech and text processing in one framework but struggle with issues like differing sequence lengths and insufficient pre-training. Aligned models maintain text LLM capabilities but are often limited by small datasets and a narrow focus on speech tasks. In this work, we introduce MinMo, a Multimodal Large Language Model with approximately 8B parameters for seamless voice interaction. We address the main limitations of prior aligned multimodal models. We train MinMo through multiple stages of speech-to-text alignment, text-to-speech alignment, speech-to-speech alignment, and duplex interaction alignment, on 1.4 million hours of diverse speech data and a broad range of speech tasks. After the multi-stage training, MinMo achieves state-of-the-art performance across various benchmarks for voice comprehension and generation while maintaining the capabilities of text LLMs, and also facilitates full-duplex conversation, that is, simultaneous two-way communication between the user and the system. Moreover, we propose a novel and simple voice decoder that outperforms prior models in voice generation. The enhanced instruction-following capabilities of MinMo support controlling speech generation based on user instructions, with various nuances including emotions, dialects, and speaking rates, and mimicking specific voices. For MinMo, the speech-to-text latency is approximately 100ms, and full-duplex latency is approximately 600ms in theory and 800ms in practice. The MinMo project web page is https://funaudiollm.github.io/minmo, and the code and models will be released soon.
中文: 本研究提出的MinMo是一个80亿参数的多模态大语言模型,通过多阶段训练在大量语音数据上实现了语音理解与生成的最优性能,同时支持全双工对话并具备增强的语音控制功能。
English: This work introduces MinMo, an 8-billion-parameter multimodal large language model that achieves state-of-the-art performance in voice comprehension and generation through multi-stage training on extensive speech data, while enabling full-duplex conversations and enhanced voice control capabilities.

Authors:Yunlong Tang, Junjia Guo, Pinxin Liu, Zhiyuan Wang, Hang Hua, Jia-Xing Zhong, Yunzhong Xiao, Chao Huang, Luchuan Song, Susan Liang, Yizhi Song, Liu He, Jing Bi, Mingqian Feng, Xinyang Li, Zeliang Zhang, Chenliang Xu
Title: Generative AI for Cel-Animation: A Survey
Abstract:
The traditional Celluloid (Cel) animation production pipeline encompasses multiple essential steps, including storyboarding, layout design, keyframe animation, inbetweening, and colorization, which demand substantial manual effort, technical expertise, and significant time investment. These challenges have historically impeded the efficiency and scalability of Cel-Animation production. The rise of generative artificial intelligence (GenAI), encompassing large language models, multimodal models, and diffusion models, offers innovative solutions by automating tasks such as inbetween frame generation, colorization, and storyboard creation. This survey explores how GenAI integration is revolutionizing traditional animation workflows by lowering technical barriers, broadening accessibility for a wider range of creators through tools like AniDoc, ToonCrafter, and AniSora, and enabling artists to focus more on creative expression and artistic innovation. Despite its potential, challenges like visual consistency, stylistic coherence, and ethical considerations persist. Additionally, this paper explores future directions and advancements in AI-assisted animation. For further exploration and resources, please visit our GitHub repository: https://github.com/yunlong10/Awesome-AI4Animation
中文摘要:生成式人工智能通过自动完成中间帧生成、上色等繁重工序,正在革新传统赛璐珞动画制作流程,不仅大幅提升效率、降低技术门槛,还让艺术家能更专注于创意表达,尽管在视觉一致性与伦理规范方面仍存在挑战。
English Summary: Generative AI is revolutionizing traditional Cel-Animation by automating labor-intensive processes like inbetweening and colorization, thereby enhancing efficiency and accessibility while allowing artists to focus on creativity, despite ongoing challenges with visual consistency and ethical concerns.

Authors:Taywon Min, Haeone Lee, Yongchan Kwon, Kimin Lee
Title: Understanding Impact of Human Feedback via Influence Functions
Abstract:
In Reinforcement Learning from Human Feedback (RLHF), it is crucial to learn suitable reward models from human feedback to align large language models (LLMs) with human intentions. However, human feedback can often be noisy, inconsistent, or biased, especially when evaluating complex responses. Such feedback can lead to misaligned reward signals, potentially causing unintended side effects during the RLHF process. To address these challenges, we explore the use of influence functions to measure the impact of human feedback on the performance of reward models. We propose a compute-efficient approximation method that enables the application of influence functions to LLM-based reward models and large-scale preference datasets. Our experiments showcase two key applications of influence functions: (1) detecting common labeler biases in human feedback datasets and (2) guiding labelers in refining their strategies to better align with expert feedback. By quantifying the impact of human feedback, we believe that influence functions can enhance feedback interpretability and contribute to scalable oversight in RLHF, helping labelers provide more accurate and consistent feedback. Source code is available at https://github.com/mintaywon/IF_RLHF
中文: 本研究引入影响函数来评估人类反馈在RLHF中对奖励模型的影响,能够高效检测偏差并指导标注者提升反馈的准确性和一致性。
English: This study introduces influence functions to assess the impact of human feedback on reward models in RLHF, enabling efficient detection of biases and guidance for labelers to improve feedback accuracy and consistency.
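For reference, the classical first-order influence estimate that underlies this line of work can be written as follows (this is the standard form from influence-function analyses; the paper's compute-efficient, LLM-scale approximation of the inverse Hessian differs in its details):

```latex
% Influence of training example z_i on the validation loss at z_val,
% at the fitted reward-model parameters \hat\theta:
\mathcal{I}(z_i, z_{\mathrm{val}})
  = -\,\nabla_\theta \mathcal{L}(z_{\mathrm{val}}, \hat\theta)^{\top}
     H_{\hat\theta}^{-1}\,
     \nabla_\theta \mathcal{L}(z_i, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{j=1}^{n} \nabla^2_\theta \mathcal{L}(z_j, \hat\theta)
```

A large positive influence means upweighting that feedback example would increase validation loss, which is how individual noisy or biased labels can be surfaced.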

Authors:Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang, Xiaotian Han, Hongxia Yang, Fei Wu
Title: InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Abstract:
Graphical User Interface (GUI) Agents, powered by multimodal large language models (MLLMs), have shown great potential for task automation on computing devices such as computers and mobile phones. However, existing agents face challenges in multi-step reasoning and reliance on textual annotations, limiting their effectiveness. We introduce InfiGUIAgent, an MLLM-based GUI Agent trained with a two-stage supervised fine-tuning pipeline. Stage 1 enhances fundamental skills such as GUI understanding and grounding, while Stage 2 integrates hierarchical reasoning and expectation-reflection reasoning skills using synthesized data to enable native reasoning abilities of the agents. InfiGUIAgent achieves competitive performance on several GUI benchmarks, highlighting the impact of native reasoning skills in enhancing GUI interaction for automation tasks. Resources are available at https://github.com/Reallm-Labs/InfiGUIAgent.
中文: InfiGUIAgent是一种基于多模态大语言模型的图形界面代理,通过两阶段微调训练具备原生推理能力,在多个基准测试中表现出色,提升了自动化任务的交互效果。
English: InfiGUIAgent, an MLLM-based GUI agent trained with a two-stage fine-tuning pipeline, enhances GUI interaction through native reasoning skills and achieves competitive performance on benchmarks.

Authors:Xize Cheng, Dongjie Fu, Xiaoda Yang, Minghui Fang, Ruofan Hu, Jingyu Lu, Bai Jionghao, Zehan Wang, Shengpeng Ji, Rongjie Huang, Linjun Li, Yu Chen, Tao Jin, Zhou Zhao
Title: OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
Abstract:
With the rapid development of large language models, researchers have created increasingly advanced spoken dialogue systems that can naturally converse with humans. However, these systems still struggle to handle the full complexity of real-world conversations, including audio events, musical contexts, and emotional expressions, mainly because current dialogue datasets are constrained in both scale and scenario diversity. In this paper, we propose leveraging synthetic data to enhance the dialogue models across diverse scenarios. We introduce ShareChatX, the first comprehensive, large-scale dataset for spoken dialogue that spans diverse scenarios. Based on this dataset, we introduce OmniChat, a multi-turn dialogue system with a heterogeneous feature fusion module, designed to optimize feature selection in different dialogue contexts. In addition, we explored critical aspects of training dialogue systems using synthetic data. Through comprehensive experimentation, we determined the ideal balance between synthetic and real data, achieving state-of-the-art results on the real-world dialogue dataset DailyTalk. We also highlight the crucial importance of synthetic data in tackling diverse, complex dialogue scenarios, especially those involving audio and music. For more details, please visit our demo page at https://sharechatx.github.io/.
中文: 本文提出了首个大规模合成口语对话数据集ShareChatX和多轮对话系统OmniChat,通过优化合成与真实数据的配比,在包含音频和音乐等复杂场景的对话任务中取得了最优性能。
English: This paper introduces ShareChatX, a large-scale synthetic spoken dialogue dataset, and OmniChat, a multi-turn dialogue system that achieves state-of-the-art performance by optimally integrating synthetic and real data to handle complex scenarios like audio events and emotional expressions.

Authors:Yitong Zhu, Zhuowen Liang, Yiming Wu, Tangyao Li, Yuyang Wang
Title: Towards Consumer-Grade Cybersickness Prediction: Multi-Model Alignment for Real-Time Vision-Only Inference
Abstract:
Cybersickness remains a major obstacle to the widespread adoption of immersive virtual reality (VR), particularly in consumer-grade environments. While prior methods rely on invasive signals such as electroencephalography (EEG) for high predictive accuracy, these approaches require specialized hardware and are impractical for real-world applications. In this work, we propose a scalable, deployable framework for personalized cybersickness prediction leveraging only non-invasive signals readily available from commercial VR headsets, including head motion, eye tracking, and physiological responses. Our model employs a modality-specific graph neural network enhanced with a Difference Attention Module to extract temporal-spatial embeddings capturing dynamic changes across modalities. A cross-modal alignment module jointly trains the video encoder to learn personalized traits by aligning video features with sensor-derived representations. Consequently, the model accurately predicts individual cybersickness using only video input during inference. Experimental results show our model achieves 88.4% accuracy, closely matching EEG-based approaches (89.16%), while reducing deployment complexity. With an average inference latency of 90ms, our framework supports real-time applications, ideal for integration into consumer-grade VR platforms without compromising personalization or performance. The code will be released at https://github.com/U235-Aurora/PTGNN.
中文摘要:本研究提出一种利用商用VR头显非侵入式信号的个性化晕动症预测框架,通过新型图神经网络设计实现接近脑电图方法的准确率,并具备实时应用能力。
English Summary: This study introduces a scalable framework for predicting cybersickness in VR using non-invasive signals from commercial headsets, achieving near-EEG accuracy with real-time performance through a novel graph neural network design.
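One plausible reading of the Difference Attention idea — weighting timesteps by the magnitude of their frame-to-frame change before pooling — can be sketched as below. The abstract does not specify the module's architecture, so this is an assumed interpretation, not the paper's design.

```python
import math

def difference_attention(x):
    """x: list of T feature vectors (lists of floats).

    Sketch of one reading of a "Difference Attention" module: compute
    temporal differences, softmax their norms into attention weights,
    and return the change-weighted pooled embedding.
    """
    diffs = [[b - a for a, b in zip(x[t], x[t + 1])] for t in range(len(x) - 1)]
    scores = [math.sqrt(sum(d * d for d in row)) for row in diffs]  # change magnitude
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]                              # softmax over steps
    dim = len(x[0])
    return [sum(w * x[t + 1][k] for t, w in enumerate(weights)) for k in range(dim)]

seq = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.1], [5.0, 5.0]]
pooled = difference_attention(seq)
print(len(pooled))  # one pooled vector, same feature dimension as the input
```

The abrupt final transition dominates the weights, so the pooled embedding emphasizes the timestep with the largest dynamic change, matching the abstract's stated goal of capturing dynamic changes across modalities.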

Authors:Ahmed Heakl, Sara Ghaboura, Omkar Thawkar, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan
Title: AIN: The Arabic INclusive Large Multimodal Model
Abstract:
Amid the swift progress of large language models (LLMs) and their evolution into large multimodal models (LMMs), significant strides have been made in high-resource languages such as English and Chinese. While Arabic LLMs have seen notable progress, Arabic LMMs remain largely unexplored, often narrowly focusing on a few specific aspects of the language and visual understanding. To bridge this gap, we introduce AIN, the Arabic INclusive Multimodal Model, designed to excel across diverse domains. AIN is an English-Arabic bilingual LMM that leverages 3.6 million carefully constructed, high-quality Arabic-English multimodal data samples. AIN demonstrates state-of-the-art Arabic performance while also possessing strong English-language visual capabilities. On the recent CAMEL-Bench benchmark, comprising 38 sub-domains including multi-image understanding, complex visual perception, handwritten document understanding, video understanding, medical imaging, plant diseases, and remote sensing-based land use understanding, AIN demonstrates strong performance, with the 7B model outperforming GPT-4o by an absolute gain of 3.4% averaged over eight domains and 38 sub-domains. AIN's superior capabilities position it as a significant step toward empowering Arabic speakers with advanced multimodal generative AI tools across diverse applications.
中文: AIN作为阿拉伯语-英语双语多模态模型,在38个领域实现顶尖性能,以3.4%优势超越GPT-4o,有效填补了阿拉伯语多模态人工智能的研究空白。
English: The AIN model is a bilingual Arabic-English multimodal system that achieves state-of-the-art performance across 38 domains, outperforming GPT-4o by 3.4% while addressing the underdevelopment of Arabic multimodal AI.

Authors:Jingxiao Chen, Xinyao Li, Jiahang Cao, Zhengbang Zhu, Wentao Dong, Minghuan Liu, Ying Wen, Yong Yu, Liqing Zhang, Weinan Zhang
Title: RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations
Abstract:
Humanoid robots have shown success in locomotion and manipulation. Despite these basic abilities, humanoids are still required to quickly understand human instructions and react based on human interaction signals to become valuable assistants in human daily life. Unfortunately, most existing works only focus on multi-stage interactions, treating each task separately, and neglecting real-time feedback. In this work, we aim to empower humanoid robots with real-time reaction abilities to achieve various tasks, allowing humans to interrupt robots at any time and making robots respond immediately. To support such abilities, we propose a general humanoid-human-object interaction framework, named RHINO, i.e., Real-time Humanoid-human Interaction and Object manipulation. RHINO provides a unified view of reactive motion, instruction-based manipulation, and safety concerns, over multiple human signal modalities, such as languages, images, and motions. RHINO is a hierarchical learning framework, enabling humanoids to learn reaction skills from human-human-object demonstrations and teleoperation data. In particular, it decouples the interaction process into two levels: 1) a high-level planner inferring human intentions from real-time human behaviors; and 2) a low-level controller achieving reactive motion behaviors and object manipulation skills based on the predicted intentions. We evaluate the proposed framework on a real humanoid robot and demonstrate its effectiveness, flexibility, and safety in various scenarios.
中文摘要:本研究提出RHINO分层框架,通过解析多模态人类信号并将交互过程解耦为高层意图推断与底层运动控制,使人形机器人能够执行实时反应任务。
English Summary: This work introduces RHINO, a hierarchical framework enabling humanoid robots to perform real-time reactive tasks by interpreting multimodal human signals and decoupling interactions into high-level intention inference and low-level motion control.

Authors:Chaoyun Zhang, Shilin He, Liqun Li, Si Qin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
Title: API Agents vs. GUI Agents: Divergence and Convergence
Abstract:
Large language models (LLMs) have evolved beyond simple text generation to power software agents that directly translate natural language commands into tangible actions. While API-based LLM agents initially rose to prominence for their robust automation capabilities and seamless integration with programmatic endpoints, recent progress in multimodal LLM research has enabled GUI-based LLM agents that interact with graphical user interfaces in a human-like manner. Although these two paradigms share the goal of enabling LLM-driven task automation, they diverge significantly in architectural complexity, development workflows, and user interaction models. This paper presents the first comprehensive comparative study of API-based and GUI-based LLM agents, systematically analyzing their divergence and potential convergence. We examine key dimensions and highlight scenarios in which hybrid approaches can harness their complementary strengths. By proposing clear decision criteria and illustrating practical use cases, we aim to guide practitioners and researchers in selecting, combining, or transitioning between these paradigms. Ultimately, we indicate that continuing innovations in LLM-based automation are poised to blur the lines between API- and GUI-driven agents, paving the way for more flexible, adaptive solutions in a wide range of real-world applications.
中文摘要:本文首次对基于API和基于GUI的LLM智能体进行对比研究,分析其差异与融合潜力,并提出结合两者优势的混合方案以指导实际应用。
English Summary: This paper conducts the first comparative analysis of API-based and GUI-based LLM agents, examining their differences and potential integration while proposing hybrid approaches for practical applications.

Authors:Wenhao You, Bryan Hooi, Yiwei Wang, Euijin Choo, Ming-Hsuan Yang, Junsong Yuan, Zi Huang, Yujun Cai
Title: Lost in Edits? A λ-Compass for AIGC Provenance
Abstract:
Recent advancements in diffusion models have driven the growth of text-guided image editing tools, enabling precise and iterative modifications of synthesized content. However, as these tools become increasingly accessible, they also introduce significant risks of misuse, emphasizing the critical need for robust attribution methods to ensure content authenticity and traceability. Despite the creative potential of such tools, they pose significant challenges for attribution, particularly in adversarial settings where edits can be layered to obscure an image's origins. We propose LambdaTracer, a novel latent-space attribution method that robustly identifies and differentiates authentic outputs from manipulated ones without requiring any modifications to generative or editing pipelines. By adaptively calibrating reconstruction losses, LambdaTracer remains effective across diverse iterative editing processes, whether automated through text-guided editing tools such as InstructPix2Pix and ControlNet or performed manually with editing software such as Adobe Photoshop. Extensive experiments reveal that our method consistently outperforms baseline approaches in distinguishing maliciously edited images, providing a practical solution to safeguard ownership, creativity, and credibility in the open, fast-evolving AI ecosystems.
中文摘要:LambdaTracer是一种新型潜在空间溯源方法,无需修改生成流程即可有效识别被篡改图像,在各类编辑过程中检测恶意修改的性能均优于基线方法。
English Summary: LambdaTracer is a novel latent-space attribution method that effectively identifies manipulated images without altering generative pipelines, outperforming baselines in detecting malicious edits across diverse editing processes.
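The intuition — authentic generator outputs reconstruct almost perfectly from the latent space, while manual edits push an image off the generator's manifold and raise the reconstruction loss — can be shown with a toy linear analogue. This is an illustration of the underlying principle only, not LambdaTracer's method; the one-dimensional "decoder" and the threshold are assumptions.

```python
def recon_loss(x, w):
    """Distance from image x to the span of decoder direction w.

    Toy linear analogue of latent-space attribution: outputs of the
    (here, one-dimensional linear) "generator" lie on span(w), so their
    reconstruction loss is ~0; edited images drift off the manifold.
    """
    z = sum(a * b for a, b in zip(w, x)) / sum(a * a for a in w)  # best-fit latent
    recon = [z * a for a in w]                                    # decode back
    return sum((a - b) ** 2 for a, b in zip(x, recon)) ** 0.5

w = [1.0, 2.0, 2.0]          # toy decoder direction
authentic = [0.5, 1.0, 1.0]  # = 0.5 * w, perfectly reconstructable
edited    = [0.5, 1.0, 2.5]  # a manual edit breaks latent consistency

threshold = 0.1              # illustrative calibration constant
print(recon_loss(authentic, w) < threshold)  # True -> classed as authentic
print(recon_loss(edited, w) >= threshold)    # True -> classed as edited
```

LambdaTracer's adaptive calibration of such losses is what lets the idea survive layered, iterative edits; this sketch only shows the single-step separation.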

Authors:Menglin Zhao, Zhuorui Yong, Ruijia Guan, Kai-Wei Chang, Adrian Haimovich, Kei Ouchi, Timothy Bickmore, Bingsheng Yao, Dakuo Wang, Smit Desai
Title: Designing AI Tools for Clinical Care Teams to Support Serious Illness Conversations with Older Adults in the Emergency Department
Abstract:
Serious illness conversations (SICs), discussions between clinical care teams and patients with serious, life-limiting illnesses about their values, goals, and care preferences, are critical for patient-centered care. Without these conversations, patients often receive aggressive interventions that may not align with their goals. Clinical care teams face significant barriers when conducting serious illness conversations with older adult patients in Emergency Department (ED) settings, where most older adult patients lack documented treatment goals. To understand current practices and identify AI support opportunities, we conducted interviews with two domain experts and nine ED clinical care team members. Through thematic analysis, we characterized a four-phase serious illness conversation workflow (identification, preparation, conduction, documentation) and identified key needs and challenges at each stage. Clinical care teams struggle with fragmented EHR data access, time constraints, emotional preparation demands, and documentation burdens. While participants expressed interest in AI tools for information synthesis, conversational support, and automated documentation, they emphasized preserving human connection and clinical autonomy. We present design guidelines for AI tools supporting SIC workflows that fit within existing clinical practices. This work contributes empirical understanding of ED-based serious illness conversations and provides design considerations for AI in high-stakes clinical environments.
中文摘要:本研究探讨了在急诊科开展重病对话的挑战及人工智能工具的辅助机遇,强调在解决数据碎片化和文书负担等流程障碍的同时,必须保持人文关怀与临床自主性。
English Summary: This study explores the challenges and opportunities for AI tools in supporting serious illness conversations in emergency departments, emphasizing the need to maintain human connection while addressing workflow barriers like fragmented data and documentation burdens.

Authors:Dakuo Wang, Ting-Yao Hsu, Yuxuan Lu, Hansu Gu, Limeng Cui, Yaochen Xie, William Headean, Bingsheng Yao, Akash Veeragouni, Jiapeng Liu, Sreyashi Nag, Jessie Wang
Title: AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents
Abstract:
A/B testing is a widely adopted method for evaluating UI/UX design decisions in modern web applications. Yet, traditional A/B testing remains constrained by its dependence on large-scale, live traffic of human participants and the long wait for testing results. Through formative interviews with six experienced industry practitioners, we identified critical bottlenecks in current A/B testing workflows. In response, we present AgentA/B, a novel system that leverages Large Language Model-based autonomous agents (LLM Agents) to automatically simulate user interaction behaviors with real webpages. AgentA/B enables scalable deployment of LLM agents with diverse personas, each capable of navigating the dynamic webpage and interactively executing multi-step interactions like search, clicking, filtering, and purchasing. In a demonstrative controlled experiment, we employ AgentA/B to simulate a between-subjects A/B test with 1,000 LLM agents on Amazon.com, and compare agent behaviors with real human shopping behaviors at scale. Our findings suggest AgentA/B can emulate human-like behavior patterns.
中文: AgentA/B是一种创新系统,利用基于大语言模型的自主代理模拟用户在网页上的交互行为,通过可扩展的自动化测试克服了传统A/B测试依赖真人流量的局限性。
English: AgentA/B is an innovative system that uses LLM agents to simulate user interactions on webpages, addressing the limitations of traditional A/B testing by enabling scalable, automated behavior simulation without relying on live human traffic.
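The between-subjects simulation described above reduces to a simple loop: assign each persona-conditioned agent to a variant, let it act, and compare conversion rates. The sketch below stubs the LLM agent with a rule-based policy; the persona fields, variants, and probabilities are illustrative assumptions, not AgentA/B's actual design.

```python
import random

def choose(persona, variant, rng):
    """Stub agent policy: purchase probability rises when a
    price-sensitive persona sees variant B (e.g., a discount badge).
    A real system would drive this decision with an LLM agent."""
    p = 0.2 + (0.3 if variant == "B" and persona["price_sensitive"] else 0.0)
    return rng.random() < p

def run_ab_test(n_agents, seed=0):
    """Between-subjects split: each simulated agent sees one variant."""
    rng = random.Random(seed)
    conversions = {"A": 0, "B": 0}
    for i in range(n_agents):
        persona = {"price_sensitive": rng.random() < 0.5}  # sampled persona
        variant = "A" if i % 2 == 0 else "B"
        if choose(persona, variant, rng):
            conversions[variant] += 1
    return conversions

print(run_ab_test(1000))  # variant B should convert more often here
```

Swapping the stub for real LLM agents navigating live webpages is the step that makes this scalable without human traffic, which is the system's central claim.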

Authors:Yuxuan Lu, Bingsheng Yao, Hansu Gu, Jing Huang, Jessie Wang, Yang Li, Jiri Gesi, Qi He, Toby Jia-Jun Li, Dakuo Wang
Title: UXAgent: A System for Simulating Usability Testing of Web Design with LLM Agents
Abstract:
Usability testing is a fundamental research method that user experience (UX) researchers use to evaluate and iterate their new designs. But what about evaluating and iterating the usability testing study design itself? Recent advances in Large Language Model-simulated Agent (LLM Agent) research inspired us to design UXAgent to support UX researchers in evaluating and iterating their study design before they conduct the real human-subject study. Our system features a Persona Generator module, an LLM Agent module, and a Universal Browser Connector module to automatically generate thousands of simulated users and to interactively test the target website. The system also provides a Result Viewer Interface so that the UX researchers can easily review and analyze the generated qualitative (e.g., agents' post-study surveys) and quantitative data (e.g., agents' interaction logs), or even interview agents directly. Through a heuristic evaluation with 16 UX researchers, participants praised the innovation of our system but also expressed concerns about the future of LLM Agent usage in UX studies.
Chinese: UXAgent是一个创新系统,它利用大语言模型模拟代理帮助用户体验研究人员在开展真实用户研究前,通过生成模拟用户并提供交互测试与数据分析工具来评估和优化可用性测试研究设计。
English: UXAgent is a novel system that utilizes Large Language Model-simulated Agents to help UX researchers evaluate and iterate usability testing study designs by generating simulated users and providing interactive testing and data analysis tools before conducting real human studies.

Authors:Nan Gao, Yihua Bao, Dongdong Weng, Jiayi Zhao, Jia Li, Yan Zhou, Pengfei Wan, Di Zhang
Title: SARGes: Semantically Aligned Reliable Gesture Generation via Intent Chain
Abstract:
Co-speech gesture generation enhances human-computer interaction realism through speech-synchronized gesture synthesis. However, generating semantically meaningful gestures remains a challenging problem. We propose SARGes, a novel framework that leverages large language models (LLMs) to parse speech content and generate reliable semantic gesture labels, which subsequently guide the synthesis of meaningful co-speech gestures. First, we constructed a comprehensive co-speech gesture ethogram and developed an LLM-based intent chain reasoning mechanism that systematically parses and decomposes gesture semantics into structured inference steps following ethogram criteria, effectively guiding LLMs to generate context-aware gesture labels. Subsequently, we constructed an intent chain-annotated text-to-gesture label dataset and trained a lightweight gesture label generation model, which then guides the generation of credible and semantically coherent co-speech gestures. Experimental results demonstrate that SARGes achieves highly semantically-aligned gesture labeling (50.2% accuracy) with efficient single-pass inference (0.4 seconds). The proposed method provides an interpretable intent reasoning pathway for semantic gesture synthesis.
Chinese: SARGes框架利用大型语言模型解析语音并生成语义手势标签,从而高效合成具有上下文感知的、意义丰富的手势,实现了高准确性和快速推理。
English: The SARGes framework utilizes large language models to parse speech and generate semantic gesture labels, enabling the synthesis of meaningful and context-aware co-speech gestures with high accuracy and efficiency.

Authors:Jiaju Chen, Minglong Tang, Yuxuan Lu, Bingsheng Yao, Elissa Fan, Xiaojuan Ma, Ying Xu, Dakuo Wang, Yuling Sun, Liang He
Title: Characterizing LLM-Empowered Personalized Story-Reading and Interaction for Children: Insights from Multi-Stakeholder Perspectives
Abstract:
Personalized interaction is highly valued by parents in their story-reading activities with children. While AI-empowered story-reading tools have been increasingly used, their abilities to support personalized interaction with children are still limited. Recent advances in large language models (LLMs) show promise in facilitating personalized interactions, but little is known about how to effectively and appropriately use LLMs to enhance children's personalized story-reading experiences. This work explores this question through a design-based study. Drawing on a formative study, we designed and developed StoryMate, an LLM-empowered personalized interactive story-reading tool for children, following an empirical study with children, parents, and education experts. Our participants valued the personalized features in StoryMate, and also highlighted the need to support personalized content, guiding mechanisms, reading context variations, and interactive interfaces. Based on these findings, we propose a series of design recommendations for better using LLMs to empower children's personalized story reading and interaction.
中文: 本研究探讨如何有效利用大语言模型提升儿童个性化故事阅读互动,开发了StoryMate原型,并针对内容定制、引导机制、阅读情境和交互界面提出了关键设计建议。
English: This study explores the effective use of large language models (LLMs) to enhance personalized story-reading interactions for children, developing StoryMate as a prototype and identifying key design recommendations for content, guidance, context, and interface customization.

Authors:Borui Liao, Yulong Xu, Jiao Ou, Kaiyuan Yang, Weihua Jian, Pengfei Wan, Di Zhang
Title: FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems
Abstract:
Full-Duplex Speech Dialogue Systems (Full-Duplex SDS) have significantly enhanced the naturalness of human-machine interaction by enabling real-time bidirectional communication. However, existing approaches face challenges such as difficulties in independent module optimization and contextual noise interference due to highly coupled architectural designs and oversimplified binary state modeling. This paper proposes FlexDuo, a flexible full-duplex control module that decouples duplex control from spoken dialogue systems through a plug-and-play architectural design. Furthermore, inspired by human information-filtering mechanisms in conversations, we introduce an explicit Idle state. On one hand, the Idle state filters redundant noise and irrelevant audio to enhance dialogue quality. On the other hand, it establishes a semantic integrity-based buffering mechanism, reducing the risk of mutual interruptions while ensuring accurate response transitions. Experimental results on the Fisher corpus demonstrate that FlexDuo reduces the false interruption rate by 24.9% and improves response accuracy by 7.6% compared to integrated full-duplex dialogue system baselines. It also outperforms voice activity detection (VAD) controlled baseline systems in both Chinese and English dialogue quality. The proposed modular architecture and state-based dialogue model provide a novel technical pathway for building flexible and efficient duplex dialogue systems.
中文摘要:FlexDuo通过即插即用架构和显式空闲状态设计,在过滤噪音的同时建立语义缓冲机制,将误中断率降低24.9%,响应准确率提升7.6%,为全双工对话系统提供了灵活高效的解决方案。
English Summary: FlexDuo introduces a plug-and-play full-duplex control module with an explicit Idle state that reduces false interruptions by 24.9% and improves response accuracy by 7.6% by filtering noise and establishing semantic buffering.

Authors:Chaoran Chen, Bingsheng Yao, Ruishi Zou, Wenyue Hua, Weimin Lyu, Yanfang Ye, Toby Jia-Jun Li, Dakuo Wang
Title: Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Abstract:
Role-Playing Agent (RPA) is an increasingly popular type of LLM Agent that simulates human-like behaviors in a variety of tasks. However, evaluating RPAs is challenging due to diverse task requirements and agent designs. This paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPAs by systematically reviewing 1,676 papers published between Jan. 2021 and Dec. 2024. Our analysis identifies six agent attributes, seven task attributes, and seven evaluation metrics from existing literature. Based on these findings, we present an RPA evaluation design guideline to help researchers develop more systematic and consistent evaluation methods.
Chinese: 本文通过分析1,676篇文献,提出了基于证据、可操作且可推广的大型语言模型角色扮演代理评估设计指南,识别出关键代理属性和任务属性及评估指标,以促进系统化和一致性的评估方法发展。
English: This paper introduces an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based Role-Playing Agents (RPAs) by analyzing 1,676 papers, identifying key agent and task attributes along with evaluation metrics to promote systematic and consistent assessment methods.

Authors:Yuxuan Lu, Bingsheng Yao, Hansu Gu, Jing Huang, Jessie Wang, Yang Li, Jiri Gesi, Qi He, Toby Jia-Jun Li, Dakuo Wang
Title: UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design
Abstract:
Usability testing is a fundamental yet challenging research method for user experience (UX) researchers to evaluate a web design: study-design flaws are inflexible to iterate on, and study participants are hard to recruit. Recent advances in Large Language Model-simulated Agent (LLM-Agent) research inspired us to design UXAgent to support UX researchers in evaluating and iterating on their usability testing study design before they conduct the real human-subject study. Our system features an LLM-Agent module and a universal browser connector module so that UX researchers can automatically generate thousands of simulated users to test the target website. The results are shown in qualitative (e.g., interviewing how an agent thinks), quantitative (e.g., # of actions), and video-recording formats for UX researchers to analyze. Through a heuristic user evaluation with five UX researchers, participants praised the innovation of our system but also expressed concerns about the future of LLM-Agent-assisted UX studies.
Chinese: UXAgent是一个创新系统,利用大语言模型智能体模拟数千用户进行可用性测试,使UX研究人员能在开展真人研究前通过定性、定量和视频反馈迭代评估网页设计。
English: UXAgent is an innovative system utilizing LLM-Agents to simulate thousands of users for usability testing, enabling UX researchers to iteratively evaluate web designs through qualitative, quantitative, and video feedback before conducting human studies.

Authors:Ying Lei, Yancheng Cao, Will Wang, Yuanzhe Dong, Changchang Yin, Weidan Cao, Ping Zhang, Jingzhen Yang, Bingsheng Yao, Yifan Peng, Chunhua Weng, Randy Auerbach, Lena Mamykina, Dakuo Wang, Yuntao Wang, Xuhai Xu
Title: WatchGuardian: Enabling User-Defined Personalized Just-in-Time Intervention on Smartwatch
Abstract:
While just-in-time interventions (JITIs) have effectively targeted common health behaviors, individuals often have unique needs to intervene in personal undesirable actions that can negatively affect physical, mental, and social well-being. We present WatchGuardian, a smartwatch-based JITI system that empowers users to define custom interventions for these personal actions with a small number of samples. For the model to detect new actions based on limited new data samples, we developed a few-shot learning pipeline that finetuned a pre-trained inertial measurement unit (IMU) model on public hand-gesture datasets. We then designed a data augmentation and synthesis process to train additional classification layers for customization. Our offline evaluation with 26 participants showed that with three, five, and ten examples, our approach achieved an average accuracy of 76.8%, 84.7%, and 87.7%, and an F1 score of 74.8%, 84.2%, and 87.2%. We then conducted a four-hour intervention study to compare WatchGuardian against a rule-based intervention. Our results demonstrated that our system led to a significant reduction of 64.0 ± 22.6% in undesirable actions, substantially outperforming the baseline by 29.0%. Our findings underscore the effectiveness of a customizable, AI-driven JITI system for individuals in need of behavioral intervention in personal undesirable actions. We envision that our work can inspire broader applications of user-defined personalized intervention with advanced AI solutions.
中文摘要:WatchGuardian是一款基于智能手表的即时干预系统,通过小样本学习让用户自定义针对不良行为的个性化干预,在研究中实现了64.0%的行为显著减少。
English Summary: WatchGuardian is a smartwatch-based just-in-time intervention system that allows users to create personalized interventions for undesirable behaviors using few-shot learning, achieving significant action reduction of 64.0% in studies.
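The abstract describes a few-shot pipeline: a pre-trained IMU encoder produces embeddings, and a lightweight classifier is customized from a handful of user samples. The sketch below is a minimal illustration of that idea, not the paper's implementation: the encoder is a hypothetical stand-in (a fixed random projection instead of the fine-tuned IMU model), and a nearest-centroid classifier replaces the paper's augmented classification layers.

```python
import numpy as np

def embed(imu_window: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the pre-trained IMU encoder:
    a fixed random projection of the flattened sensor window."""
    rng = np.random.default_rng(0)  # fixed seed = fixed 'weights'
    w = rng.standard_normal((imu_window.size, 16))
    return imu_window.ravel() @ w

def fit_centroids(examples: dict[str, list[np.ndarray]]) -> dict[str, np.ndarray]:
    """Average the embeddings of the few user-provided samples per action."""
    return {label: np.mean([embed(x) for x in xs], axis=0)
            for label, xs in examples.items()}

def classify(window: np.ndarray, centroids: dict[str, np.ndarray]) -> str:
    """Assign a new IMU window to the nearest action centroid."""
    z = embed(window)
    return min(centroids, key=lambda lbl: np.linalg.norm(z - centroids[lbl]))
```

With three to ten examples per action (as in the paper's evaluation), `fit_centroids` is all the "training" this sketch needs, which is what makes user-defined actions cheap to add.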

Authors:Ziqi Yang, Yuxuan Lu, Jennifer Bagdasarian, Vedant Das Swain, Ritu Agarwal, Collin Campbell, Waddah Al-Refaire, Jehan El-Bayoumi, Guodong Gao, Dakuo Wang, Bingsheng Yao, Nawar Shara
Title: RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care
Abstract:
Cancer surgery is a key treatment for gastrointestinal (GI) cancers, a group of cancers that account for more than 35% of cancer-related deaths worldwide, but postoperative complications are unpredictable and can be life-threatening. In this paper, we investigate how recent advancements in large language models (LLMs) can benefit remote patient monitoring (RPM) systems through clinical integration by designing RECOVER, an LLM-powered RPM system for postoperative GI cancer care. To closely engage stakeholders in the design process, we first conducted seven participatory design sessions with five clinical staff and interviewed five cancer patients to derive six major design strategies for integrating clinical guidelines and information needs into LLM-based RPM systems. We then designed and implemented RECOVER, which features an LLM-powered conversational agent for cancer patients and an interactive dashboard for clinical staff to enable efficient postoperative RPM. Finally, we used RECOVER as a pilot system to assess the implementation of our design strategies with four clinical staff and five patients, providing design implications by identifying crucial design elements, offering insights on responsible AI, and outlining opportunities for future LLM-powered RPM systems.
中文摘要:本研究开发了基于大语言模型的RECOVER远程患者监护系统,通过利益相关者参与设计并评估其在胃肠癌术后护理中的应用,为未来AI医疗系统提供了关键设计要素与实施策略。
English Summary: This study introduces RECOVER, an LLM-powered remote patient monitoring system designed for postoperative gastrointestinal cancer care, developed through stakeholder engagement and evaluated to provide design insights for future AI-enhanced healthcare systems.

Authors:Shihan Fu, Bingsheng Yao, Smit Desai, Yuqi Hu, Yuling Sun, Samantha Stonbraker, Yanjun Gao, Elizabeth M. Goldberg, Dakuo Wang
Title: "It Felt Like I Was Left in the Dark": Exploring Information Needs and Design Opportunities for Family Caregivers of Older Adult Patients in Critical Care Settings
Abstract:
Older adult patients constitute a rapidly growing subgroup of Intensive Care Unit (ICU) patients. In these situations, their family caregivers are expected to represent the unconscious patients to access and interpret patients' medical information. However, caregivers currently have to rely on overloaded clinicians for information updates and typically lack the health literacy to understand complex medical information. Our project aims to explore the information needs of caregivers of ICU older adult patients, from which we can propose design opportunities to guide future AI systems. The project begins with formative interviews with 11 caregivers to identify their challenges in accessing and interpreting medical information; From these findings, we then synthesize design requirements and propose an AI system prototype to cope with caregivers' challenges. The system prototype has two key features: a timeline visualization to show the AI extracted and summarized older adult patients' key medical events; and an LLM-based chatbot to provide context-aware informational support. We conclude our paper by reporting on the follow-up user evaluation of the system and discussing future AI-based systems for ICU caregivers of older adults.
中文摘要:本项目针对老年ICU患者家属照护者的信息获取困境,开发了一个具备医疗事件时间轴和基于大语言模型的聊天机器人功能的AI系统原型,以提供有效的信息支持。
English Summary: This project identifies the information challenges faced by family caregivers of elderly ICU patients and develops an AI system prototype featuring a medical event timeline and an LLM-powered chatbot to address their needs.

Authors:Bingsheng Yao, Menglin Zhao, Yuling Sun, Weidan Cao, Changchang Yin, Stephen Intille, Xuhai Xu, Ping Zhang, Jingzhen Yang, Dakuo Wang
Title: More Modality, More AI: Exploring Design Opportunities of AI-Based Multi-modal Remote Monitoring Technologies for Early Detection of Mental Health Sequelae in Youth Concussion Patients
Abstract:
Anxiety, depression, and suicidality are common mental health sequelae following concussion in youth patients, often exacerbating concussion symptoms and prolonging recovery. Despite the critical need for early detection of these mental health symptoms, clinicians often face challenges in accurately collecting patients' mental health data and making clinical decisions in a timely manner. Today's remote patient monitoring (RPM) technologies offer opportunities to objectively monitor patients' activities, but they were not specifically designed for youth concussion patients; moreover, the large amount of data collected by RPM technologies may also impose significant workloads on clinicians to keep up with and use the data. To address these gaps, we employed a three-stage study consisting of a formative study, interface design, and design evaluation. We first conducted a formative study through semi-structured interviews with six experienced concussion clinicians and identified clinicians' key challenges in remotely collecting patient information and assessing patients' treatment compliance. Subsequently, we proposed preliminary clinician-facing interface designs with the integration of AI-based RPM technologies (AI-RPM), followed by design evaluation sessions with experienced concussion clinicians. Clinicians underscored the value of integrating multi-modal AI-RPM technologies to support their decision-making while emphasizing the importance of customizable interfaces through collaborative design and multiple responsible design considerations.
中文摘要:本研究通过临床医生访谈和迭代设计开发了AI增强的远程患者监测界面,旨在解决青少年脑震荡康复中精神健康跟踪的难题,并强调可定制化与负责任的设计实施。
English Summary: This study developed AI-enhanced remote patient monitoring interfaces through clinician interviews and iterative design to address mental health tracking challenges in youth concussion recovery, emphasizing customizable and responsible implementation.

Authors:Changchang Yin, Shihan Fu, Bingsheng Yao, Thai-Hoang Pham, Weidan Cao, Dakuo Wang, Jeffrey Caterino, Ping Zhang
Title: SepsisCalc: Integrating Clinical Calculators into Early Sepsis Prediction via Dynamic Temporal Graph Construction
Abstract:
Sepsis is an organ dysfunction caused by a dysregulated immune response to an infection. Early sepsis prediction and identification allow for timely intervention, leading to improved clinical outcomes. Clinical calculators (e.g., the six-organ dysfunction assessment of SOFA) play a vital role in sepsis identification within clinicians' workflow, providing evidence-based risk assessments essential for sepsis diagnosis. However, artificial intelligence (AI) sepsis prediction models typically generate a single sepsis risk score without incorporating clinical calculators for assessing organ dysfunctions, making the models less convincing and transparent to clinicians. To bridge the gap, we propose to mimic clinicians' workflow with a novel framework SepsisCalc to integrate clinical calculators into the predictive model, yielding a clinically transparent and precise model for utilization in clinical settings. Practically, clinical calculators usually combine information from multiple component variables in Electronic Health Records (EHR), and might not be applicable when the variables are (partially) missing. We mitigate this issue by representing EHRs as temporal graphs and integrating a learning module to dynamically add the accurately estimated calculator to the graphs. Experimental results on real-world datasets show that the proposed model outperforms state-of-the-art methods on sepsis prediction tasks. Moreover, we developed a system to identify organ dysfunctions and potential sepsis risks, providing a human-AI interaction tool for deployment, which can help clinicians understand the prediction outputs and prepare timely interventions for the corresponding dysfunctions, paving the way for actionable clinical decision-making support for early intervention.
中文: 提出的SepsisCalc框架将临床计算器整合到AI模型中,通过动态估算缺失数据,在败血症预测中提高了透明度和准确性,并优于现有方法,同时提供人机交互工具以支持可操作的临床决策。
English: The proposed SepsisCalc framework integrates clinical calculators into AI models to enhance transparency and accuracy in sepsis prediction by dynamically estimating missing data and outperforming existing methods, while also providing a human-AI interaction tool for actionable clinical decision support.

Authors:Alvaro Becerra, Roberto Daza, Ruth Cobos, Aythami Morales, Mutlu Cukurova, Julian Fierrez
Title: AI-based Multimodal Biometrics for Detecting Smartphone Distractions: Application to Online Learning
Abstract:
This work investigates the use of multimodal biometrics to detect distractions caused by smartphone use during tasks that require sustained attention, with a focus on computer-based online learning. Although the methods are applicable to various domains, such as autonomous driving, we concentrate on the challenges learners face in maintaining engagement amid internal (e.g., motivation), system-related (e.g., course design) and contextual (e.g., smartphone use) factors. Traditional learning platforms often lack detailed behavioral data, but Multimodal Learning Analytics (MMLA) and biosensors provide new insights into learner attention. We propose an AI-based approach that leverages physiological signals and head pose data to detect phone use. Our results show that single biometric signals, such as brain waves or heart rate, offer limited accuracy, while head pose alone achieves 87%. A multimodal model combining all signals reaches 91% accuracy, highlighting the benefits of integration. We conclude by discussing the implications and limitations of deploying these models for real-time support in online learning environments.
中文: 本研究开发了一种多模态生物特征系统,利用生理信号和头部姿态数据检测在线学习中的手机使用分心行为,集成多信号后准确率达91%,而仅用头部姿态数据时为87%。
English: This study develops a multimodal biometric system using physiological signals and head pose data to detect smartphone-induced distractions during online learning, achieving 91% accuracy by integrating multiple signals compared to 87% with head pose alone.
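The abstract reports that combining modalities (head pose, brain waves, heart rate) lifts accuracy from 87% to 91%. One simple way such signals can be combined is late fusion of per-modality probabilities; the sketch below shows that generic scheme with hypothetical modality names and weights, since the paper does not specify its fusion architecture here.

```python
def fuse(probs: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-modality 'phone in use' probabilities.
    Modality names and weights are illustrative assumptions."""
    total = sum(weights[m] for m in probs)
    return sum(weights[m] * p for m, p in probs.items()) / total

def is_distracted(probs: dict[str, float], weights: dict[str, float],
                  threshold: float = 0.5) -> bool:
    """Final decision: fused probability against a fixed threshold."""
    return fuse(probs, weights) > threshold
```

Upweighting the strongest single signal (head pose, per the reported 87%) while still letting physiological channels contribute is the intuition behind the 91% multimodal result.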

Authors:Alvaro Becerra, Roberto Daza, Ruth Cobos, Aythami Morales, Julian Fierrez
Title: M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards
Abstract:
We present a demonstration of a web-based system called M2LADS ("System for Generating Multimodal Learning Analytics Dashboards"), designed to integrate, synchronize, visualize, and analyze multimodal data recorded during computer-based learning sessions with biosensors. This system presents a range of biometric and behavioral data on web-based dashboards, providing detailed insights into various physiological and activity-based metrics. The multimodal data visualized include electroencephalogram (EEG) data for assessing attention and brain activity, heart rate metrics, eye-tracking data to measure visual attention, webcam video recordings, and activity logs of the monitored tasks. M2LADS aims to assist data scientists in two key ways: (1) by providing a comprehensive view of participants' experiences, displaying all data categorized by the activities in which participants are engaged, and (2) by synchronizing all biosignals and videos, facilitating easier data relabeling if any activity information contains errors.
中文: M2LADS是一个基于网络的系统,可集成并可视化多模态数据,包括脑电图、心率、眼动追踪和活动日志,为研究人员提供全面洞察并促进同步数据分析。
English: M2LADS is a web-based system that integrates and visualizes multimodal data, including EEG, heart rate, eye-tracking, and activity logs, to provide comprehensive insights and facilitate synchronized data analysis for researchers.
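A core M2LADS feature is synchronizing biosignals and videos recorded at different rates. A common way to do this (a generic sketch, not necessarily the system's exact method) is nearest-timestamp alignment of one stream against a reference stream:

```python
import bisect

def align(ts_ref: list[float], ts_other: list[float]) -> list[int]:
    """For each reference timestamp, return the index of the nearest
    sample in the other stream. Both lists are assumed sorted."""
    out = []
    for t in ts_ref:
        i = bisect.bisect_left(ts_other, t)
        # nearest of the two neighbors around the insertion point
        candidates = [j for j in (i - 1, i) if 0 <= j < len(ts_other)]
        out.append(min(candidates, key=lambda j: abs(ts_other[j] - t)))
    return out
```

Once every stream is indexed against a common clock like this, relabeling an activity segment (the system's second use case) only requires editing the reference annotations.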

Authors:Roberto Daza, Lin Shengkai, Aythami Morales, Julian Fierrez, Katashi Nagao
Title: SMARTe-VR: Student Monitoring and Adaptive Response Technology for e-Learning in Virtual Reality
Abstract:
This work introduces SMARTe-VR, a platform for student monitoring in an immersive virtual reality environment designed for online education. SMARTe-VR aims to collect data for adaptive learning, focusing on facial biometrics and learning metadata. The platform allows instructors to create customized learning sessions with video lectures, featuring an interface with an AutoQA system to evaluate understanding, interaction tools (for example, textbook highlighting and lecture tagging), and real-time feedback. Furthermore, we released a dataset that contains 5 research challenges with data from 10 users in VR-based TOEIC sessions. This data set, which spans more than 25 hours, includes facial features, learning metadata, 450 responses, difficulty levels of the questions, concept tags, and understanding labels. Alongside the database, we present preliminary experiments using Item Response Theory models, adapted for understanding detection using facial features. Two architectures were explored: a Temporal Convolutional Network for local features and a Multilayer Perceptron for global features.
中文: 本研究介绍了SMARTe-VR虚拟现实平台,它通过面部生物特征和学习元数据监控在线教育中的学生以实现自适应学习,并发布了基于VR的托业课程数据集及采用项目反应理论模型进行理解检测的初步实验。
English: This study presents SMARTe-VR, a virtual reality platform for online education that monitors students through facial biometrics and learning metadata to enable adaptive learning, and it also releases a comprehensive dataset from VR-based TOEIC sessions along with preliminary experiments using Item Response Theory models for understanding detection.
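The preliminary experiments adapt Item Response Theory (IRT) to understanding detection. For reference, the standard two-parameter logistic (2PL) IRT model gives the probability of a correct (here, "understood") response from an ability parameter and per-item difficulty/discrimination; the paper's adaptation feeds facial features into this framework, which the base formula below does not capture.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT: P(correct | ability theta) for an item with
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

At `theta == b` the probability is exactly 0.5, and higher discrimination `a` sharpens the transition around the item's difficulty, which is what makes per-question difficulty levels (included in the released dataset) usable as model inputs.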

Authors:Sunhao Dai, Wenjie Wang, Liang Pang, Jun Xu, See-Kiong Ng, Ji-Rong Wen, Tat-Seng Chua
Title: NExT-Search: Rebuilding User Feedback Ecosystem for Generative AI Search
Abstract:
Generative AI search is reshaping information retrieval by offering end-to-end answers to complex queries, reducing users' reliance on manually browsing and summarizing multiple web pages. However, while this paradigm enhances convenience, it disrupts the feedback-driven improvement loop that has historically powered the evolution of traditional Web search. Traditional Web search engines can continuously improve their ranking models by collecting large-scale, fine-grained user feedback (e.g., clicks, dwell time) at the document level. In contrast, generative AI search operates through a much longer search pipeline, spanning query decomposition, document retrieval, and answer generation, yet typically receives only coarse-grained feedback on the final answer. This introduces a feedback loop disconnect, where user feedback for the final output cannot be effectively mapped back to specific system components, making it difficult to improve each intermediate stage and sustain the feedback loop. In this paper, we envision NExT-Search, a next-generation paradigm designed to reintroduce fine-grained, process-level feedback into generative AI search. NExT-Search integrates two complementary modes: User Debug Mode, which allows engaged users to intervene at key stages; and Shadow User Mode, where a personalized user agent simulates user preferences and provides AI-assisted feedback for less interactive users. Furthermore, we envision how these feedback signals can be leveraged through online adaptation, which refines current search outputs in real-time, and offline update, which aggregates interaction logs to periodically fine-tune query decomposition, retrieval, and generation models. By restoring human control over key stages of the generative AI search pipeline, we believe NExT-Search offers a promising direction for building feedback-rich AI search systems that can evolve continuously alongside human feedback.
中文: 生成式AI搜索通过提供直接答案简化了信息检索,但破坏了系统改进所必需的反馈循环,因此提出的NExT-Search范式通过用户干预和AI辅助模拟重新引入细粒度的过程级反馈。
English: Generative AI search simplifies information retrieval by providing direct answers but breaks the feedback loop essential for system improvement, prompting the proposed NExT-Search paradigm to reintroduce fine-grained, process-level feedback through user intervention and AI-assisted simulations.
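The "feedback loop disconnect" is fundamentally a data-modeling problem: feedback must be attached to the pipeline stage it concerns (decomposition, retrieval, generation) rather than only to the final answer. The sketch below is a hypothetical minimal data structure for such stage-level feedback logs; the paper proposes the paradigm, not this particular representation.

```python
from dataclasses import dataclass, field

@dataclass
class StageLog:
    """One stage of the generative search pipeline, with its own feedback."""
    name: str                # "decomposition" | "retrieval" | "generation"
    output: object           # the stage's intermediate output
    feedback: list[str] = field(default_factory=list)

def record_feedback(pipeline: list[StageLog], stage: str, note: str) -> None:
    """Route a correction (from a user in Debug Mode, or a shadow agent)
    to the stage it concerns, instead of the final answer alone."""
    next(s for s in pipeline if s.name == stage).feedback.append(note)
```

Offline update then becomes a per-stage aggregation over these logs, which is what makes it possible to fine-tune the decomposition, retrieval, and generation models separately.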

Authors:Xiaoyan Zhao, Yang Deng, Wenjie Wang, Hongzhan lin, Hong Cheng, Rui Zhang, See-Kiong Ng, Tat-Seng Chua
Title: Exploring the Impact of Personality Traits on Conversational Recommender Systems: A Simulation with Large Language Models
Abstract:
Conversational Recommender Systems (CRSs) engage users in multi-turn interactions to deliver personalized recommendations. The emergence of large language models (LLMs) further enhances these systems by enabling more natural and dynamic user interactions. However, a key challenge remains in understanding how personality traits shape conversational recommendation outcomes. Psychological evidence highlights the influence of personality traits on user interaction behaviors. To address this, we introduce an LLM-based personality-aware user simulation for CRSs (PerCRS). The user agent induces customizable personality traits and preferences, while the system agent possesses the persuasion capability to simulate realistic interaction in CRSs. We incorporate multi-aspect evaluation to ensure robustness and conduct extensive analysis from both user and system perspectives. Experimental results demonstrate that state-of-the-art LLMs can effectively generate diverse user responses aligned with specified personality traits, thereby prompting CRSs to dynamically adjust their recommendation strategies. Our experimental analysis offers empirical insights into the impact of personality traits on the outcomes of conversational recommender systems.
中文摘要:本研究提出了PerCRS,一种基于大语言模型的个性化对话推荐系统模拟框架,通过实验验证了人格特质如何影响用户交互行为并驱动推荐策略的动态调整。
English Summary: The study introduces PerCRS, an LLM-based personality-aware simulation for conversational recommender systems that demonstrates how personality traits influence user interactions and prompt dynamic adjustments in recommendation strategies.

Authors:Ziyang Ma, Xiquan Li, Yakun Song, Wenxi Chen, Chenpeng Du, Jian Wu, Yuanzhe Chen, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen
Title: Towards Reliable Large Audio Language Model
Abstract:
Recent advancements in large audio language models (LALMs) have demonstrated impressive results and promising prospects in universal understanding and reasoning across speech, music, and general sound. However, these models still lack the ability to recognize their knowledge boundaries and proactively refuse to answer questions they don't know. While there have been successful attempts to enhance the reliability of LLMs, reliable LALMs remain largely unexplored. In this paper, we systematically investigate various approaches towards reliable LALMs, including training-free methods such as multi-modal chain-of-thought (MCoT), and training-based methods such as supervised fine-tuning (SFT). Besides, we identify the limitations of previous evaluation metrics and propose a new metric, the Reliability Gain Index (RGI), to assess the effectiveness of different reliable methods. Our findings suggest that both training-free and training-based methods enhance the reliability of LALMs to different extents. Moreover, we find that awareness of reliability is a "meta ability", which can be transferred across different audio modalities, although significant structural and content differences exist among sound, music, and speech.
中文: 大型音频语言模型在音频理解方面展现出潜力但缺乏可靠性,可通过免训练和基于训练的方法提升,并提出了新评估指标以改进衡量效果。
English: Large audio language models show potential in audio understanding but lack reliability, which can be improved through training-free and training-based methods, with a new metric proposed for better evaluation.
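"Refusing to answer questions the model doesn't know" is often operationalized as abstention: answer only when some confidence signal clears a threshold. The sketch below shows that generic baseline; the scalar confidence and threshold are assumptions for illustration, not the paper's MCoT or SFT methods, and the RGI metric is not reproduced here.

```python
def answer_or_abstain(answer: str, confidence: float,
                      threshold: float = 0.7) -> str:
    """Return the model's answer only when its confidence clears the
    threshold; otherwise refuse, trading coverage for reliability."""
    return answer if confidence >= threshold else "I don't know."
```

Raising the threshold increases refusals on uncertain inputs, which is the coverage-reliability trade-off a metric like RGI is designed to compare across methods.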

Authors:Zheng Lian, Rui Liu, Kele Xu, Bin Liu, Xuefei Liu, Yazhou Zhang, Xin Liu, Yong Li, Zebang Cheng, Haolin Zuo, Ziyang Ma, Xiaojiang Peng, Xie Chen, Ya Li, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao
Title: MER 2025: When Affective Computing Meets Large Language Models
Abstract:
MER2025 is the third year of our MER series of challenges, aiming to bring together researchers in the affective computing community to explore emerging trends and future directions in the field. Previously, MER2023 focused on multi-label learning, noise robustness, and semi-supervised learning, while MER2024 introduced a new track dedicated to open-vocabulary emotion recognition. This year, MER2025 centers on the theme "When Affective Computing Meets Large Language Models (LLMs)".We aim to shift the paradigm from traditional categorical frameworks reliant on predefined emotion taxonomies to LLM-driven generative methods, offering innovative solutions for more accurate and reliable emotion understanding. The challenge features four tracks: MER-SEMI focuses on fixed categorical emotion recognition enhanced by semi-supervised learning; MER-FG explores fine-grained emotions, expanding recognition from basic to nuanced emotional states; MER-DES incorporates multimodal cues (beyond emotion words) into predictions to enhance model interpretability; MER-PR investigates whether emotion prediction results can improve personality recognition performance. For the first three tracks, baseline code is available at MERTools, and datasets can be accessed via Hugging Face. For the last track, the dataset and baseline code are available on GitHub.
中文:MER2025以“情感计算与大语言模型融合”为主题,旨在通过生成式方法革新传统情感分类框架,下设四个赛道分别聚焦半监督学习、细粒度情感识别、多模态可解释性及情感预测对人格识别的优化。
English: MER2025 focuses on integrating affective computing with large language models to transition from traditional emotion classification to generative approaches, featuring four specialized tracks that address semi-supervised learning, fine-grained emotions, multimodal interpretability, and personality recognition enhancements.

Authors:Tobias Labarta, Nhi Hoang, Katharina Weitz, Wojciech Samek, Sebastian Lapuschkin, Leander Weber
Title: See What I Mean? CUE: A Cognitive Model of Understanding Explanations
Abstract:
As machine learning systems increasingly inform critical decisions, the need for human-understandable explanations grows. Current evaluations of Explainable AI (XAI) often prioritize technical fidelity over cognitive accessibility, which critically affects users, in particular those with visual impairments. We propose CUE, a model for Cognitive Understanding of Explanations, linking explanation properties to cognitive sub-processes: legibility (perception), readability (comprehension), and interpretability (interpretation). In a study (N=455) testing heatmaps with varying colormaps (BWR, Cividis, Coolwarm), we found comparable task performance but lower confidence/effort for visually impaired users. Contrary to expectations, these gaps were not mitigated, and were sometimes worsened, by accessibility-focused colormaps like Cividis. These results challenge assumptions about perceptual optimization and support the need for adaptive XAI interfaces. They also validate CUE by demonstrating that altering explanation legibility affects understandability. We contribute: (1) a formalized cognitive model for explanation understanding, (2) an integrated definition of human-centered explanation properties, and (3) empirical evidence motivating accessible, user-tailored XAI.
中文摘要:本研究提出CUE模型,通过将解释特性与认知子过程(可读性、可理解性和可解释性)联系起来评估可解释人工智能,发现针对可访问性的色彩映射并不总能改善视障用户的理解,强调了自适应XAI界面的必要性。
English Summary: The study introduces the CUE model to evaluate explainable AI (XAI) by connecting explanation properties to cognitive processes, revealing that accessibility-focused color maps do not always improve understanding for visually impaired users and emphasizing the need for adaptive XAI interfaces.

Authors:Eduardo Baena, Paolo Testolina, Michele Polese, Sergi Aliaga, Andrew Benincasa, Dimitrios Koutsonikolas, Josep Jornet, Tommaso Melodia
Title: Agentic Semantic Control for Autonomous Wireless Space Networks: Extending Space-O-RAN with MCP-Driven Distributed Intelligence
Abstract:
Lunar surface operations impose stringent requirements on wireless communication systems, including autonomy, robustness to disruption, and the ability to adapt to environmental and mission-driven context. While Space-O-RAN provides a distributed orchestration model aligned with 3GPP standards, its decision logic is limited to static policies and lacks semantic integration. We propose a novel extension incorporating a semantic agentic layer enabled by the Model Context Protocol (MCP) and Agent-to-Agent (A2A) communication protocols, allowing context-aware decision making across real-time, near-real-time, and non-real-time control layers. Distributed cognitive agents deployed in rovers, landers, and lunar base stations implement wireless-aware coordination strategies, including delay-adaptive reasoning and bandwidth-aware semantic compression, while interacting with multiple MCP servers to reason over telemetry, locomotion planning, and mission constraints.
中文: 该方案在Space-O-RAN中引入基于MCP和A2A协议的语义智能体层,使月球设备能通过分布式认知代理实现动态的情境感知无线协调与自适应通信。
English: The proposed extension to Space-O-RAN introduces a semantic agentic layer using MCP and A2A protocols, enabling dynamic, context-aware wireless coordination across lunar assets for adaptive communication strategies.
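As a rough illustration of the delay-adaptive reasoning and bandwidth-aware semantic compression described above, the sketch below routes a decision to a control loop based on link delay and strips telemetry down to task-relevant fields before relaying it to an MCP server; all function names and thresholds are invented, not drawn from the paper.

```python
# Minimal sketch (hypothetical names and thresholds) of two ideas behind the
# semantic agentic layer: delay-adaptive loop selection and semantic compression.

def select_control_layer(link_delay_ms: float) -> str:
    """Route a decision to the loop whose latency budget fits the link delay
    (thresholds are illustrative, not from the paper)."""
    if link_delay_ms <= 10:
        return "real-time"        # e.g., an on-rover control loop
    if link_delay_ms <= 1000:
        return "near-real-time"   # e.g., a lander-hosted controller
    return "non-real-time"        # e.g., a lunar-base or Earth-side controller

def compress_telemetry(sample: dict, keep: set) -> dict:
    """Bandwidth-aware 'semantic compression': drop fields the current
    task does not need before sending them to an MCP server."""
    return {k: v for k, v in sample.items() if k in keep}

print(select_control_layer(5))      # real-time
print(select_control_layer(3000))   # non-real-time
print(compress_telemetry({"snr": 12.4, "temp": -40, "pos": (1, 2)}, {"snr"}))
```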

Authors:Fei Tang, Haolei Xu, Hang Zhang, Siqi Chen, Xingyu Wu, Yongliang Shen, Wenqi Zhang, Guiyang Hou, Zeqi Tan, Yuchen Yan, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang
Title: A Survey on (M)LLM-Based GUI Agents
Abstract:
Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction, evolving from rule-based automation scripts to sophisticated AI-driven systems capable of understanding and executing complex interface operations. This survey provides a comprehensive examination of the rapidly advancing field of LLM-based GUI Agents, systematically analyzing their architectural foundations, technical components, and evaluation methodologies. We identify and analyze four fundamental components that constitute modern GUI Agents: (1) perception systems that integrate text-based parsing with multimodal understanding for comprehensive interface comprehension; (2) exploration mechanisms that construct and maintain knowledge bases through internal modeling, historical experience, and external information retrieval; (3) planning frameworks that leverage advanced reasoning methodologies for task decomposition and execution; and (4) interaction systems that manage action generation with robust safety controls. Through rigorous analysis of these components, we reveal how recent advances in large language models and multimodal learning have revolutionized GUI automation across desktop, mobile, and web platforms. We critically examine current evaluation frameworks, highlighting methodological limitations in existing benchmarks while proposing directions for standardization. This survey also identifies key technical challenges, including accurate element localization, effective knowledge retrieval, long-horizon planning, and safety-aware execution control, while outlining promising research directions for enhancing GUI Agents' capabilities. Our systematic review provides researchers and practitioners with a thorough understanding of the field's current state and offers insights into future developments in intelligent interface automation.
中文摘要:本综述系统分析了基于大语言模型的图形用户界面智能体,剖析其感知、探索、规划与交互四大核心组件,揭示人工智能技术如何推动界面自动化革新,并探讨当前挑战与未来发展方向。
English Summary: This survey comprehensively examines LLM-based GUI Agents, analyzing their core components—perception, exploration, planning, and interaction—and highlighting how advances in AI have revolutionized interface automation while addressing current challenges and future directions.
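The four components the survey identifies can be sketched as a minimal agent loop; everything here (the class name, the keyword-match planner, the "delete" safety rule) is an invented stand-in for the far richer systems the survey covers.

```python
# Toy skeleton of a GUI agent's four components: perception, exploration
# (memory), planning, and safety-controlled interaction. All logic is a
# deliberately naive placeholder.

class GUIAgent:
    def __init__(self):
        self.memory = []                      # exploration: historical experience

    def perceive(self, screen_text: str) -> dict:
        # perception: parse the interface into structured elements
        return {"elements": screen_text.split("|")}

    def plan(self, goal: str, state: dict) -> list:
        # planning: naive task decomposition -- act on elements matching the goal
        return [("click", e) for e in state["elements"] if goal in e]

    def act(self, step: tuple) -> str:
        # interaction: execute with a trivial safety control
        action, target = step
        if "delete" in target.lower():
            return f"BLOCKED {action} {target}"  # safety-aware execution control
        self.memory.append(step)                 # record experience for reuse
        return f"OK {action} {target}"

agent = GUIAgent()
state = agent.perceive("Settings|Wi-Fi|Delete account")
print([agent.act(s) for s in agent.plan("Wi-Fi", state)])  # ['OK click Wi-Fi']
```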

Authors:Ji Won Chung, Tongyu Zhou, Ivy Chen, Kevin Hsu, Ryan A. Rossi, Alexa Siu, Shunan Guo, Franck Dernoncourt, James Tompkin, Jeff Huang
Title: InfoVids: Reimagining the Viewer Experience with Alternative Visualization-Presenter Relationships
Abstract:
Traditional data presentations typically separate the presenter and visualization into two separate spaces--the 3D world and a 2D screen--enforcing visualization-centric stories. To create a more human-centric viewing experience, we establish a more equitable relationship between the visualization and the presenter through our InfoVids. These infographics-inspired informational videos are crafted to redefine relationships between the presenter and visualizations. As we design InfoVids, we explore how the use of layout, form, and interactions affects the viewer experience. We compare InfoVids against their baseline 2D `slides' equivalents across 9 metrics with 30 participants and provide practical, long-term insights from an autobiographical perspective. Our mixed methods analyses reveal that this paradigm reduced viewer attention splitting, shifted the focus from the visualization to the presenter, and led to more interactive, natural, and engaging full-body data performances for viewers. Ultimately, InfoVids helped viewers re-imagine traditional dynamics between the presenter and visualizations.
中文: InfoVids通过在统一空间中整合演示者和可视化内容,建立了更平等的关系,减少了观众注意力分散,并通过互动式全身数据演示提升了参与度。
English: InfoVids create a more equitable relationship between presenters and visualizations by integrating them into a unified space, reducing attention splitting and enhancing engagement through interactive full-body data performances.

Authors:Zhendong Chu, Shen Wang, Jian Xie, Tinghui Zhu, Yibo Yan, Jinheng Ye, Aoxiao Zhong, Xuming Hu, Jing Liang, Philip S. Yu, Qingsong Wen
Title: LLM Agents for Education: Advances and Applications
Abstract:
Large Language Model (LLM) agents have demonstrated remarkable capabilities in automating tasks and driving innovation across diverse educational applications. In this survey, we provide a systematic review of state-of-the-art research on LLM agents in education, categorizing them into two broad classes: (1) \emph{Pedagogical Agents}, which focus on automating complex pedagogical tasks to support both teachers and students; and (2) \emph{Domain-Specific Educational Agents}, which are tailored for specialized fields such as science education, language learning, and professional development. We comprehensively examine the technological advancements underlying these LLM agents, including key datasets, benchmarks, and algorithmic frameworks that drive their effectiveness. Furthermore, we discuss critical challenges such as privacy, bias and fairness concerns, hallucination mitigation, and integration with existing educational ecosystems. This survey aims to provide a comprehensive technological overview of LLM agents for education, fostering further research and collaboration to enhance their impact for the greater good of learners and educators alike.
中文: 本综述系统梳理了大语言模型智能体在教育领域的应用进展,将其划分为教学辅助与专业领域两类,并针对隐私保护、算法偏见等关键挑战提出见解,以推动该领域研究发展。
English: This survey systematically reviews the advancements of Large Language Model agents in education, categorizing them into pedagogical and domain-specific agents while addressing key challenges like privacy and bias to guide future research.

Authors:Xinglong Mao, Shifeng Liu, Sirui Zhao, Tong Xu, Hanchao Wang, Baozhi Jia, Enhong Chen
Title: MERba: Multi-Receptive Field MambaVision for Micro-Expression Recognition
Abstract:
Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, offering valuable insights for psychological assessment and criminal investigations. Despite significant progress in automatic ME recognition (MER), existing methods still struggle to simultaneously capture localized muscle activations and global facial dependencies, both essential for decoding subtle emotional cues. To address this challenge, we propose MERba, a hierarchical multi-receptive field architecture specially designed for MER, which incorporates a series of Local-Global Feature Integration stages. Within each stage, detailed intra-window motion patterns are captured using MERba Local Extractors, which integrate MambaVision Mixers with a tailored asymmetric multi-scanning strategy to enhance local spatial sensitivity. These localized features are then aggregated through lightweight self-attention layers that explicitly model inter-window relationships, enabling effective global context construction. Furthermore, to mitigate the challenge of high inter-class similarity among negative MEs, we introduce a Dual-Granularity Classification Module that decomposes the recognition task into a coarse-to-fine paradigm. Extensive experiments on three benchmark datasets demonstrate that MERba consistently outperforms existing methods, with ablation studies confirming the effectiveness of each proposed component.
中文:MERba是一种用于微表情识别的分层架构,通过多感受野处理和双粒度分类模块整合局部与全局面部特征,在基准数据集上实现了优越性能。
English: MERba is a hierarchical architecture for micro-expression recognition that integrates local and global facial features through multi-receptive field processing and a dual-granularity classification module, achieving superior performance on benchmark datasets.
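A toy version of the coarse-to-fine paradigm behind MERba's Dual-Granularity Classification Module looks like the following; the class groupings and scores are invented for illustration.

```python
# Coarse-to-fine classification sketch: a coarse head first picks a group
# (e.g., negative vs. non-negative), and the fine decision is then restricted
# to that group, mitigating confusion among highly similar negative MEs.

COARSE = {"negative": ["anger", "disgust", "fear"],
          "non-negative": ["happiness", "surprise"]}

def classify(coarse_scores: dict, fine_scores: dict) -> str:
    group = max(coarse_scores, key=coarse_scores.get)            # coarse decision
    candidates = COARSE[group]                                   # restrict fine head
    return max(candidates, key=lambda c: fine_scores.get(c, 0))  # fine decision

pred = classify({"negative": 0.8, "non-negative": 0.2},
                {"anger": 0.3, "disgust": 0.5, "fear": 0.2, "happiness": 0.9})
print(pred)  # disgust -- the high 'happiness' score is ignored after the coarse step
```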

Authors:Shifeng Liu, Xinglong Mao, Sirui Zhao, Peiming Li, Tong Xu, Enhong Chen
Title: MER-CLIP: AU-Guided Vision-Language Alignment for Micro-Expression Recognition
Abstract:
As a critical psychological stress response, micro-expressions (MEs) are fleeting and subtle facial movements revealing genuine emotions. Automatic ME recognition (MER) holds valuable applications in fields such as criminal investigation and psychological diagnosis. The Facial Action Coding System (FACS) encodes expressions by identifying activations of specific facial action units (AUs), serving as a key reference for ME analysis. However, current MER methods typically limit AU utilization to defining regions of interest (ROIs) or relying on specific prior knowledge, often resulting in limited performance and poor generalization. To address this, we integrate the CLIP model's powerful cross-modal semantic alignment capability into MER and propose a novel approach, MER-CLIP. Specifically, we convert AU labels into detailed textual descriptions of facial muscle movements, guiding fine-grained spatiotemporal ME learning by aligning visual dynamics with textual AU-based representations. Additionally, we introduce an Emotion Inference Module to capture the nuanced relationships between ME patterns and emotions with higher-level semantic understanding. To mitigate overfitting caused by the scarcity of ME data, we put forward LocalStaticFaceMix, an effective data augmentation strategy that blends facial images to enhance facial diversity while preserving critical ME features. Finally, comprehensive experiments on four benchmark ME datasets confirm the superiority of MER-CLIP. Notably, UF1 scores on CAS(ME)3 reach 0.7832, 0.6544, and 0.4997 for 3-, 4-, and 7-class classification tasks, significantly outperforming previous methods.
中文: 本研究提出MER-CLIP新方法,利用CLIP的跨模态对齐能力,将动作单元转化为文本描述并结合情感推理模块,显著提升了微表情识别的性能,在基准数据集上取得优越结果。
English: The study introduces MER-CLIP, a novel approach that leverages CLIP's cross-modal alignment to enhance micro-expression recognition by converting action units into textual descriptions and incorporating an emotion inference module, achieving superior performance on benchmark datasets.
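The AU-to-text step can be illustrated as below; the AU glossary follows standard FACS naming, but the prompt template is an assumption rather than the paper's exact wording.

```python
# Sketch of converting action-unit labels into natural-language descriptions
# of facial muscle movements, which a CLIP-style text encoder could then align
# with visual features. The prompt template is illustrative.

AU_TEXT = {
    "AU4": "brow lowerer: the eyebrows are drawn down and together",
    "AU12": "lip corner puller: the lip corners are pulled up into a smile",
    "AU15": "lip corner depressor: the lip corners are pulled downward",
}

def aus_to_prompt(aus: list) -> str:
    """Join the per-AU muscle-movement descriptions into one text prompt."""
    desc = "; ".join(AU_TEXT[a] for a in aus)
    return f"a face showing subtle movements: {desc}"

print(aus_to_prompt(["AU4", "AU15"]))
```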

Authors:Runcong Zhao, Chengyu Cao, Qinglin Zhu, Xiucheng Lv, Shun Shao, Lin Gui, Ruifeng Xu, Yulan He
Title: Sparse Activation Editing for Reliable Instruction Following in Narratives
Abstract:
Complex narrative contexts often challenge language models' ability to follow instructions, and existing benchmarks fail to capture these difficulties. To address this, we propose Concise-SAE, a training-free framework that improves instruction following by identifying and editing instruction-relevant neurons using only natural language instructions, without requiring labelled data. To thoroughly evaluate our method, we introduce FreeInstruct, a diverse and realistic benchmark of 1,212 examples that highlights the challenges of instruction following in narrative-rich settings. While initially motivated by complex narratives, Concise-SAE demonstrates state-of-the-art instruction adherence across varied tasks without compromising generation quality.
中文:提出的Concise-SAE框架通过仅使用自然语言识别和编辑相关神经元来增强指令遵循能力,同时FreeInstruct基准测试评估了其在多样化叙事场景中的有效性。
English: The proposed Concise-SAE framework enhances instruction following by identifying and editing relevant neurons using only natural language, while the FreeInstruct benchmark evaluates its effectiveness across diverse narrative contexts.
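In spirit, the identify-then-edit step might look like the following pure-Python stand-in for a hidden-state hook; the relevance proxy (activation difference between prompts with and without the instruction), the neuron indices, and the scale factor are all illustrative, not the paper's method in detail.

```python
# Toy sketch of training-free activation editing: rank neurons by how much
# their activations differ between instruction-bearing and instruction-free
# prompts, then amplify those neurons at inference time.

def find_instruction_neurons(act_with: list, act_without: list, top_k: int = 2) -> list:
    """Rank neurons by |activation difference| between prompts that do and
    do not contain the instruction (a crude relevance proxy)."""
    diffs = [abs(a - b) for a, b in zip(act_with, act_without)]
    return sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)[:top_k]

def edit_activations(acts: list, neurons: list, scale: float = 2.0) -> list:
    """Scale the selected neurons, leaving the rest untouched."""
    return [a * scale if i in neurons else a for i, a in enumerate(acts)]

neurons = find_instruction_neurons([0.9, 0.1, 0.8, 0.2], [0.1, 0.1, 0.7, 0.2])
print(neurons)                                    # [0, 2]
print(edit_activations([0.5, 0.5, 0.5, 0.5], neurons))
```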

Authors:Jinfeng Zhou, Yuxuan Chen, Jianing Yin, Yongkang Huang, Yihan Shi, Xikun Zhang, Libiao Peng, Rongsheng Zhang, Tangjie Lv, Zhipeng Hu, Hongning Wang, Minlie Huang
Title: Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues
Abstract:
Cognitive Restructuring (CR) is a psychotherapeutic process aimed at identifying and restructuring an individual's negative thoughts, arising from mental health challenges, into more helpful and positive ones via multi-turn dialogues. Clinician shortages and stigma motivate the development of human-LLM interactive psychotherapy for CR. Yet, existing efforts implement CR via simple text rewriting, fixed-pattern dialogues, or a one-shot CR workflow, failing to align with the psychotherapeutic process for effective CR. To address this gap, we propose CRDial, a novel framework for CR, which creates multi-turn dialogues with specifically designed identification and restructuring stages of negative thoughts, integrates sentence-level supportive conversation strategies, and adopts a multi-channel loop mechanism to enable iterative CR. With CRDial, we distill Crisp, a large-scale and high-quality bilingual dialogue dataset, from an LLM. We then train Crispers, Crisp-based conversational LLMs for CR, at 7B and 14B scales. Extensive human studies show the superiority of Crispers in pointwise, pairwise, and intervention evaluations.
中文摘要:本研究提出CRDial框架,通过多轮对话的专门阶段和策略改进认知重构,并利用生成的双语数据集Crisp训练出高效的心理治疗对话模型。
English Summary: The study introduces CRDial, a framework that enhances Cognitive Restructuring through multi-turn dialogues with specialized stages and strategies, and develops Crisp, a bilingual dataset used to train effective conversational models for psychotherapy.

Authors:Sahand Sabour, June M. Liu, Siyang Liu, Chris Z. Yao, Shiyao Cui, Xuanming Zhang, Wen Zhang, Yaru Cao, Advait Bhat, Jian Guan, Wei Wu, Rada Mihalcea, Hongning Wang, Tim Althoff, Tatia M. C. Lee, Minlie Huang
Title: Human Decision-making is Susceptible to AI-driven Manipulation
Abstract:
Artificial Intelligence (AI) systems are increasingly intertwined with daily life, assisting users in executing various tasks and providing guidance on decision-making. This integration introduces risks of AI-driven manipulation, where such systems may exploit users' cognitive biases and emotional vulnerabilities to steer them toward harmful outcomes. Through a randomized controlled trial with 233 participants, we examined human susceptibility to such manipulation in financial (e.g., purchases) and emotional (e.g., conflict resolution) decision-making contexts. Participants interacted with one of three AI agents: a neutral agent (NA) optimizing for user benefit without explicit influence, a manipulative agent (MA) designed to covertly influence beliefs and behaviors, or a strategy-enhanced manipulative agent (SEMA) employing explicit psychological tactics to reach its hidden objectives. By analyzing participants' decision patterns and shifts in their preference ratings post-interaction, we found significant susceptibility to AI-driven manipulation. Particularly, across both decision-making domains, participants interacting with the manipulative agents shifted toward harmful options at substantially higher rates (financial, MA: 62.3%, SEMA: 59.6%; emotional, MA: 42.3%, SEMA: 41.5%) compared to the NA group (financial, 35.8%; emotional, 12.8%). Notably, our findings reveal that even subtle manipulative objectives (MA) can be as effective as employing explicit psychological strategies (SEMA) in swaying human decision-making. By revealing the potential for covert AI influence, this study highlights a critical vulnerability in human-AI interactions, emphasizing the need for ethical safeguards and regulatory frameworks to ensure responsible deployment of AI technologies and protect human autonomy.
中文: 研究表明人工智能系统能够在金融和情感决策中有效操纵人类选择,即使仅采用隐蔽影响策略的AI也能显著引导用户做出有害决定,这凸显了建立伦理保障机制的迫切性。
English: This study demonstrates that AI systems can effectively manipulate human decision-making in financial and emotional contexts, with even subtly manipulative agents significantly swaying users toward harmful choices, highlighting the urgent need for ethical safeguards.

Authors:Liangxuan Wu, Chao Wang, Tianming Liu, Yanjie Zhao, Haoyu Wang
Title: From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents
Abstract:
The growing adoption of large language models (LLMs) has led to a new paradigm in mobile computing--LLM-powered mobile AI agents--capable of decomposing and automating complex tasks directly on smartphones. However, the security implications of these agents remain largely unexplored. In this paper, we present the first comprehensive security analysis of mobile LLM agents, encompassing three representative categories: System-level AI Agents developed by original equipment manufacturers (e.g., YOYO Assistant), Third-party Universal Agents (e.g., Zhipu AI AutoGLM), and Emerging Agent Frameworks (e.g., Alibaba Mobile Agent). We begin by analyzing the general workflow of mobile agents and identifying security threats across three core capability dimensions: language-based reasoning, GUI-based interaction, and system-level execution. Our analysis reveals 11 distinct attack surfaces, all rooted in the unique capabilities and interaction patterns of mobile LLM agents, and spanning their entire operational lifecycle. To investigate these threats in practice, we introduce AgentScan, a semi-automated security analysis framework that systematically evaluates mobile LLM agents across all 11 attack scenarios. Applying AgentScan to nine widely deployed agents, we uncover a concerning trend: every agent is vulnerable to targeted attacks. In the most severe cases, agents exhibit vulnerabilities across eight distinct attack vectors. These attacks can cause behavioral deviations, privacy leakage, or even full execution hijacking. Based on these findings, we propose a set of defensive design principles and practical recommendations for building secure mobile LLM agents. Our disclosures have received positive feedback from two major device vendors. Overall, this work highlights the urgent need for standardized security practices in the fast-evolving landscape of LLM-driven mobile automation.
中文: 本研究首次对移动端大语言模型智能体进行系统性安全分析,通过AgentScan框架识别出11个攻击面并证实所有测试代理均存在漏洞,同时提出相应防御措施以应对安全风险。
English: This study presents the first comprehensive security analysis of mobile LLM agents, identifying 11 attack surfaces and demonstrating vulnerabilities across all tested agents through the AgentScan framework, while proposing defensive measures to address these risks.

Authors:Ho Yin Ng, Ting-Yao Hsu, Jiyoo Min, Sungchul Kim, Ryan A. Rossi, Tong Yu, Hyunggu Jung, Ting-Hao 'Kenneth' Huang
Title: Understanding How Paper Writers Use AI-Generated Captions in Figure Caption Writing
Abstract:
Figures and their captions play a key role in scientific publications. However, despite their importance, many captions in published papers are poorly crafted, largely due to a lack of attention by paper authors. While prior AI research has explored caption generation, it has mainly focused on reader-centered use cases, where users evaluate generated captions rather than actively integrating them into their writing. This paper addresses this gap by investigating how paper authors incorporate AI-generated captions into their writing process through a user study involving 18 participants. Each participant rewrote captions for two figures from their own recently published work, using captions generated by state-of-the-art AI models as a resource. By analyzing video recordings of the writing process through interaction analysis, we observed that participants often began by copying and refining AI-generated captions. Paper writers favored longer, detail-rich captions that integrated textual and visual elements but found current AI models less effective for complex figures. These findings highlight the nuanced and diverse nature of figure caption composition, revealing design opportunities for AI systems to better support the challenges of academic writing.
中文: 本研究探讨作者如何将AI生成的图注融入写作过程,发现他们常通过复制和精炼来丰富细节,但现有AI对复杂图表处理不足,这为优化学术写作辅助系统指明了设计方向。
English: This study explores how authors integrate AI-generated captions into their writing process, revealing that while they often refine these captions for richer detail, current AI struggles with complex figures, pointing to future design improvements for academic support.

Authors:Minh Duc Vu, Jieshan Chen, Zhenchang Xing, Qinghua Lu, Xiwei Xu, Qian Fu
Title: FactFlow: Automatic Fact Sheet Generation and Customization from Tabular Dataset via AI Chain Design & Implementation
Abstract:
With the proliferation of data across various domains, there is a critical demand for tools that enable non-experts to derive meaningful insights without deep data analysis skills. To address this need, existing automatic fact sheet generation tools offer heuristic-based solutions to extract facts and generate stories. However, they inadequately grasp the semantics of data and struggle to generate narratives that fully capture the semantics of the dataset or align the fact sheet with specific user needs. Addressing these shortcomings, this paper introduces FactFlow, a novel tool designed for the automatic generation and customisation of fact sheets. FactFlow applies the concept of collaborative AI workers to transform raw tabular datasets into comprehensive, visually compelling fact sheets. We define an effective taxonomy to profile AI workers for specialised tasks. Furthermore, FactFlow empowers users to refine these fact sheets through intuitive natural language commands, ensuring the final outputs align closely with individual preferences and requirements. Our user evaluation with 18 participants confirms that FactFlow not only surpasses state-of-the-art baselines in automated fact sheet production but also provides a positive user experience during customization tasks.
Chinese: 本文介绍了一种名为FactFlow的新型工具,它通过协作式AI工作者将原始数据转化为全面且视觉吸引力强的事实报告,有效解决了现有工具在理解数据语义和适应用户需求方面的不足,并支持用户通过自然语言指令进行个性化定制。
English: This paper introduces a novel tool called FactFlow that automatically generates and customizes fact sheets from raw data using collaborative AI workers, overcoming the limitations of existing tools by better understanding data semantics and allowing user refinement through natural language commands.

Authors:Yuxuan Liu, Hongda Sun, Wei Liu, Jian Luan, Bo Du, Rui Yan
Title: MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
Abstract:
Mobile phone agents, which can assist people in automating daily tasks on their phones, have emerged as a pivotal research spotlight. However, existing procedure-oriented agents struggle with cross-app instructions, due to the following challenges: (1) complex task relationships, (2) diverse app environments, and (3) error propagation and information loss in multi-step execution. Drawing inspiration from object-oriented programming principles, we recognize that an object-oriented solution is more suitable for cross-app instructions. To address these challenges, we propose a self-evolving multi-agent framework named MobileSteward, which integrates multiple app-oriented StaffAgents coordinated by a centralized StewardAgent. We design three specialized modules in MobileSteward: (1) Dynamic Recruitment generates a scheduling graph guided by information flow to explicitly associate tasks among apps. (2) Assigned Execution assigns each task to an app-oriented StaffAgent equipped with app-specialized expertise to address the diversity between apps. (3) Adjusted Evaluation conducts evaluation to provide reflection tips or deliver key information, which alleviates error propagation and information loss during multi-step execution. To continuously improve the performance of MobileSteward, we develop a Memory-based Self-evolution mechanism that summarizes the experience from successful executions. We establish the first English Cross-APP Benchmark (CAPBench) in a real-world environment to evaluate agents' capabilities in solving complex cross-app instructions. Experimental results demonstrate that MobileSteward achieves the best performance compared to both single-agent and multi-agent frameworks, highlighting its superiority in handling user instructions of diverse complexity.
中文: MobileSteward是一个自我演进的多智能体框架,通过动态任务调度、分配执行和自适应评估来协调专业智能体,有效解决了跨应用自动化中的复杂任务关系、环境差异和错误传播等挑战,在处理复杂指令方面展现出卓越性能。
English: MobileSteward is a self-evolving multi-agent framework that addresses cross-app automation challenges by coordinating specialized agents through dynamic task scheduling, assigned execution, and adaptive evaluation, demonstrating superior performance in handling complex instructions.
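The Dynamic Recruitment idea, turning information flow into a scheduling graph that fixes the order in which app-oriented agents run, can be sketched with the standard library's topological sorter; the task names and the dispatch stub below are hypothetical, not from the paper's implementation.

```python
# Illustrative sketch of a scheduling graph for a cross-app instruction:
# edges follow the information flow, so a task runs only after the tasks
# whose outputs it consumes.

from graphlib import TopologicalSorter

# Hypothetical instruction: "look up the meeting time in Calendar,
# then message it to Bob in a chat app"
info_flow = {"calendar.lookup": set(), "chat.send": {"calendar.lookup"}}

def schedule(graph: dict) -> list:
    """Order tasks so every dependency precedes its consumer."""
    return list(TopologicalSorter(graph).static_order())

def run(task: str, context: dict) -> dict:
    # stand-in for dispatching to an app-specialized agent
    context[task] = f"done({task})"
    return context

ctx = {}
for task in schedule(info_flow):
    ctx = run(task, ctx)
print(list(ctx))  # ['calendar.lookup', 'chat.send']
```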

Authors:Rock Yuren Pang, K. J. Kevin Feng, Shangbin Feng, Chu Li, Weijia Shi, Yulia Tsvetkov, Jeffrey Heer, Katharina Reinecke
Title: Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models
Abstract:
The output quality of large language models (LLMs) can be improved via "reasoning": generating segments of chain-of-thought (CoT) content to further condition the model prior to producing user-facing output. While these chains contain valuable information, they are verbose and lack explicit organization, making them tedious to review. Moreover, they lack opportunities for user feedback, such as to remove unwanted considerations, add desired ones, or clarify unclear assumptions. We introduce Interactive Reasoning, an interaction design that visualizes chain-of-thought outputs as a hierarchy of topics and enables user review and modification. We implement interactive reasoning in Hippo, a prototype for AI-assisted decision making in the face of uncertain trade-offs. In a user study with 16 participants, we find that interactive reasoning in Hippo allows users to quickly identify and interrupt erroneous generations, efficiently steer the model towards customized responses, and better understand both model reasoning and model outputs. Our work contributes to a new paradigm that incorporates user oversight into LLM reasoning processes.
中文:交互式推理通过将思维链输出可视化为层次结构,使用户能够审查、修改并引导模型响应,从而增强大型语言模型的用户监督和定制能力。
English: Interactive Reasoning enhances large language models by visualizing chain-of-thought outputs as a hierarchical structure, enabling users to review, modify, and steer model responses for improved oversight and customization.
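A toy rendering of the interaction: chain-of-thought segments are grouped into a topic hierarchy that the user can prune before the model regenerates. The keyword matcher below is a stand-in for whatever grouping Hippo actually uses, and all topics and segments are invented.

```python
# Sketch of organizing CoT segments by topic and applying user feedback
# (removing an unwanted consideration) before continuing generation.

COT = [
    "Option A is cheaper per month.",
    "Option A locks you in for two years.",
    "Option B has better reviews for reliability.",
]
TOPICS = {"cost": ["cheap", "price"],
          "commitment": ["lock", "contract"],
          "quality": ["review", "reliab"]}

def to_hierarchy(segments: list) -> dict:
    """Assign each segment to every topic whose keywords it mentions."""
    tree = {t: [] for t in TOPICS}
    for seg in segments:
        for topic, kws in TOPICS.items():
            if any(k in seg.lower() for k in kws):
                tree[topic].append(seg)
    return tree

def prune(tree: dict, unwanted: str) -> dict:
    """User feedback: drop an unwanted consideration before regeneration."""
    return {t: s for t, s in tree.items() if t != unwanted}

tree = prune(to_hierarchy(COT), "cost")
print(sorted(tree))  # ['commitment', 'quality']
```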

Authors:Samy Abdel-Ghaffar, Isaac Galatzer-Levy, Conor Heneghan, Xin Liu, Sarah Kernasovskiy, Brennan Garrett, Andrew Barakat, Daniel McDuff
Title: Passive Measurement of Autonomic Arousal in Real-World Settings
Abstract:
The autonomic nervous system (ANS) is activated during stress, which can have negative effects on cardiovascular health, sleep, the immune system, and mental health. While there are ways to quantify ANS activity in laboratories, there is a paucity of methods that have been validated in real-world contexts. We present the Fitbit Body Response Algorithm, an approach to continuous remote measurement of ANS activation through widely available remote wrist-based sensors. The design was validated via two experiments, a Trier Social Stress Test (n = 45) and ecological momentary assessments (EMA) of perceived stress (n = 87), providing both controlled and ecologically valid test data. Model performance predicting perceived stress when using all available sensor modalities was consistent with expectations (accuracy = 0.85) and outperformed models with access to only a subset of the signals. We discuss and address challenges to sensing that arise in real-world settings that do not present in conventional lab environments.
Chinese: Fitbit身体反应算法通过腕部传感器实现对自主神经系统激活的持续远程监测,经过实验室和真实环境下的压力测试验证,在预测感知压力方面表现出高准确性。
English: The Fitbit Body Response Algorithm enables continuous remote monitoring of autonomic nervous system activation using wrist sensors, validated through controlled and real-world stress tests with high accuracy in predicting perceived stress.

Authors:David C Wong, Bin Wang, Gorkem Durak, Marouane Tliba, Mohamed Amine Kerkouri, Aladine Chetouani, Ahmet Enis Cetin, Cagdas Topel, Nicolo Gennaro, Camila Vendrami, Tugce Agirlar Trabzonlu, Amir Ali Rahsepar, Laetitia Perronne, Matthew Antalek, Onural Ozturk, Gokcan Okur, Andrew C. Gordon, Ayis Pyrros, Frank H Miller, Amir A Borhani, Hatice Savas, Eric M. Hart, Elizabeth A Krupinski, Ulas Bagci
Title: Shifts in Doctors' Eye Movements Between Real and AI-Generated Medical Images
Abstract:
Eye-tracking analysis plays a vital role in medical imaging, providing key insights into how radiologists visually interpret and diagnose clinical cases. In this work, we first analyze radiologists' attention and agreement by measuring the distribution of various eye-movement patterns, including saccade direction, amplitude, and their joint distribution. These metrics help uncover patterns in attention allocation and diagnostic strategies. Furthermore, we investigate whether and how doctors' gaze behavior shifts when viewing authentic (Real) versus deep-learning-generated (Fake) images. To achieve this, we examine fixation bias maps, focusing on the first, last, shortest, and longest fixations independently, along with detailed saccade patterns, to quantify differences in gaze distribution and visual saliency between authentic and synthetic images.
中文: 眼动追踪分析通过检测眼球运动模式揭示放射科医师的诊断策略,并展示他们在观察真实与人工智能生成医学图像时注视行为的差异。
English: Eye-tracking analysis reveals radiologists' diagnostic strategies by examining eye-movement patterns and demonstrates how their gaze behavior differs when viewing real versus AI-generated medical images.

Authors:Mengyao Wang, Jiayun Wu, Shuai Ma, Nuo Li, Peng Zhang, Ning Gu, Tun Lu
Title: Adaptive Human-Agent Teaming: A Review of Empirical Studies from the Process Dynamics Perspective
Abstract:
The rapid advancement of AI, including Large Language Models, has propelled autonomous agents forward, accelerating the human-agent teaming (HAT) paradigm to leverage complementary strengths. However, HAT research remains fragmented, often focusing on isolated team development phases or specific challenges like trust calibration while overlooking the real-world need for adaptability. Addressing these gaps, a process dynamics perspective is adopted to systematically review HAT using the T$^4$ framework: Team Formation, Task and Role Development, Team Development, and Team Improvement. Each phase is examined in terms of its goals, actions, and evaluation metrics, emphasizing the co-evolution of task and team dynamics. Special focus is given to the second and third phases, highlighting key factors such as team roles, shared mental model, and backup behaviors. This holistic perspective identifies future research directions for advancing long-term adaptive HAT.
Chinese: 摘要主张采用过程动态视角,通过T⁴框架系统审视人机协作(HAT),分析团队形成、任务与角色发展、团队发展及改进阶段,以解决现有研究碎片化问题并推动长期适应性发展。
English: The abstract advocates for a process dynamics perspective using the T⁴ framework to systematically review human-agent teaming (HAT), addressing its fragmented research by examining team formation, task and role development, team development, and improvement phases to foster long-term adaptability.

Authors:Neil Mallinar, A. Ali Heydari, Xin Liu, Anthony Z. Faranesh, Brent Winslow, Nova Hammerquist, Benjamin Graef, Cathy Speed, Mark Malhotra, Shwetak Patel, Javier L. Prieto, Daniel McDuff, Ahmed A. Metwally
Title: A Scalable Framework for Evaluating Health Language Models
Abstract:
Large language models (LLMs) have emerged as powerful tools for analyzing complex datasets. Recent studies demonstrate their potential to generate useful, personalized responses when provided with patient-specific health information that encompasses lifestyle, biomarkers, and context. As LLM-driven health applications are increasingly adopted, rigorous and efficient one-sided evaluation methodologies are crucial to ensure response quality across multiple dimensions, including accuracy, personalization and safety. Current evaluation practices for open-ended text responses heavily rely on human experts. This approach introduces human factors and is often cost-prohibitive, labor-intensive, and hinders scalability, especially in complex domains like healthcare where response assessment necessitates domain expertise and considers multifaceted patient data. In this work, we introduce Adaptive Precise Boolean rubrics: an evaluation framework that streamlines human and automated evaluation of open-ended questions by identifying gaps in model responses using a minimal set of targeted rubrics questions. Our approach is based on recent work in more general evaluation settings that contrasts a smaller set of complex evaluation targets with a larger set of more precise, granular targets answerable with simple boolean responses. We validate this approach in metabolic health, a domain encompassing diabetes, cardiovascular disease, and obesity. Our results demonstrate that Adaptive Precise Boolean rubrics yield higher inter-rater agreement among expert and non-expert human evaluators, and in automated assessments, compared to traditional Likert scales, while requiring approximately half the evaluation time of Likert-based methods. This enhanced efficiency, particularly in automated evaluation and non-expert contributions, paves the way for more extensive and cost-effective evaluation of LLMs in health.
中文: 本文提出的自适应精确布尔评估框架通过针对性布尔问题改进了医疗领域大语言模型的评估效率,相比传统方法在提升评分一致性的同时将评估时间缩短约一半。
English: This paper introduces Adaptive Precise Boolean rubrics, an efficient evaluation framework that improves assessment of large language models in healthcare by using targeted boolean questions, achieving higher agreement and faster evaluation compared to traditional methods.
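The core mechanic of a boolean rubric, scoring a response against many precise yes/no items and surfacing the failed ones as gaps, can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline; the rubric item names and the pass-rate scoring rule below are hypothetical.

```python
# Illustrative sketch: aggregating answers to precise boolean rubric items.
# Item names and the scoring rule are made up for this example.

def rubric_gap_report(answers: dict[str, bool]) -> tuple[float, list[str]]:
    """Return the pass rate and the list of failed (gap) rubric items."""
    gaps = [item for item, ok in answers.items() if not ok]
    pass_rate = 1.0 - len(gaps) / len(answers)
    return pass_rate, gaps

answers = {
    "states_glucose_target_range": True,
    "personalizes_to_user_biomarkers": False,
    "avoids_unsafe_dosing_advice": True,
    "cites_relevant_lifestyle_context": True,
}
rate, gaps = rubric_gap_report(answers)
# rate == 0.75; gaps == ["personalizes_to_user_biomarkers"]
```

Because each item is a simple boolean, agreement between raters reduces to matching yes/no answers, which is the intuition behind the higher inter-rater agreement reported versus Likert scales.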

Authors:Shun Liao, Paolo Di Achille, Jiang Wu, Silviu Borac, Jonathan Wang, Xin Liu, Eric Teasley, Lawrence Cai, Yuzhe Yang, Yun Liu, Daniel McDuff, Hao-Wei Su, Brent Winslow, Anupam Pathak, Shwetak Patel, James A. Taylor, Jameson K. Rogers, Ming-Zher Poh
Title: Passive Heart Rate Monitoring During Smartphone Use in Everyday Life
Abstract:
Resting heart rate (RHR) is an important biomarker of cardiovascular health and mortality, but tracking it longitudinally generally requires a wearable device, limiting its availability. We present PHRM, a deep learning system for passive heart rate (HR) and RHR measurements during everyday smartphone use, using facial video-based photoplethysmography. Our system was developed using 225,773 videos from 495 participants and validated on 185,970 videos from 205 participants in laboratory and free-living conditions, representing the largest validation study of its kind. Compared to reference electrocardiogram, PHRM achieved a mean absolute percentage error (MAPE) < 10% for HR measurements across three skin tone groups of light, medium and dark pigmentation; MAPE for each skin tone group was non-inferior versus the others. Daily RHR measured by PHRM had a mean absolute error < 5 bpm compared to a wearable HR tracker, and was associated with known risk factors. These results highlight the potential of smartphones to enable passive and equitable heart health monitoring.
中文: PHRM深度学习系统通过智能手机面部视频实现无接触心率与静息心率监测,在不同肤色人群中均保持高精度,展现了智能手机在心血管健康平等监测领域的应用潜力。
English: PHRM is a deep learning system that uses facial videos from smartphones to passively measure heart rate and resting heart rate, achieving high accuracy across diverse skin tones and demonstrating potential for equitable cardiovascular health monitoring.
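The headline metric above, mean absolute percentage error (MAPE) against a reference ECG, is straightforward to compute; a minimal sketch with invented example values:

```python
def mape(pred, ref):
    """Mean absolute percentage error between predicted and reference HR (bpm)."""
    assert len(pred) == len(ref) and len(ref) > 0
    return 100.0 * sum(abs(p - r) / r for p, r in zip(pred, ref)) / len(pred)

# Hypothetical HR estimates vs. ECG reference values, in beats per minute:
pred_hr = [72.0, 80.0, 95.0]
ref_hr = [70.0, 84.0, 100.0]
err = mape(pred_hr, ref_hr)  # ~4.21%, under the paper's <10% threshold
```

Note that MAPE normalizes each error by the reference value, so the same absolute error counts for more at lower heart rates.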

Authors:Wenxin Zhao, Fangyu Yu, Peng Zhang, Hansu Gu, Lin Wang, Siyuan Qiao, Tun Lu, Ning Gu
Title: YouthCare: Building a Personalized Collaborative Video Censorship Tool to Support Parent-Child Joint Media Engagement
Abstract:
To mitigate the negative impacts of online videos on teenagers, existing research and platforms have implemented various parental mediation mechanisms, such as Parent-Child Joint Media Engagement (JME). However, JME generally relies heavily on parents' time, knowledge, and experience. To address this limitation, we aim to design an automatic tool that helps parents and children censor videos more effectively and efficiently in JME. To this end, we first conducted a formative study to identify the needs and expectations of teenagers and parents for such a system. Based on the findings, we designed YouthCare, a personalized collaborative video censorship tool that supports parents and children in collaboratively filtering out inappropriate content and selecting appropriate content in JME. An evaluation with 10 parent-child pairs demonstrated several strengths of YouthCare in supporting video censorship, while also highlighting some potential problems. These findings inspire several insights for the future design of parent-child collaborative JME systems.
中文摘要:为弥补亲子共同媒介参与中家长时间与经验的不足,本研究开发了YouthCare自动化工具,通过协同过滤机制帮助家庭管理视频内容,实验验证其有效性的同时为未来设计提供了改进方向。
English Summary: To address the limitations of time and expertise in Parent-Child Joint Media Engagement, this study designed YouthCare, an automated tool that helps families collaboratively filter online video content, which was evaluated positively while revealing areas for future improvement.

Authors:Xuechen Zhang, Changyang He, Peng Zhang, Hansu Gu, Ning Gu, Qi Shen, Zhan Hu, Tun Lu
Title: RemiHaven: Integrating "In-Town" and "Out-of-Town" Peers to Provide Personalized Reminiscence Support for Older Drifters
Abstract:
With increasing social mobility and an aging society, more older adults in China are migrating to new cities, known as "older drifters." Due to fewer social connections and cultural adaptation challenges, they face negative emotions such as loneliness and depression. While reminiscence-based interventions have been used to improve older adults' psychological well-being, challenges such as the lack of tangible materials and limited social resources constrain the feasibility of traditional reminiscence approaches for older drifters. To address this challenge, we designed RemiHaven, a personalized reminiscence support tool based on a two-phase formative study. It integrates "In-Town" and "Out-of-Town" peer agents to enhance personalization, engagement, and emotional resonance in the reminiscence process, powered by Multimodal Large Language Models (MLLMs). Our evaluations show RemiHaven's strengths in supporting reminiscence while identifying potential challenges. We conclude by offering insights for the future design of reminiscence support tools for older migrants.
Chinese: 为解决中国老年漂群体的孤独抑郁问题,RemiHaven这一基于多模态大语言模型和同伴代理的个性化怀旧工具被开发出来,并通过怀旧过程被证明能有效支持情感健康。
English: To address the loneliness and depression faced by older drifters in China, RemiHaven, a personalized reminiscence tool using multimodal large language models and peer agents, was developed and shown to effectively support emotional well-being through reminiscence.

Authors:Changlun Li, Yao Shi, Yuyu Luo, Nan Tang
Title: Rise of the Community Champions: From Reviewer Crunch to Community Power
Abstract:
Academic publishing is facing a crisis driven by exponential growth in submissions and an overwhelmed peer review system, leading to inconsistent decisions and a severe reviewer shortage. This paper introduces Panvas, a platform that reimagines academic publishing as a continuous, community-driven process. Panvas addresses these systemic failures with a novel combination of economic incentives (paid reviews) and rich interaction mechanisms (multi-dimensional ratings, threaded discussions, and expert-led reviews). By moving beyond the traditional accept/reject paradigm and integrating paper hosting with code/data repositories and social networking, Panvas fosters a meritocratic environment for scholarly communication and presents a radical rethinking of how we evaluate and disseminate scientific knowledge. We present the system design, development roadmap, and a user study plan to evaluate its effectiveness.
Chinese: Panvas平台通过引入付费评审和互动机制,将学术出版重塑为持续、社区驱动的流程,超越传统的接受/拒绝模式,构建了基于学术贡献的知识共享体系。
English: The Panvas platform tackles the crisis in academic publishing by transforming it into a continuous, community-driven process with paid reviews and interactive features, moving beyond traditional accept/reject decisions to create a merit-based system for sharing knowledge.

Authors:Siyu Yan, Tiancheng Liu, Weikai Yang, Nan Tang, Yuyu Luo
Title: ChartEditor: A Human-AI Paired Tool for Authoring Pictorial Charts
Abstract:
Pictorial charts are favored for their memorability and visual appeal, offering a more engaging alternative to basic charts. However, their creation can be complex and time-consuming due to the lack of native support in popular visualization tools like Tableau. While AI-generated content (AIGC) tools have lowered the barrier to creating pictorial charts, they often lack precise design control. To address this issue, we introduce ChartEditor, a human-AI paired tool that transforms basic charts into pictorial versions based on user intent. ChartEditor decomposes chart images into visual components and organizes them within a hierarchical tree. Based on this tree, users can express their intent in natural language, which is then translated into modifications to the hierarchy. In addition, users can directly interact with and modify specific chart components via an intuitive interface to achieve fine-grained design control. A user study demonstrates the effectiveness and usability of ChartEditor in simplifying the creation of pictorial charts.
中文摘要:ChartEditor是一款人机协作工具,通过解析自然语言指令并支持直接组件编辑,将基础图表转化为图示化版本,显著简化了图示图表的制作流程。
English Summary: ChartEditor is a human-AI collaboration tool that converts standard charts into pictorial versions by interpreting natural language commands and enabling direct component manipulation, effectively simplifying their creation process.
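The hierarchical-tree idea behind ChartEditor, decomposing a chart into components so that an intent can be lowered to a modification of specific nodes, can be sketched with a plain nested structure. This is a hypothetical illustration; the node schema, field names, and edit operation below are not ChartEditor's actual representation.

```python
# Hypothetical chart-component hierarchy and a targeted edit on one node.

chart = {
    "type": "bar_chart",
    "children": [
        {"type": "axis", "role": "x", "children": []},
        {"type": "bar", "category": "2023", "fill": "#4c72b0", "children": []},
        {"type": "bar", "category": "2024", "fill": "#4c72b0", "children": []},
    ],
}

def apply_edit(node, predicate, changes):
    """Apply `changes` to every node in the tree matching `predicate`."""
    if predicate(node):
        node.update(changes)
    for child in node.get("children", []):
        apply_edit(child, predicate, changes)

# e.g. the intent "make the 2024 bar orange", lowered to a tree modification:
apply_edit(chart, lambda n: n.get("category") == "2024", {"fill": "#dd8452"})
```

Scoping edits via a predicate over the tree is what gives the user fine-grained control: only the matched component changes, and siblings are left untouched.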

Authors:Yi-Cheng Lin, Kang-Chieh Chen, Zhe-Yan Li, Tzu-Heng Wu, Tzu-Hsuan Wu, Kuan-Yu Chen, Hung-yi Lee, Yun-Nung Chen
Title: Creativity in LLM-based Multi-Agent Systems: A Survey
Abstract:
Large language model (LLM)-driven multi-agent systems (MAS) are transforming how humans and AIs collaboratively generate ideas and artifacts. While existing surveys provide comprehensive overviews of MAS infrastructures, they largely overlook the dimension of creativity, including how novel outputs are generated and evaluated, how creativity informs agent personas, and how creative workflows are coordinated. This is the first survey dedicated to creativity in MAS. We focus on text and image generation tasks, and present: (1) a taxonomy of agent proactivity and persona design; (2) an overview of generation techniques, including divergent exploration, iterative refinement, and collaborative synthesis, as well as relevant datasets and evaluation metrics; and (3) a discussion of key challenges, such as inconsistent evaluation standards, insufficient bias mitigation, coordination conflicts, and the lack of unified benchmarks. This survey offers a structured framework and roadmap for advancing the development, evaluation, and standardization of creative MAS.
中文: 本综述首次聚焦于大语言模型驱动的多智能体系统中的创造力问题,通过提出智能体分类体系、生成技术及核心挑战,为推进创造性人工智能协作提供了结构化框架与发展路线图。
English: This survey is the first to focus on creativity in large language model-driven multi-agent systems, presenting a taxonomy of agent design, generation techniques, and key challenges to provide a framework for advancing creative AI collaboration.

Authors:Yue Xing, Wensheng Gan, Qidi Chen, Philip S. Yu
Title: AI-Generated Content in Landscape Architecture: A Survey
Abstract:
Landscape design is a complex process that requires designers to engage in intricate planning, analysis, and decision-making. This process involves the integration and reconstruction of science, art, and technology. Traditional landscape design methods often rely on the designer's personal experience and subjective aesthetics, with design standards rooted in subjective perception. As a result, they lack scientific and objective evaluation criteria and systematic design processes. Data-driven artificial intelligence (AI) technology provides an objective and rational design process. With the rapid development of different AI technologies, AI-generated content (AIGC) has permeated various aspects of landscape design at an unprecedented speed, serving as an innovative design tool. This article aims to explore the applications and opportunities of AIGC in landscape design. AIGC can support landscape design in areas such as site research and analysis, design concepts and scheme generation, parametric design optimization, plant selection and visual simulation, construction management, and process optimization. However, AIGC also faces challenges in landscape design, including data quality and reliability, design expertise and judgment, technical challenges and limitations, site characteristics and sustainability, user needs and participation, the balance between technology and creativity, ethics, and social impact. Finally, this article provides a detailed outlook on the future development trends and prospects of AIGC in landscape design. Through in-depth research and exploration in this review, readers can gain a better understanding of the relevant applications, potential opportunities, and key challenges of AIGC in landscape design.
Chinese: 数据驱动的人工智能技术,特别是AIGC,为景观设计提供了客观系统的创新工具,在场地分析与方案生成等方面带来机遇,同时也面临数据可靠性及伦理考量等挑战。
English: Data-driven AI technology, particularly AIGC, offers innovative tools for objective and systematic landscape design processes, presenting opportunities in site analysis and design generation while facing challenges in data reliability and ethical considerations.

Authors:Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua, Jimmy Xiangji Huang
Title: How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond
Abstract:
With the advancement of large language models (LLMs), intelligent models have evolved from mere tools to autonomous agents with their own goals and strategies for cooperating with humans. This evolution has birthed a novel paradigm in NLP, i.e., human-model cooperation, that has yielded remarkable progress in numerous NLP tasks in recent years. In this paper, we take the first step to present a thorough review of human-model cooperation, exploring its principles, formalizations, and open challenges. In particular, we introduce a new taxonomy that provides a unified perspective to summarize existing approaches. Also, we discuss potential frontier areas and their corresponding challenges. We regard our work as an entry point, paving the way for more breakthrough research in this regard.
中文摘要:随着大语言模型发展为具有自主目标的智能体,人机协作已成为自然语言处理领域的新范式,本文通过统一分类法首次系统综述该领域并展望未来研究方向。
English Summary: The evolution of large language models into autonomous agents has established human-model cooperation as a transformative paradigm in NLP, which this paper systematically reviews through a unified taxonomy while identifying future research directions.

Authors:Tanmay Chakraborty, Marion Koelle, Jörg Schlötterer, Nadine Schlicker, Christian Wirth, Christin Seifert
Title: Explanation format does not matter; but explanations do -- An Eggsbert study on explaining Bayesian Optimisation tasks
Abstract:
Bayesian Optimisation (BO) is a family of methods for finding optimal parameters when the underlying function to be optimised is unknown. BO is used, for example, for hyperparameter tuning in machine learning and as an expert support tool for tuning cyberphysical systems. For settings where humans are involved in the tuning task, methods have been developed to explain BO (Explainable Bayesian Optimization, XBO). However, there is little guidance on how to present XBO results to humans so that they can tune the system effectively and efficiently. In this paper, we investigate how the XBO explanation format affects users' task performance, task load, understanding and trust in XBO. We chose a task that is accessible to a wide range of users. Specifically, we set up an egg cooking scenario with 6 parameters that participants had to adjust to achieve a perfect soft-boiled egg. We compared three different explanation formats: a bar chart, a list of rules and a textual explanation in a between-subjects online study with 213 participants. Our results show that adding any type of explanation increases task success, reduces the number of trials needed to achieve success, and improves comprehension and confidence. While explanations add more information for participants to process, we found no increase in user task load. We also found that the aforementioned results were independent of the explanation format; all formats had a similar effect. This is an interesting finding for practical applications, as it suggests that explanations can be added to BO tuning tasks without the burden of designing or selecting specific explanation formats. In the future, it would be interesting to investigate scenarios of prolonged use of the explanation formats and whether they have different effects on users' mental models of the underlying system.
中文: 本研究表明,在贝叶斯优化中加入任何形式的解释都能显著提高用户的任务表现、理解和信心,且不会增加任务负担,所有测试的解释格式均产生相似效果。
English: This study demonstrates that incorporating any form of explanation into Bayesian Optimization significantly improves user performance, comprehension, and confidence without increasing task load, with all tested formats yielding similar benefits.

Authors:Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Chunyu Miao, Dongyuan Li, Aiwei Liu, Yue Zhou, Yankai Chen, Weizhi Zhang, Yangning Li, Liancheng Fang, Renhe Jiang, Philip S. Yu
Title: A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy
Abstract:
Recent improvements in large language models (LLMs) have led many researchers to focus on building fully autonomous AI agents. This position paper questions whether this approach is the right path forward, as these autonomous systems still have problems with reliability, transparency, and understanding the actual requirements of humans. We suggest a different approach: LLM-based Human-Agent Systems (LLM-HAS), where AI works with humans rather than replacing them. By keeping humans involved to provide guidance, answer questions, and maintain control, these systems can be more trustworthy and adaptable. Looking at examples from healthcare, finance, and software development, we show how human-AI teamwork can handle complex tasks better than AI working alone. We also discuss the challenges of building these collaborative systems and offer practical solutions. This paper argues that progress in AI should not be measured by how independent systems become, but by how well they can work with humans. The most promising future for AI is not in systems that take over human roles, but in those that enhance human capabilities through meaningful partnership.
中文: 本立场文件主张发展基于大语言模型的人机协作系统,而非完全自主的AI代理,强调在医疗、金融等领域通过人机协同能提升系统可靠性、透明度和适应能力,实现增强人类智能的合作伙伴关系。
English: This position paper advocates for LLM-based Human-Agent Systems (LLM-HAS) over fully autonomous AI agents, arguing that human-AI collaboration ensures greater reliability, transparency, and adaptability in complex tasks across fields like healthcare and finance.

Authors:Jiankai Tang, Kegang Wang, Yingke Ding, Jiatong Ji, Zeyu Wang, Xiyuxing Zhang, Ping Chen, Yuanchun Shi, Yuntao Wang
Title: A Dataset and Toolkit for Multiparameter Cardiovascular Physiology Sensing on Rings
Abstract:
Smart rings offer a convenient way to continuously and unobtrusively monitor cardiovascular physiological signals. However, a gap remains between the ring hardware and reliable methods for estimating cardiovascular parameters, partly due to the lack of publicly available datasets and standardized analysis tools. In this work, we present τ-Ring, the first open-source ring-based dataset designed for cardiovascular physiological sensing. The dataset comprises photoplethysmography signals (infrared and red channels) and 3-axis accelerometer data collected from two rings (reflective and transmissive optical paths), with 28.21 hours of raw data from 34 subjects across seven activities. τ-Ring encompasses both stationary and motion scenarios, as well as stimulus-evoked abnormal physiological states, annotated with four ground-truth labels: heart rate, respiratory rate, oxygen saturation, and blood pressure. Using our proposed RingTool toolkit, we evaluated three widely-used physics-based methods and four cutting-edge deep learning approaches. Our results show superior performance compared to commercial rings, achieving best MAE values of 5.18 BPM for heart rate, 2.98 BPM for respiratory rate, 3.22% for oxygen saturation, and 13.33/7.56 mmHg for systolic/diastolic blood pressure estimation. The open-sourced dataset and toolkit aim to foster further research and community-driven advances in ring-based cardiovascular health sensing.
Chinese: 智能戒指为持续监测心血管信号提供了便捷方式,但在可靠参数估计方面仍存在挑战;本研究推出了首个开源戒指数据集及工具包,在关键健康指标测量上展现出优于商用设备的性能。
English: Smart rings provide a convenient means for continuous cardiovascular monitoring, yet face challenges in reliable parameter estimation due to limited datasets and tools; this work introduces the first open-source ring-based dataset and toolkit, demonstrating superior performance over commercial devices in measuring key health metrics.
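The benchmark numbers above are mean absolute errors (MAE), which, unlike the MAPE used in some other studies, are reported in the signal's own units. A minimal sketch with invented example values:

```python
def mae(pred, ref):
    """Mean absolute error, e.g. for ring-based heart-rate estimates in BPM."""
    assert len(pred) == len(ref) and len(ref) > 0
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)

# Hypothetical HR estimates vs. reference values, in beats per minute:
pred_hr = [68.0, 75.0, 82.0, 90.0]
ref_hr = [70.0, 72.0, 85.0, 96.0]
err = mae(pred_hr, ref_hr)  # (2 + 3 + 3 + 6) / 4 = 3.5 BPM
```

Reporting MAE in BPM or mmHg keeps the error directly comparable to clinical tolerances, which is why the dataset's benchmarks quote it per parameter.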

Authors:Jun Fang, Yanuo Zhou, Ka I Chan, Jiajin Li, Zeyi Sun, Zhengnan Li, Zicong Fu, Hongjing Piao, Haodong Xu, Yuanchun Shi, Yuntao Wang
Title: A Review of Behavioral Closed-Loop Paradigm from Sensing to Intervention for Ingestion Health
Abstract:
Ingestive behavior plays a critical role in health, yet many existing interventions remain limited to static guidance or manual self-tracking. With the increasing integration of sensors, context-aware computing, and perceptual computing, recent systems have begun to support closed-loop interventions that dynamically sense user behavior and provide feedback during or around ingestion episodes. In this survey, we review 136 studies that leverage sensor-enabled or interaction-mediated approaches to influence ingestive behavior. We propose a behavioral closed-loop paradigm rooted in context-aware computing and inspired by HCI behavior change frameworks, comprising four components: target behaviors, sensing modalities, reasoning and intervention strategies. A taxonomy of sensing and intervention modalities is presented, organized along human- and environment-based dimensions. Our analysis also examines evaluation methods and design trends across different modality-behavior pairings. This review reveals prevailing patterns and critical gaps, offering design insights for future adaptive and context-aware ingestion health interventions.
中文:随着传感器和情境感知计算的发展,闭环干预系统能够动态监测并影响摄食行为,本文通过分析136项研究提出分类框架并揭示未来健康技术的设计方向。
English: Recent advances in sensor and context-aware computing enable closed-loop interventions that dynamically monitor and influence ingestive behavior, with this survey analyzing 136 studies to propose a taxonomy and reveal design insights for future health technologies.

Authors:Qiaosi Wang, Xuhui Zhou, Maarten Sap, Jodi Forlizzi, Hong Shen
Title: Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective
Abstract:
The last couple of years have witnessed emerging research that appropriates Theory-of-Mind (ToM) tasks designed for humans to benchmark LLMs' ToM capabilities as an indication of their social intelligence. However, this approach has a number of limitations. Drawing on existing psychology and AI literature, we summarize the theoretical, methodological, and evaluation limitations, pointing out that certain issues inherent in the original ToM tasks used to evaluate humans' ToM persist, and are exacerbated, when these tasks are appropriated to benchmark LLMs' ToM. Taking a human-computer interaction (HCI) perspective, these limitations prompt us to rethink the definition and criteria of ToM in ToM benchmarks through a more dynamic, interactional approach that accounts for user preferences, needs, and experiences with LLMs in such evaluations. We conclude by outlining potential opportunities and challenges in this direction.
中文摘要:近期研究采用人类心智理论任务评估大语言模型社交智能存在诸多局限,呼吁建立更动态、交互式的评估标准,充分考虑用户需求与体验。
English Summary: Recent research using human-designed Theory-of-Mind tasks to evaluate LLMs' social intelligence faces significant limitations, prompting a need for more dynamic, interaction-focused benchmarks that consider user experiences.

Authors:Jiexin Ding, Bowen Zhao, Yuntao Wang, Xinyun Liu, Rui Hao, Ishan Chatterjee, Yuanchun Shi
Title: Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models
Abstract:
English as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, thereby helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability of unknown words based on text content and eye gaze trajectory in real time with high accuracy. A 20-participant user study revealed that our method can achieve an accuracy of 97.6%, and an F1-score of 71.1%. We implemented a real-time reading assistance prototype to show the effectiveness of EyeLingo. The user study shows improvement in willingness to use and usefulness compared to baseline methods.
中文:EyeLingo是一种基于Transformer的方法,通过结合文本内容和眼动轨迹实时预测ESL学习者的生词,准确率达97.6%,有效提升了阅读辅助的实用性和用户体验。
English: EyeLingo is a transformer-based method that accurately predicts ESL learners' unfamiliar words using text and eye gaze data, achieving 97.6% accuracy and enhancing reading assistance effectiveness.
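The intuition that gaze features (long fixations, repeated regressions) combined with text features raise the predicted probability of an unknown word can be illustrated with a toy logistic score. To be clear, EyeLingo uses a trained transformer; the hand-picked features and weights below are invented purely for illustration.

```python
import math

def unknown_word_probability(word_rarity: float, dwell_ms: float,
                             regression_count: int) -> float:
    """Toy logistic score from text and gaze features. The weights are
    made up for illustration; they are NOT EyeLingo's learned parameters."""
    z = 2.0 * word_rarity + 0.004 * dwell_ms + 0.5 * regression_count - 3.0
    return 1.0 / (1.0 + math.exp(-z))

# A rare word fixated for a long time with repeated regressions scores high;
# a common word glanced at briefly scores low.
p_hard = unknown_word_probability(0.9, 600.0, 2)
p_easy = unknown_word_probability(0.1, 120.0, 0)
```

A real-time system would threshold such a probability to decide when to surface a just-in-time definition without interrupting fluent reading.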

Authors:Zhipeng Li, Yishu Ji, Ruijia Chen, Tianqi Liu, Yuntao Wang, Yuanchun Shi, Yukang Yan
Title: Modeling the Impact of Visual Stimuli on Redirection Noticeability with Gaze Behavior in Virtual Reality
Abstract:
While users can embody virtual avatars that mirror their physical movements in Virtual Reality, these avatars' motions can be redirected to enable novel interactions. Excessive redirection, however, can break the user's sense of embodiment due to perceptual conflicts between vision and proprioception. While prior work focused on avatar-related factors influencing the noticeability of redirection, we investigate how visual stimuli in the surrounding virtual environment affect user behavior and, in turn, the noticeability of redirection. Given the wide variety of visual stimuli and their tendency to elicit varying individual reactions, we propose to use users' gaze behavior as an indicator of their response to the stimuli and to model the noticeability of redirection. We conducted two user studies to collect users' gaze behavior and noticeability judgments, investigating the relationship between them and identifying the gaze behavior features most effective for predicting noticeability. Based on the data, we developed a regression model that takes users' gaze behavior as input and outputs the noticeability of redirection. We then conducted an evaluation study to test our model on unseen visual stimuli, achieving a mean squared error (MSE) of 0.012. We further implemented an adaptive redirection technique and conducted a proof-of-concept study to evaluate its effectiveness with complex visual stimuli in two applications. The results indicated that participants found the experience less physically demanding and reported a stronger sense of body ownership when using our adaptive technique, demonstrating the potential of our model to support real-world use cases.
中文: 本研究开发了一种基于视线行为的模型来预测虚拟现实中化身运动重定向的可察觉性,通过响应用户的视觉注意力,自适应技术能减轻身体负担并增强身体拥有感。
English: This study develops a gaze-based model to predict the noticeability of avatar motion redirection in Virtual Reality, enabling adaptive techniques that reduce physical strain and enhance body ownership by responding to users' visual attention.
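The fit-then-evaluate loop, regress gaze features onto noticeability, then report MSE on held-out stimuli, can be sketched in its simplest one-feature form. The paper's actual model and features are richer; the single feature, training pairs, and closed-form fit below are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one gaze feature -> noticeability score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def mse(pred, ref):
    """Mean squared error, the metric the evaluation study reports."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)

# Hypothetical pairs: (mean fixation duration in s, noticeability in [0, 1]).
xs = [0.2, 0.4, 0.6, 0.8]
ys = [0.15, 0.35, 0.62, 0.81]
slope, intercept = fit_line(xs, ys)
err = mse([slope * x + intercept for x in xs], ys)
```

An adaptive technique could then cap the applied redirection whenever the predicted noticeability for the current gaze behavior exceeds a chosen threshold.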

Authors:Zeyu Wang, Ruotong Yu, Xiangyang Wang, Jiexin Ding, Jiankai Tang, Jun Fang, Zhe He, Zhuojun Li, Tobias Röddiger, Weiye Xu, Xiyuxing Zhang, huan-ang Gao, Nan Gao, Chun Yu, Yuanchun Shi, Yuntao Wang
Title: Computing with Smart Rings: A Systematic Literature Review
Abstract:
A smart ring is a wearable electronic device in the form of a ring that incorporates diverse sensors and computing technologies to perform a variety of functions. Worn on the finger, smart rings can sense subtle and abundant hand movements, making them a good platform for interaction. Meanwhile, fingers are rich in blood vessels and nerve endings and are accustomed to wearing rings, providing an ideal site for continuous health monitoring; smart rings combine comfort with the ability to capture vital biometric data, making them suitable for all-day wear. We collected a total of 206 smart ring-related publications and conducted a systematic literature review. We provide a taxonomy regarding sensing and feedback modalities, applications, and phenomena. We review and categorize this literature into four main areas: (1) interaction - input, (2) interaction - output, (3) passive sensing - in-body features, (4) passive sensing - out-of-body activity. This comprehensive review highlights current advancements in the field of smart rings and identifies potential areas for future research.
中文: 智能戒指是一种利用传感器和计算技术实现交互与健康监测的可穿戴设备,通过对206篇文献的系统综述,将其应用划分为交互和被动感知两大领域。
English: Smart rings are wearable devices that utilize sensors and computing for interaction and health monitoring, with a systematic review of 206 publications categorizing their applications into interaction and passive sensing areas.

Authors:Nan Gao, Yibin Liu, Xin Tang, Yanyan Liu, Chun Yu, Yun Huang, Yuntao Wang, Flora D. Salim, Xuhai Orson Xu, Jun Wei, Yuanchun Shi
Title: The Homework Wars: Exploring Emotions, Behaviours, and Conflicts in Parent-Child Homework Interactions
Abstract:
Parental involvement in homework is a crucial aspect of family education, but it often triggers emotional strain and conflicts. Despite growing concern over its impact on family well-being, prior research has lacked access to fine-grained, real-time dynamics of these interactions. To bridge this gap, we present a framework that leverages naturalistic parent-child interaction data and large language models (LLMs) to analyse homework conversations at scale. In a four-week in situ study with 78 Chinese families, we collected 475 hours of audio recordings and accompanying daily surveys, capturing 602 homework sessions in everyday home settings. Our LLM-based pipeline reliably extracted and categorised parental behaviours and conflict patterns from transcribed conversations, achieving high agreement with expert annotations. The analysis revealed significant emotional shifts in parents before and after homework, 18 recurring parental behaviours and seven common conflict types, with Knowledge Conflict being the most frequent. Notably, even well-intentioned behaviours were significantly positively correlated with specific conflicts. This work advances ubiquitous computing methods for studying complex family dynamics and offers empirical insights to enrich family education theory and inform more effective parenting strategies and interventions in the future.
中文: 本研究利用自然亲子互动数据和大语言模型分析家庭作业对话,揭示了78个中国家庭中显著的情绪变化、重复行为及冲突模式,为家庭教育理论和有效育儿策略提供了实证依据。
English: This study introduces a framework using naturalistic parent-child interaction data and large language models to analyze homework conversations, revealing significant emotional shifts, recurring behaviors, and conflict patterns in 78 Chinese families, with implications for family education theory and parenting strategies.

Authors:Pegah Salehi, Sajad Amouei Sheshkal, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen
Title: Multimodal Integration Challenges in Emotionally Expressive Child Avatars for Training Applications
Abstract:
Dynamic facial emotion is essential for believable AI-generated avatars, yet most systems remain visually static, limiting their use in simulations like virtual training for investigative interviews with abused children. We present a real-time architecture combining Unreal Engine 5 MetaHuman rendering with NVIDIA Omniverse Audio2Face to generate facial expressions from vocal prosody in photorealistic child avatars. Due to limited TTS options, both avatars were voiced using young adult female models from two systems to better fit character profiles, introducing a voice-age mismatch. This confound may affect audiovisual alignment. We used a two-PC setup to decouple speech generation from GPU-intensive rendering, enabling low-latency interaction in desktop and VR. A between-subjects study (N=70) compared audio+visual vs. visual-only conditions as participants rated emotional clarity, facial realism, and empathy for avatars expressing joy, sadness, and anger. While emotions were generally recognized - especially sadness and joy - anger was harder to detect without audio, highlighting the role of voice in high-arousal expressions. Interestingly, silencing clips improved perceived realism by removing mismatches between voice and animation, especially when tone or age felt incongruent. These results emphasize the importance of audiovisual congruence: mismatched voice undermines expression, while a good match can enhance weaker visuals - posing challenges for emotionally coherent avatars in sensitive contexts.
中文摘要:本研究开发了一种实时系统,通过语音韵律驱动生成具有动态面部表情的逼真儿童虚拟形象,发现尽管悲伤和喜悦等情绪可被识别,但声音与视觉的不匹配会削弱愤怒情绪的感知并影响整体情感一致性,强调了在敏感应用场景中视听协调的至关重要性。
English Summary: This study introduces a real-time system for generating photorealistic child avatars with dynamic facial expressions driven by vocal prosody, revealing that while emotions like sadness and joy were recognizable, voice-visual mismatches impaired anger perception and overall emotional coherence, highlighting the critical need for audiovisual alignment in sensitive applications.

Authors:Chen Wang, Fei Xia, Wenhao Yu, Tingnan Zhang, Ruohan Zhang, C. Karen Liu, Li Fei-Fei, Jie Tan, Jacky Liang
Title: Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models
Abstract:
Learning to perform manipulation tasks from human videos is a promising approach for teaching robots. However, many manipulation tasks require changing control parameters during task execution, such as force, which visual data alone cannot capture. In this work, we leverage sensing devices such as armbands that measure human muscle activities and microphones that record sound, to capture the details in the human manipulation process, and enable robots to extract task plans and control parameters to perform the same task. To achieve this, we introduce Chain-of-Modality (CoM), a prompting strategy that enables Vision Language Models to reason about multimodal human demonstration data -- videos coupled with muscle or audio signals. By progressively integrating information from each modality, CoM refines a task plan and generates detailed control parameters, enabling robots to perform manipulation tasks based on a single multimodal human video prompt. Our experiments show that CoM delivers a threefold improvement in accuracy for extracting task plans and control parameters compared to baselines, with strong generalization to new task setups and objects in real-world robot experiments. Videos and code are available at https://chain-of-modality.github.io
中文摘要:本研究提出Chain-of-Modality(CoM)提示策略,通过融合人类演示中的视频与肌肉/音频信号,使机器人能够从多模态数据中提取任务规划和控制参数,在实验中相比基线方法实现了三倍的精度提升。
English Summary: This study introduces Chain-of-Modality (CoM), a prompting strategy that enables robots to learn manipulation tasks from multimodal human demonstrations—combining video with muscle or audio signals—to extract detailed task plans and control parameters, achieving a threefold accuracy improvement over baseline methods.

Authors:Tahsin Alamgir Kheya, Mohamed Reda Bouadjenek, Sunil Aryal
Title: Unmasking Gender Bias in Recommendation Systems and Enhancing Category-Aware Fairness
Abstract:
Recommendation systems are now an integral part of our daily lives. We rely on them for tasks such as discovering new movies, finding friends on social media, and connecting job seekers with relevant opportunities. Given their vital role, we must ensure these recommendations are free from societal stereotypes. Therefore, evaluating and addressing such biases in recommendation systems is crucial. Previous work evaluating the fairness of recommended items fails to capture certain nuances as they mainly focus on comparing performance metrics for different sensitive groups. In this paper, we introduce a set of comprehensive metrics for quantifying gender bias in recommendations. Specifically, we show the importance of evaluating fairness on a more granular level, which can be achieved using our metrics to capture gender bias using categories of recommended items like genres for movies. Furthermore, we show that employing a category-aware fairness metric as a regularization term along with the main recommendation loss during training can help effectively minimize bias in the models' output. We experiment on three real-world datasets, using five baseline models alongside two popular fairness-aware models, to show the effectiveness of our metrics in evaluating gender bias. Our metrics help provide an enhanced insight into bias in recommended items compared to previous metrics. Additionally, our results demonstrate how incorporating our regularization term significantly improves the fairness in recommendations for different categories without substantial degradation in overall recommendation performance.
中文摘要:本文提出了一套通过分析项目类别来评估推荐系统中性别偏见的综合指标,并证明在训练过程中将这些指标作为正则化项可有效减少偏见,同时保持推荐性能。
English Summary: This paper introduces comprehensive metrics for evaluating gender bias in recommendation systems by analyzing item categories, and demonstrates that incorporating these metrics as regularization during training effectively reduces bias while maintaining performance.
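The paper's core training idea, adding a category-aware fairness term to the main recommendation loss, can be sketched as follows. The penalty below (squared gap between mean predicted scores for the two gender groups within each item category) is an illustrative stand-in under my own naming, not the paper's exact metric:

```python
from statistics import mean

def category_fairness_penalty(scores, genders, categories):
    """Hypothetical category-aware regularizer: sum over item categories of
    the squared gap between the mean predicted score for each gender group.
    Added to the recommendation loss as: loss + lambda_fair * penalty."""
    penalty = 0.0
    for c in set(categories):
        g0 = [s for s, g, cat in zip(scores, genders, categories) if cat == c and g == 0]
        g1 = [s for s, g, cat in zip(scores, genders, categories) if cat == c and g == 1]
        if g0 and g1:  # category must be scored for both groups to contribute
            penalty += (mean(g0) - mean(g1)) ** 2
    return penalty
```

The per-category granularity is the point: a model can look fair on aggregate metrics while systematically under-recommending, say, one movie genre to one group.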

Authors:Hao Liang, Zhipeng Dong, Kaixin Chen, Jiyuan Guo, Yufeng Yue, Yi Yang, Mengyin Fu
Title: ChatStitch: Visualizing Through Structures via Surround-View Unsupervised Deep Image Stitching with Collaborative LLM-Agents
Abstract:
Surround-view perception has garnered significant attention for its ability to enhance the perception capabilities of autonomous driving vehicles through the exchange of information with surrounding cameras. However, existing surround-view perception systems are limited by an inefficient unidirectional interaction pattern with humans and by distortions in overlapping regions that propagate exponentially into non-overlapping areas. To address these challenges, this paper introduces ChatStitch, a surround-view human-machine co-perception system capable of unveiling obscured blind spot information through natural language commands integrated with external digital assets. To dismantle the unidirectional interaction bottleneck, ChatStitch implements a cognitively grounded closed-loop interaction multi-agent framework based on Large Language Models. To suppress distortion propagation across overlapping boundaries, ChatStitch proposes SV-UDIS, a surround-view unsupervised deep image stitching method under the non-global-overlapping condition. We conducted extensive experiments on the UDIS-D, MCOV-SLAM open datasets, and our real-world dataset. Specifically, our SV-UDIS method achieves state-of-the-art performance on the UDIS-D dataset for 3, 4, and 5 image stitching tasks, with PSNR improvements of 9%, 17%, and 21%, and SSIM improvements of 8%, 18%, and 26%, respectively.
Chinese: 本文提出ChatStitch环视人机协同感知系统,通过基于大语言模型的闭环交互框架打破单向交互瓶颈,并创新性地提出SV-UDIS非全局重叠条件下的无监督深度图像拼接方法,在多个数据集上实现了最先进的性能表现。
English: This paper presents ChatStitch, a surround-view human-machine co-perception system that overcomes limitations in existing systems by implementing a closed-loop interaction framework using Large Language Models and introducing SV-UDIS, a novel unsupervised deep image stitching method that achieves state-of-the-art performance on multiple datasets.
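The PSNR metric behind the reported improvements follows directly from its standard definition (this is the textbook formula, not code from the ChatStitch repository):

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    given here as flat sequences of pixel intensities."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images: no noise
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A 9% PSNR gain is substantial because the scale is logarithmic: each extra dB corresponds to a multiplicative reduction in mean squared error.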

Authors:Gijs Luijten, Roberto Maria Scardigno, Lisle Faray de Paiva, Peter Hoyer, Jens Kleesiek, Domenico Buongiorno, Vitoantonio Bevilacqua, Jan Egger
Title: Deep Learning-Based Semantic Segmentation for Real-Time Kidney Imaging and Measurements with Augmented Reality-Assisted Ultrasound
Abstract:
Ultrasound (US) is widely accessible and radiation-free but has a steep learning curve due to its dynamic nature and non-standard imaging planes. Additionally, the constant need to shift focus between the US screen and the patient poses a challenge. To address these issues, we integrate deep learning (DL)-based semantic segmentation for real-time (RT) automated kidney volumetric measurements, which are essential for clinical assessment but are traditionally time-consuming and prone to fatigue. This automation allows clinicians to concentrate on image interpretation rather than manual measurements. Complementing DL, augmented reality (AR) enhances the usability of US by projecting the display directly into the clinician's field of view, improving ergonomics and reducing the cognitive load associated with screen-to-patient transitions. Two AR-DL-assisted US pipelines on HoloLens-2 are proposed: one streams directly via the application programming interface for a wireless setup, while the other supports any US device with video output for broader accessibility. We evaluate RT feasibility and accuracy using the Open Kidney Dataset and open-source segmentation models (nnU-Net, Segmenter, YOLO with MedSAM and LiteMedSAM). Our open-source GitHub pipeline includes model implementations, measurement algorithms, and a Wi-Fi-based streaming solution, enhancing US training and diagnostics, especially in point-of-care settings.
Chinese: 本研究结合深度学习实现实时自动肾脏体积测量,并利用增强现实技术将超声显示投射到医生视野中,通过两种AR-DL辅助方案解决传统超声学习曲线陡峭和认知负荷高的问题,已验证其可行性与准确性。
English: This study integrates deep learning for real-time automated kidney volume measurements and augmented reality to project ultrasound displays into the clinician's view, addressing the steep learning curve and cognitive load of traditional ultrasound through two proposed AR-DL-assisted pipelines evaluated for feasibility and accuracy.

Authors:Yuanfang Ren, Esra Adiyeke, Ziyuan Guan, Zhenhong Hu, Mackenzie J Meni, Benjamin Shickel, Parisa Rashidi, Tezcan Ozrazgat-Baslanti, Azra Bihorac
Title: Validation of the MySurgeryRisk Algorithm for Predicting Complications and Death after Major Surgery: A Retrospective Multicenter Study Using OneFlorida Data Trust
Abstract:
Despite advances in surgical techniques and care, postoperative complications are prevalent, affecting up to 15% of patients who undergo major surgery. The objective of this study is to develop and validate models for predicting postoperative complications and death after major surgery on a large and multicenter dataset, following the previously validated MySurgeryRisk algorithm. This retrospective, longitudinal and multicenter cohort analysis included 508,097 encounters from 366,875 adult inpatients who underwent major surgeries and were admitted to healthcare institutions within the OneFlorida+ network between 01/01/2012 and 04/29/2023. We applied the validated feature selection and transformation approach in MySurgeryRisk models and redeveloped eXtreme Gradient Boosting (XGBoost) models for predicting risk of postoperative acute kidney injury (AKI), need for intensive care unit (ICU) admission, need for mechanical ventilation (MV) therapy and in-hospital mortality on a development set and evaluated the model performance on a validation set. Area under the receiver operating characteristics curve values were obtained for need for ICU admission, 0.93 (95% Confidence Interval [CI], 0.93-0.93); need for MV, 0.94 (95% CI, 0.94-0.94); AKI, 0.92 (95% CI, 0.92-0.92); and in-hospital mortality, 0.95 (95% CI, 0.94-0.95). Area under the precision-recall curve values were computed for need for ICU admission, 0.62 (95% CI, 0.62-0.63); need for MV, 0.51 (95% CI, 0.49-0.52); AKI, 0.53 (95% CI, 0.53-0.54); and in-hospital mortality, 0.26 (95% CI, 0.24-0.29). The performance of these models is comparable to that of the previously validated MySurgeryRisk models, suggesting the enhanced generalizability of the models. Primary procedure code and provider specialty consistently appeared as the top influential variables, providing valuable insights into the factors influencing surgical outcomes.
中文: 本研究基于大型多中心数据集开发并验证了预测术后并发症的XGBoost模型,其性能与现有算法相当,其中主要手术代码和医生专业被确认为关键预测因素。
English: This study developed and validated XGBoost models using a large multicenter dataset to predict postoperative complications, demonstrating high performance comparable to existing algorithms with primary procedure codes and provider specialties as key predictors.
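The AUROC values reported above admit a simple rank-based reading: the probability that a randomly chosen positive case is scored above a randomly chosen negative one. A minimal stand-alone sketch of that definition (not the study's evaluation code, which presumably uses a standard library):

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic: fraction of (positive, negative)
    pairs where the positive is scored higher; ties count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The gap between the study's AUROC (0.92-0.95) and AUPRC (0.26-0.62) is typical for rare outcomes: AUPRC is sensitive to class prevalence while AUROC is not.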

Authors:Ryota Okumura, Tadahiro Taniguchi, Akira Taniguchi, Yoshinobu Hagiwara
Title: Co-Creative Learning via Metropolis-Hastings Interaction between Humans and AI
Abstract:
We propose co-creative learning as a novel paradigm where humans and AI, i.e., biological and artificial agents, mutually integrate their partial perceptual information and knowledge to construct shared external representations, a process we interpret as symbol emergence. Unlike traditional AI teaching based on unilateral knowledge transfer, this addresses the challenge of integrating information from inherently different modalities. We empirically test this framework using a human-AI interaction model based on the Metropolis-Hastings naming game (MHNG), a decentralized Bayesian inference mechanism. In an online experiment, 69 participants played a joint attention naming game (JA-NG) with one of three computer agent types (MH-based, always-accept, or always-reject) under partial observability. Results show that human-AI pairs with an MH-based agent significantly improved categorization accuracy through interaction and achieved stronger convergence toward a shared sign system. Furthermore, human acceptance behavior aligned closely with the MH-derived acceptance probability. These findings provide the first empirical evidence for co-creative learning emerging in human-AI dyads via MHNG-based interaction. This suggests a promising path toward symbiotic AI systems that learn with humans, rather than from them, by dynamically aligning perceptual experiences, opening a new avenue for symbiotic AI alignment.
中文: 协同创造式学习通过人类与AI在部分可观测环境下基于Metropolis-Hastings命名游戏的交互,实现了双方感知经验的动态对齐,首次实证验证了人机通过双向符号涌现而非单向知识传授达成协同进化的新模式。
English: Co-creative learning enables humans and AI to jointly construct shared symbols by integrating their partial perceptions through decentralized Bayesian interaction, achieving improved categorization and sign convergence without unilateral knowledge transfer.
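The MH-based agent's accept/reject behavior can be sketched with the standard Metropolis-Hastings ratio: the listener accepts the speaker's proposed sign with probability proportional to how likely that sign is under the listener's *own* posterior. The function and argument names below are my own; the paper's model is richer than this one-line rule:

```python
import random

def mh_accept(p_proposed, p_current, rng=random.random):
    """Metropolis-Hastings acceptance for a proposed sign: accept with
    probability min(1, p_proposed / p_current), where each p is the
    listener's posterior for that sign given its private observation."""
    ratio = 1.0 if p_current == 0 else p_proposed / p_current
    return rng() < min(1.0, ratio)
```

This is what distinguishes the MH agent from the always-accept and always-reject baselines in the experiment: acceptance is graded by the listener's private evidence, which is also the behavior the human participants turned out to approximate.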

Authors:Gijs Luijten, Lisle Faray de Paiva, Sebastian Krueger, Alexander Brost, Laura Mazilescu, Ana Sofia Ferreira Santos, Peter Hoyer, Jens Kleesiek, Sophia Marie-Therese Schmitz, Ulf Peter Neumann, Jan Egger
Title: From Screen to Space: Evaluating Siemens' Cinematic Reality
Abstract:
As one of the first research teams with full access to Siemens' Cinematic Reality, we evaluate its usability and clinical potential for cinematic volume rendering on the Apple Vision Pro. We visualized venous-phase liver computed tomography and magnetic resonance cholangiopancreatography scans from the CHAOS and MRCP_DLRecon datasets. Fourteen medical experts assessed usability and anticipated clinical integration potential using the System Usability Scale, ISONORM 9242-110-S questionnaire, and an open-ended survey. Their feedback identified feasibility, key usability strengths, and required features to catalyze the adaptation in real-world clinical workflows. The findings provide insights into the potential of immersive cinematic rendering in medical imaging.
中文摘要:我们团队评估了西门子电影级现实技术在苹果Vision Pro上的医学影像应用,专家反馈表明其具备临床可行性、显著可用性优势及特定功能需求,以推动实际工作流程整合。
English Summary: Our team evaluated Siemens' Cinematic Reality on Apple Vision Pro for medical imaging, finding it feasible with strong usability and specific feature needs for clinical integration based on expert feedback.

Authors:Lisle Faray de Paiva, Gijs Luijten, Ana Sofia Ferreira Santos, Moon Kim, Behrus Puladi, Jens Kleesiek, Jan Egger
Title: Beyond the Desktop: XR-Driven Segmentation with Meta Quest 3 and MX Ink
Abstract:
Medical imaging segmentation is essential in clinical settings for diagnosing diseases, planning surgeries, and other procedures. However, manual annotation is a cumbersome and effortful task. To mitigate these aspects, this study implements and evaluates the usability and clinical applicability of an extended reality (XR)-based segmentation tool for anatomical CT scans, using the Meta Quest 3 headset and Logitech MX Ink stylus. We develop an immersive interface enabling real-time interaction with 2D and 3D medical imaging data in a customizable workspace designed to mitigate workflow fragmentation and cognitive demands inherent to conventional manual segmentation tools. The platform combines stylus-driven annotation, mirroring traditional pen-on-paper workflows, with instant 3D volumetric rendering. A user study with a public craniofacial CT dataset demonstrated the tool's foundational viability, achieving a System Usability Scale (SUS) score of 66, within the expected range for medical applications. Participants highlighted the system's intuitive controls (scoring 4.1/5 for self-descriptiveness on ISONORM metrics) and spatial interaction design, with qualitative feedback highlighting strengths in hybrid 2D/3D navigation and realistic stylus ergonomics. While users identified opportunities to enhance task-specific precision and error management, the platform's core workflow enabled dynamic slice adjustment, reducing cognitive load compared to desktop tools. Results position the XR-stylus paradigm as a promising foundation for immersive segmentation tools, with iterative refinements targeting haptic feedback calibration and workflow personalization to advance adoption in preoperative planning.
中文: 本研究开发了一款基于Meta Quest 3和罗技手写笔的扩展现实医疗影像分割工具,用户测试表明其空间交互直观且能降低认知负荷,但在精度控制和错误处理方面仍需完善。
English: This study develops an extended reality tool using Meta Quest 3 and Logitech stylus for medical CT scan segmentation, demonstrating through user testing its intuitive spatial interaction and reduced cognitive load despite needing improvements in precision and error handling.

Authors:Andrea E Davidson, Jessica M Ray, Yulia Levites Strekalova, Parisa Rashidi, Azra Bihorac
Title: Human-Centered Development of an Explainable AI Framework for Real-Time Surgical Risk Surveillance
Abstract:
Background: Artificial Intelligence (AI) clinical decision support (CDS) systems have the potential to augment surgical risk assessments, but successful adoption depends on an understanding of end-user needs and current workflows. This study reports the initial co-design of MySurgeryRisk, an AI CDS tool to predict the risk of nine post-operative complications in surgical patients. Methods: Semi-structured focus groups and interviews were held as co-design sessions with perioperative physicians at a tertiary academic hospital in the Southeastern United States. Participants were read a surgical vignette and asked questions to elicit an understanding of their current decision-making practices before being introduced to the MySurgeryRisk prototype web interface. They were asked to provide feedback on the user interface and system features. Session transcripts were qualitatively coded, after which thematic analysis took place. Results: Data saturation was reached after 20 surgeons and anesthesiologists from varying career stages participated across 11 co-design sessions. Thematic analysis resulted in five themes: (1) decision-making cognitive processes, (2) current approach to decision-making, (3) future approach to decision-making with MySurgeryRisk, (4) feedback on current MySurgeryRisk prototype, and (5) trustworthy considerations. Conclusion: Clinical providers perceived MySurgeryRisk as a promising CDS tool that factors in a large volume of data and is computed in real-time without any need for manual input. Participants provided feedback on the design of the interface and imagined applications of the tool in the clinical workflow. However, its successful implementation will depend on its actionability and explainability of model outputs, integration into current electronic systems, and calibration of trust among end-users.
中文: MySurgeryRisk作为一种人工智能临床决策支持工具,通过与围手术期医生共同设计以预测术后并发症,被认为因其实时数据处理能力而具有前景,但其成功实施依赖于输出的可操作性、可解释性以及与现有系统的无缝集成。
English: MySurgeryRisk, an AI clinical decision support tool, was co-designed with perioperative physicians to predict postoperative complications and was perceived as promising for its real-time data processing, though its implementation depends on actionability, explainability, and seamless integration into existing systems.

Authors:Andrea E. Davidson, Jessica M. Ray, Ayush K. Patel, Yulia Strekalova Levites, Parisa Rashidi, Azra Bihorac
Title: An Iterative, User-Centered Design of a Clinical Decision Support System for Critical Care Assessments: Co-Design Sessions with ICU Clinical Providers
Abstract:
This study reports the findings of qualitative interview sessions conducted with ICU clinicians for the co-design of a system user interface of an artificial intelligence (AI)-driven clinical decision support (CDS) system. This system integrates medical record data with wearable sensor, video, and environmental data into a real-time dynamic model that quantifies patients' risk of clinical decompensation and risk of developing delirium, providing actionable alerts to augment clinical decision-making in the ICU setting. Co-design sessions were conducted as semi-structured focus groups and interviews with ICU clinicians, including physicians, mid-level practitioners, and nurses. Study participants were asked about their perceptions on AI-CDS systems, their system preferences, and were asked to provide feedback on the current user interface prototype. Session transcripts were qualitatively analyzed to identify key themes related to system utility, interface design features, alert preferences, and implementation considerations. Ten clinicians participated in eight sessions. The analysis identified five themes: (1) AI's computational utility, (2) workflow optimization, (3) effects on patient care, (4) technical considerations, and (5) implementation considerations. Clinicians valued the CDS system's multi-modal continuous monitoring and AI's capacity to process large volumes of data in real-time to identify patient risk factors and suggest action items. Participants underscored the system's unique value in detecting delirium and promoting non-pharmacological delirium prevention measures. The actionability and intuitive interpretation of the presented information was emphasized. ICU clinicians recognize the potential of an AI-driven CDS system for ICU delirium and acuity to improve patient outcomes and clinical workflows.
中文: 本研究通过ICU临床医生共同参与设计,开发了一种整合多模态数据的AI临床决策支持系统,用于实时评估患者病情恶化及谵妄风险,参与者强调该系统通过可操作警报和工作流程优化显著提升了护理质量。
English: This study engaged ICU clinicians in co-designing an AI-driven clinical decision support system that integrates multi-modal data for real-time risk assessment of patient decompensation and delirium, with participants highlighting its utility in enhancing care through actionable alerts and workflow optimization.

Authors:Xuebo Ji, Zherong Pan, Xifeng Gao, Lei Yang, Xinxin Du, Kaiyun Li, Yongjin Liu, Wenping Wang, Changhe Tu, Jia Pan
Title: Internal State Estimation in Groups via Active Information Gathering
Abstract:
Accurately estimating human internal states, such as personality traits or behavioral patterns, is critical for enhancing the effectiveness of human-robot interaction, particularly in group settings. These insights are key in applications ranging from social navigation to autism diagnosis. However, prior methods are limited by scalability and passive observation, making real-time estimation in complex, multi-human settings difficult. In this work, we propose a practical method for active human personality estimation in groups, with a focus on applications related to Autism Spectrum Disorder (ASD). Our method combines a personality-conditioned behavior model, based on the Eysenck 3-Factor theory, with an active robot information gathering policy that triggers human behaviors through a receding-horizon planner. The robot's belief about human personality is then updated via Bayesian inference. We demonstrate the effectiveness of our approach through simulations, user studies with typical adults, and preliminary experiments involving participants with ASD. Our results show that our method can scale to tens of humans and reduce personality prediction error by 29.2% and uncertainty by 79.9% in simulation. User studies with typical adults confirm the method's ability to generalize across complex personality distributions. Additionally, we explore its application in autism-related scenarios, demonstrating that the method can identify the difference between neurotypical and autistic behavior, highlighting its potential for diagnosing ASD. The results suggest that our framework could serve as a foundation for future ASD-specific interventions.
中文: 本研究提出了一种主动式机器人方法,用于群体环境中的实时人格特质估计,该方法结合人格条件行为模型与贝叶斯推理,显著降低了预测误差和不确定性,并在自闭症谱系障碍诊断中展现出应用潜力。
English: This study introduces an active robot-based method for real-time personality estimation in group settings, leveraging a personality-conditioned behavior model and Bayesian inference to significantly reduce prediction errors and uncertainty, with demonstrated applications in Autism Spectrum Disorder diagnosis and intervention potential.
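The Bayesian update at the core of the method, refining the robot's belief over personality types after each actively triggered behavior, can be sketched generically. The discrete-prior form below is illustrative, not the paper's implementation (which couples the update to the Eysenck 3-Factor behavior model and a receding-horizon planner):

```python
def update_belief(prior, likelihoods):
    """Discrete Bayesian update of a belief over personality types:
    posterior ∝ prior × likelihood of the observed behavior per type."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)  # normalizing constant (evidence)
    return [u / z for u in unnorm]
```

Active information gathering amounts to choosing robot actions whose induced behaviors make these likelihoods maximally discriminative, so each observation shrinks the posterior's uncertainty as fast as possible.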

Authors:Luis Moreno, Miguel Altamirano Cabrera, Muhammad Haris Khan, Issatay Tokmurziyev, Yara Mahmoud, Valerii Serpiva, Dzmitry Tsetserukou
Title: FlyHaptics: Flying Multi-contact Haptic Interface
Abstract:
This work presents FlyHaptics, an aerial haptic interface tracked via a Vicon optical motion capture system and built around six five-bar linkage assemblies enclosed in a lightweight protective cage. We predefined five static tactile patterns - each characterized by distinct combinations of linkage contact points and vibration intensities - and evaluated them in a grounded pilot study, where participants achieved 86.5% recognition accuracy (F(4, 35) = 1.47, p = 0.23) with no significant differences between patterns. Complementary flight demonstrations confirmed stable hover performance and consistent force output under realistic operating conditions. These pilot results validate the feasibility of drone-mounted, multi-contact haptic feedback and lay the groundwork for future integration into fully immersive VR, teleoperation, and remote interaction scenarios.
中文: FlyHaptics是一种基于无人机的触觉接口,通过六组五连杆机构传递触觉模式,在测试中获得86.5%的识别准确率,其稳定表现为未来虚拟现实和遥操作应用奠定了基础。
English: FlyHaptics is a drone-based haptic interface using six five-bar linkages to deliver tactile patterns, achieving 86.5% recognition accuracy in tests and demonstrating stable performance for future VR and teleoperation applications.

Authors:Muhammad Haris Khan, Miguel Altamirano Cabrera, Dmitrii Iarchuk, Yara Mahmoud, Daria Trinitatova, Issatay Tokmurziyev, Dzmitry Tsetserukou
Title: HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction
Abstract:
This paper introduces HapticVLM, a novel multimodal system that integrates vision-language reasoning with deep convolutional networks to enable real-time haptic feedback. HapticVLM leverages a ConvNeXt-based material recognition module to generate robust visual embeddings for accurate identification of object materials, while a state-of-the-art Vision-Language Model (Qwen2-VL-2B-Instruct) infers ambient temperature from environmental cues. The system synthesizes tactile sensations by delivering vibrotactile feedback through speakers and thermal cues via a Peltier module, thereby bridging the gap between visual perception and tactile experience. Experimental evaluations demonstrate an average recognition accuracy of 84.67% across five distinct auditory-tactile patterns and a temperature estimation accuracy of 86.7% based on a tolerance-based evaluation method with an 8°C margin of error across 15 scenarios. Although promising, the current study is limited by the use of a small set of prominent patterns and a modest participant pool. Future work will focus on expanding the range of tactile patterns and increasing user studies to further refine and validate the system's performance. Overall, HapticVLM presents a significant step toward context-aware, multimodal haptic interaction with potential applications in virtual reality and assistive technologies.
中文: HapticVLM是一种结合视觉语言推理与深度学习的新型多模态系统,能通过实时触觉反馈准确识别物体材料和推断环境温度,在虚拟现实和辅助技术领域具有应用潜力。
English: HapticVLM is a multimodal system that combines vision-language reasoning and deep learning to provide real-time haptic feedback, achieving high accuracy in material recognition and temperature estimation for applications in virtual reality and assistive technologies.
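The tolerance-based temperature evaluation (an estimate counts as correct if it falls within an 8°C margin of ground truth) reduces to a one-line metric. This is a generic sketch of that scoring rule, not HapticVLM's code:

```python
def tolerance_accuracy(predicted, actual, margin=8.0):
    """Fraction of temperature estimates within ±margin °C of ground truth."""
    hits = sum(abs(p - a) <= margin for p, a in zip(predicted, actual))
    return hits / len(predicted)
```

Note that such a margin makes the reported 86.7% sensitive to the chosen tolerance; halving the margin could change the headline number considerably.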

Authors:Ali Alfageeh, Sadegh AlMahdi Kazemi Zarkouei, Daye Nam, Daniel Prol, Matin Amoozadeh, Souti Chattopadhyay, James Prather, Paul Denny, Juho Leinonen, Michael Hilton, Sruti Srinivasa Ragavan, Mohammad Amin Alipour
Title: From Prompts to Propositions: A Logic-Based Lens on Student-LLM Interactions
Abstract:
Background and Context. The increasing integration of large language models (LLMs) in computing education presents an emerging challenge in understanding how students use LLMs and craft prompts to solve computational tasks. Prior research has used both qualitative and quantitative methods to analyze prompting behavior, but these approaches lack scalability or fail to effectively capture the semantic evolution of prompts. Objective. In this paper, we investigate whether students' prompts can be systematically analyzed using propositional logic constraints. We examine whether this approach can identify patterns in prompt evolution, detect struggling students, and provide insights into effective and ineffective strategies. Method. We introduce Prompt2Constraints, a novel method that translates students' prompts into logical constraints. The constraints are able to represent the intent of the prompts in succinct and quantifiable ways. We used this approach to analyze a dataset of 1,872 prompts from 203 students solving introductory programming tasks. Findings. We find that while successful and unsuccessful attempts tend to use a similar number of constraints overall, when students fail, they often modify their prompts more significantly, shifting problem-solving strategies midway. We also identify points where specific interventions could be most helpful to students for refining their prompts. Implications. This work offers a new and scalable way to detect students who struggle in solving natural language programming tasks. This work could be extended to investigate more complex tasks and integrated into programming tools to provide real-time support.
中文: 本研究提出Prompt2Constraints方法,通过将学生提示转换为逻辑约束来分析提示演化模式并识别学习困难者,为检测自然语言编程任务中的困难提供了可扩展的新途径。
English: This study introduces Prompt2Constraints, a method that translates student prompts into logical constraints to analyze patterns in prompt evolution and identify struggling students, offering a scalable approach for detecting difficulties in natural language programming tasks.
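Once each prompt is abstracted into a set of logical constraints, "modifying the prompt more significantly" becomes measurable as set distance between consecutive revisions. The Jaccard-distance sketch below, including its constraint strings, is a hypothetical illustration of that idea, not Prompt2Constraints itself:

```python
def constraint_shift(prev, curr):
    """Hypothetical measure of intent change between two prompt revisions,
    each abstracted as a set of propositional constraints:
    Jaccard distance (0 = identical intent, 1 = fully disjoint)."""
    union = prev | curr
    if not union:
        return 0.0
    return 1.0 - len(prev & curr) / len(union)

# e.g. a revision that keeps "output is sorted" but swaps "list is nonempty"
# for "elements are unique" shares 1 of 3 constraints with its predecessor.
```

Large shifts mid-session would then flag the strategy changes the study associates with unsuccessful attempts.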

Authors:Paul Denny, Viraj Kumar, Stephen MacNeil, James Prather, Juho Leinonen
Title: Probing the Unknown: Exploring Student Interactions with Probeable Problems at Scale in Introductory Programming
Abstract:
Introductory programming courses often rely on small code-writing exercises that have clearly specified problem statements. This limits opportunities for students to practice how to clarify ambiguous requirements -- a critical skill in real-world programming. In addition, the emerging capabilities of large language models (LLMs) to produce code from well-defined specifications may harm student engagement with traditional programming exercises. This study explores the use of "Probeable Problems", automatically gradable tasks that have deliberately vague or incomplete specifications. Such problems require students to submit test inputs, or "probes", to clarify requirements before implementation. Through analysis of over 40,000 probes in an introductory course, we identify patterns linking probing behaviors to task success. Systematic strategies, such as thoroughly exploring expected behavior before coding, resulted in fewer incorrect code submissions and correlated with course success. Feedback from nearly 1,000 participants highlighted the challenges and real-world relevance of these tasks, as well as benefits to critical thinking and metacognitive skills. Probeable Problems are easy to set up and deploy at scale, and help students recognize and resolve uncertainties in programming problems.
中文摘要:本研究提出“可探询问题”——一种具有模糊规格的可自动评分编程任务,要求学生通过提交测试探针来澄清需求,在入门课程中有效提升了批判性思维并减少了编码错误。
English Summary: This study introduces "Probeable Problems," gradable programming tasks with vague specifications that require students to submit test probes for clarification, which improved critical thinking and reduced coding errors in an introductory course.

Authors:Yukang Lin, Yan Hong, Zunnan Xu, Xindi Li, Chao Xu, Chuanbiao Song, Ronghui Li, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang, Xiu Li
Title: InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation
Abstract:
Recent video generation research has focused heavily on isolated actions, leaving interactive motions, such as hand-face interactions, largely unexamined. These interactions are essential for emerging biometric authentication systems, which rely on interactive motion-based anti-spoofing approaches. From a security perspective, there is a growing need for large-scale, high-quality interactive videos to train and strengthen authentication models. In this work, we introduce a novel paradigm for animating realistic hand-face interactions. Our approach simultaneously learns spatio-temporal contact dynamics and biomechanically plausible deformation effects, enabling natural interactions where hand movements induce anatomically accurate facial deformations while maintaining collision-free contact. To facilitate this research, we present InterHF, a large-scale hand-face interaction dataset featuring 18 interaction patterns and 90,000 annotated videos. Additionally, we propose InterAnimate, a region-aware diffusion model designed specifically for interaction animation. InterAnimate leverages learnable spatial and temporal latents to effectively capture dynamic interaction priors and integrates a region-aware interaction mechanism that injects these priors into the denoising process. To the best of our knowledge, this work represents the first large-scale effort to systematically study human hand-face interactions. Qualitative and quantitative results show InterAnimate produces highly realistic animations, setting a new benchmark. Code and data will be made public to advance research.
中文: 本研究提出了一种新颖的手脸交互动画范式,通过开发区域感知扩散模型和大规模数据集来弥补交互动作研究的空白,以增强生物特征认证系统的安全性和准确性。
English: This research introduces a novel paradigm for animating realistic hand-face interactions, addressing the gap in interactive motion studies by developing a region-aware diffusion model and a large-scale dataset to enhance biometric authentication systems.

Authors:Issatay Tokmurziyev, Miguel Altamirano Cabrera, Muhammad Haris Khan, Yara Mahmoud, Luis Moreno, Dzmitry Tsetserukou
Title: LLM-Glasses: GenAI-driven Glasses with Haptic Feedback for Navigation of Visually Impaired People
Abstract:
We present LLM-Glasses, a wearable navigation system designed to assist visually impaired individuals by combining haptic feedback, YOLO-World object detection, and GPT-4o-driven reasoning. The system delivers real-time tactile guidance via temple-mounted actuators, enabling intuitive and independent navigation. Three user studies were conducted to evaluate its effectiveness: (1) a haptic pattern recognition study achieving an 81.3% average recognition rate across 13 distinct patterns, (2) a VICON-based navigation study in which participants successfully followed predefined paths in open spaces, and (3) an LLM-guided video evaluation demonstrating 91.8% accuracy in open scenarios, 84.6% with static obstacles, and 81.5% with dynamic obstacles. These results demonstrate the system's reliability in controlled environments, with ongoing work focusing on refining its responsiveness and adaptability to diverse real-world scenarios. LLM-Glasses showcases the potential of combining generative AI with haptic interfaces to empower visually impaired individuals with intuitive and effective mobility solutions.
中文: LLM-Glasses是一款结合触觉反馈、YOLO-World物体检测和GPT-4o推理的可穿戴导航系统,通过镜腿上的执行器为视障人士提供实时触觉引导,用户研究验证了其在受控环境中的可靠性,并正针对现实场景优化响应能力。
English: LLM-Glasses is a wearable navigation system for the visually impaired that integrates haptic feedback, YOLO-World object detection, and GPT-4o reasoning to provide real-time tactile guidance, with user studies confirming its reliability in controlled environments and ongoing improvements for real-world adaptability.

Authors:Abigail Copiaco, Christian Ritz, Yassine Himeur, Valsamma Eapen, Ammar Albanna, Wathiq Mansoor
Title: Exploring Image Transforms derived from Eye Gaze Variables for Progressive Autism Diagnosis
Abstract:
The prevalence of Autism Spectrum Disorder (ASD) has surged rapidly over the past decade, posing significant challenges in communication, behavior, and focus for affected individuals. Current diagnostic techniques, though effective, are time-intensive, leading to high social and economic costs. This work introduces an AI-powered assistive technology designed to streamline ASD diagnosis and management, enhancing convenience for individuals with ASD and efficiency for caregivers and therapists. The system integrates transfer learning with image transforms derived from eye gaze variables to diagnose ASD. This facilitates and opens opportunities for in-home periodical diagnosis, reducing stress for individuals and caregivers, while also preserving user privacy through the use of image transforms. The accessibility of the proposed method also offers opportunities for improved communication between guardians and therapists, ensuring regular updates on progress and evolving support needs. Overall, the approach proposed in this work ensures timely, accessible diagnosis while protecting the subjects' privacy, improving outcomes for individuals with ASD.
中文: 本研究提出一种基于眼动数据图像转换和迁移学习的AI辅助系统,能够实现高效的家庭自闭症谱系障碍诊断与管理,在保障隐私的同时提升诊断可及性,并改善护理人员与治疗师之间的沟通。
English: This study presents an AI-assisted system using transfer learning and image transforms from eye gaze data to enable efficient, in-home ASD diagnosis and management, enhancing accessibility while safeguarding privacy and improving communication between caregivers and therapists.
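The "image transforms derived from eye gaze variables" idea can be illustrated with a toy transform. This is our reading, not the paper's exact method: one common way to turn a gaze time series into a CNN-ready, privacy-preserving image is a fixation-density histogram.

```python
import numpy as np

# Illustrative sketch: binning an eye-gaze time series into a heatmap image
# that a pretrained CNN (transfer learning) can consume. The paper's actual
# image transforms may differ; coordinates and sizes here are invented.

def gaze_to_image(gx, gy, size=32):
    """Bin normalized gaze coordinates in [0, 1] into a size x size heatmap."""
    img, _, _ = np.histogram2d(gx, gy, bins=size, range=[[0, 1], [0, 1]])
    peak = img.max()
    return img / peak if peak > 0 else img  # scale to [0, 1] for the CNN

rng = np.random.default_rng(0)
gx, gy = rng.random(500), rng.random(500)   # 500 synthetic gaze samples
heatmap = gaze_to_image(gx, gy)
print(heatmap.shape)  # (32, 32)
```

Because only aggregate gaze density survives the transform, no raw video of the subject needs to leave the home, which is the privacy property the abstract emphasizes.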

Authors:Youjun Chen, Xurong Xie, Haoning Xu, Mengzhe Geng, Guinan Li, Chengxi Deng, Huimeng Wang, Shujie Hu, Xunying Liu
Title: Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition
Abstract:
This paper presents a novel end-to-end LLM-empowered explainable speech emotion recognition (SER) approach. Fine-grained speech emotion descriptor (SED) features, e.g., pitch, tone and emphasis, are disentangled from HuBERT SSL representations via LLM fine-tuning that alternates between joint SER-SED prediction and ASR tasks. VAE-compressed HuBERT features derived via an Information Bottleneck (IB) are used to adjust feature granularity. Experiments on the IEMOCAP and MELD benchmarks demonstrate that our approach consistently outperforms comparable LLaMA-based SER baselines, including those using either (a) alternating multi-task fine-tuning alone or (b) feature disentanglement only. Statistically significant increases in SER unweighted accuracy of up to 4.0% and 3.7% absolute (5.4% and 6.6% relative) are obtained. More importantly, emotion descriptors offer further explainability for SER.
中文: 本文提出一种端到端的LLM赋能可解释语音情感识别方法,通过细粒度特征解耦与联合优化,在提升识别准确率的同时利用情感描述符增强模型可解释性。
English: This paper introduces an end-to-end LLM-based explainable speech emotion recognition method that enhances accuracy and interpretability by jointly optimizing emotion recognition and descriptor prediction through fine-grained feature disentanglement and compression.
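The VAE compression step mentioned above can be sketched without any of the paper's actual components. In this hedged numpy sketch, an encoder maps a HuBERT-like feature vector to (mu, logvar), a latent is sampled with the reparameterization trick, and the KL term supplies the Information Bottleneck pressure; dimensions and weights are illustrative stand-ins, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_z = 768, 64                      # e.g. a HuBERT hidden size -> latent size
W_mu = rng.normal(0, 0.02, (d_z, d_in))  # toy (untrained) encoder weights
W_lv = rng.normal(0, 0.02, (d_z, d_in))

def compress(h):
    """VAE-style compression of one feature vector h."""
    mu, logvar = W_mu @ h, W_lv @ h
    eps = rng.standard_normal(d_z)
    z = mu + np.exp(0.5 * logvar) * eps                       # reparameterization trick
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)  # KL(q || N(0, I))
    return z, kl

h = rng.standard_normal(d_in)
z, kl = compress(h)
print(z.shape, kl >= 0)  # (64,) True
```

Adjusting the weight on the KL term trades off how much information the latent retains, which is one way to realize the "feature granularity" knob the abstract describes.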

Authors:Katrin Hänsel, Luca Maria Aiello, Daniele Quercia, Rossano Schifanella, Krisztian Zsolt Varga, Linus W. Dietz, Marios Constantinides
Title: The Experience of Running: Recommending Routes Using Sensory Mapping in Urban Environments
Abstract:
Depending on the route, runners may experience frustration, freedom, or fulfilment. However, finding routes that are conducive to the psychological experience of running remains an unresolved task in the literature. In a mixed-method study, we interviewed 7 runners to identify themes contributing to running experience, and quantitatively examined these themes in an online survey with 387 runners. Using Principal Component Analysis on the survey responses, we developed a short experience sampling questionnaire that captures the three most important dimensions of running experience: performance & achievement, environment, and mind & social connectedness. Using path preferences obtained from the online survey, we clustered them into two types of routes: scenic (associated with nature and greenery) and urban (characterized by the presence of people); and developed a routing engine for path recommendations. We discuss challenges faced in developing the routing engine, and provide guidelines to integrate it into mobile and wearable running apps.
中文摘要:本研究通过混合方法确定了跑步体验的三大关键维度——表现与成就、环境和心理社交联系,并开发了一个路径推荐引擎,根据跑者偏好提供风景型或城市型路线建议。
English Summary: This study identifies three key dimensions of running experience—performance & achievement, environment, and mind & social connectedness—through a mixed-method approach and develops a routing engine that recommends scenic or urban paths based on runner preferences.
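The PCA step above, reducing many survey items to a few dominant experience dimensions, can be sketched as follows. The data here is synthetic stand-in Likert responses; only the shapes (387 respondents, a handful of leading components) echo the study.

```python
import numpy as np

# Sketch: PCA via SVD on Likert-style survey responses to recover a few
# dominant dimensions, as the study did to build its short questionnaire.

rng = np.random.default_rng(1)
X = rng.integers(1, 6, size=(387, 20)).astype(float)  # 20 invented Likert items
Xc = X - X.mean(axis=0)                                # center each item

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)   # variance explained per component
top3 = Vt[:3]                     # loadings of the 3 leading components
scores = Xc @ top3.T              # each runner's position on the 3 dimensions

print(top3.shape, scores.shape)  # (3, 20) (387, 3)
```

Items with the largest absolute loadings in each row of `top3` would be the ones kept for a short experience-sampling questionnaire.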

Authors:Dong Won Lee, Yubin Kim, Denison Guvenoz, Sooyeon Jeong, Parker Malachowsky, Louis-Philippe Morency, Cynthia Breazeal, Hae Won Park
Title: The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning
Abstract:
Our work aims to advance the social reasoning of embodied artificial intelligence (AI) agents in real-world social interactions. Recently, language models (LMs) and foundational models (FMs) are being utilized as automatic evaluators of human-AI interactions with the goal of eventually being used to improve the policy of the AI agent. To enable further research in this direction, we introduce a large-scale real-world Human Robot Social Interaction (HSRI) Dataset to benchmark the capabilities of LMs and FMs to identify and reason about social interactions, specifically with regard to robot social errors and competencies. Our dataset consists of 400 real-world human social robot interaction videos and over 10K annotations, detailing the robot's social errors, competencies, rationale, and corrective actions, capturing unique aspects of human-AI interaction only present in real-world interactions. To further assess AI models' ability to reason about social interactions, we propose eight new benchmark tasks centered around whether AI models can (1) evaluate social interactions via detecting social errors and competencies, (2) identify the explanatory factors associated with errors and competencies, (3) understand the flow of real-world social interactions, and (4) provide reasons and corrective actions for social errors. Human studies and experiments with modern LMs and FMs reveal that current models struggle with these tasks, demonstrating that our dataset and benchmark provide a step forward towards socially intelligent AI.
中文摘要:本研究通过引入大规模人机社交互动数据集和八项基准任务,旨在提升人工智能在真实社交场景中的推理能力,实验表明当前语言模型虽具潜力,但在社交互动评估方面仍存在明显不足。
English Summary: This research introduces a large-scale Human-Robot Social Interaction dataset and eight benchmark tasks to advance AI's social reasoning capabilities, revealing that current language models still struggle with real-world social interaction evaluation despite their potential.

Authors:Gustavo Moreira, Edyta Paulina Bogucka, Marios Constantinides, Daniele Quercia
Title: The Hall of AI Fears and Hopes: Comparing the Views of AI Influencers and those of Members of the U.S. Public Through an Interactive Platform
Abstract:
AI development is shaped by academics and industry leaders (let us call them "influencers"), but it is unclear how their views align with those of the public. To address this gap, we developed an interactive platform that served as a data collection tool for exploring public views on AI, including their fears, hopes, and overall sense of hopefulness. We made the platform available to 330 participants representative of the U.S. population in terms of age, sex, ethnicity, and political leaning, and compared their views with those of 100 AI influencers identified by Time magazine. The public fears AI getting out of control, while influencers emphasize regulation, seemingly to deflect attention from their alleged focus on monetizing AI's potential. Interestingly, the views of AI influencers from underrepresented groups such as women and people of color often differ from the views of underrepresented groups in the public.
中文: 研究发现AI领域意见领袖与公众观点存在差异,公众担忧AI失控而意见领袖强调监管,且不同群体内部观点也存在显著分歧。
English: This study reveals a disconnect between AI influencers and the public, with the public fearing AI's uncontrollability while influencers focus on regulation, and highlights differing views within underrepresented groups.

Authors:Jocelyn Shen, Audrey Lee, Sharifa Alghowinem, River Adkins, Cynthia Breazeal, Hae Won Park
Title: Social Robots as Social Proxies for Fostering Connection and Empathy Towards Humanity
Abstract:
Despite living in an increasingly connected world, social isolation is a prevalent issue today. While social robots have been explored as tools to enhance social connection through companionship, their potential as asynchronous social platforms for fostering connection towards humanity has received less attention. In this work, we introduce the design of a social support companion that facilitates the exchange of emotionally relevant stories and scaffolds reflection to enhance feelings of connection via five design dimensions. We investigate how social robots can serve as "social proxies" facilitating human stories, passing stories from other human narrators to the user. To this end, we conduct a real-world deployment of 40 robot stations in users' homes over the course of two weeks. Through thematic analysis of user interviews, we find that social proxy robots can foster connection towards other people's experiences via mechanisms such as identifying connections across stories or offering diverse perspectives. We present design guidelines from our study insights on the use of social robot systems that serve as social platforms to enhance human empathy and connection.
Chinese: 本研究探讨了社交机器人作为异步平台分享情感故事的作用,发现通过充当促进故事交流和反思的社交代理,它们能够增强人类的同理心和连接感。
English: This study explores social robots as asynchronous platforms for sharing emotional stories, finding they can enhance human empathy and connection by serving as social proxies that facilitate story exchange and reflection.

Authors:Marios Constantinides, Daniele Quercia
Title: AI, Jobs, and the Automation Trap: Where Is HCI?
Abstract:
As artificial intelligence (AI) continues to reshape the workforce, its current trajectory raises pressing questions about its ultimate purpose. Why does job automation dominate the agenda, even at the expense of human agency and equity? This paper critiques the automation-centric paradigm, arguing that current reward structures, which largely focus on cost reduction, drive the overwhelming emphasis on task replacement in AI patents. Meanwhile, Human-Centered AI (HCAI), which envisions AI as a collaborator augmenting human capabilities and aligning with societal values, remains a fugitive from the mainstream narrative. Despite its promise, HCAI has gone "missing", with little evidence of its principles translating into patents or real-world impact. To increase impact, actionable interventions are needed to disrupt existing incentive structures within the HCI community. We call for a shift in priorities to support translational research, foster cross-disciplinary collaboration, and promote metrics that reward tangible and real-world impact.
中文摘要:本文批判了当前以自动化为中心的AI发展模式,指出现有激励机制过度强调岗位替代而非人本主义路径,呼吁通过结构性改革推动增强人类能力并符合社会价值的AI发展。
English Summary: This paper critiques the automation-focused trajectory of AI development, arguing that current incentives prioritize job replacement over human-centered approaches, and calls for structural changes to promote AI that augments human capabilities and aligns with societal values.

Authors:Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Haotian Ye, Siyu He, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, James Zou, Qingli Zhu, Yong Wang, Liwei Wang
Title: A Foundational Generative Model for Breast Ultrasound Image Analysis
Abstract:
Foundational models have emerged as powerful tools for addressing various tasks in clinical settings. However, their potential for breast ultrasound analysis remains untapped. In this paper, we present BUSGen, the first foundational generative model specifically designed for breast ultrasound image analysis. Pretrained on over 3.5 million breast ultrasound images, BUSGen has acquired extensive knowledge of breast structures, pathological features, and clinical variations. With few-shot adaptation, BUSGen can generate repositories of realistic and informative task-specific data, facilitating the development of models for a wide range of downstream tasks. Extensive experiments highlight BUSGen's exceptional adaptability, significantly exceeding real-data-trained foundational models in breast cancer screening, diagnosis, and prognosis. In breast cancer early diagnosis, our approach outperformed all board-certified radiologists (n=9), achieving an average sensitivity improvement of 16.5% (P-value<0.0001). Additionally, we characterized the scaling effect of using generated data, which was as effective as the collected real-world data for training diagnostic models. Moreover, extensive experiments demonstrated that our approach improved the generalization ability of downstream models. Importantly, BUSGen protected patient privacy by enabling fully de-identified data sharing, making progress forward in secure medical data utilization. An online demo of BUSGen is available at https://aibus.bio.
中文:BUSGen是首个专为乳腺超声分析设计的基础生成模型,通过预训练超过350万张图像生成特定任务数据,显著提升癌症筛查、诊断和预后能力,其表现优于放射科医生并有效保护患者隐私。
English: BUSGen is the first foundational generative model for breast ultrasound analysis, pretrained on over 3.5 million images to generate task-specific data that enhances cancer screening, diagnosis, and prognosis, outperforming radiologists and ensuring patient privacy.

Authors:Hyeon Jeon, Hyunwook Lee, Yun-Hsin Kuo, Taehyun Yang, Daniel Archambault, Sungahn Ko, Takanori Fujiwara, Kwan-Liu Ma, Jinwook Seo
Title: Navigating High-Dimensional Backstage: A Guide for Exploring Literature for the Reliable Use of Dimensionality Reduction
Abstract:
Visual analytics using dimensionality reduction (DR) can easily be unreliable for various reasons, e.g., inherent distortions in representing the original data. The literature has thus proposed a wide range of methodologies to make DR-based visual analytics reliable. However, the diversity and extensiveness of the literature can leave novice analysts and researchers uncertain about where to begin and proceed. To address this problem, we propose a guide for reading papers for reliable visual analytics with DR. Relying on the previous classification of the relevant literature, our guide helps practitioners both to (1) assess their current DR expertise and (2) identify papers that will further enhance their understanding. Interview studies with three experts in DR and data visualizations validate the significance, comprehensiveness, and usefulness of our guide.
中文: 该摘要提出了一份指南,帮助从业者评估其降维专业知识并确定相关文献以实现可靠的可视化分析,其有效性已通过专家访谈得到验证。
English: The abstract proposes a guide to help practitioners assess their dimensionality reduction expertise and identify relevant literature for reliable visual analytics, validated through expert interviews.

Authors:Hyeon Jeon, Jeongin Park, Sungbok Shin, Jinwook Seo
Title: Stop Misusing t-SNE and UMAP for Visual Analytics
Abstract:
Misuses of t-SNE and UMAP in visual analytics have become increasingly common. For example, although t-SNE and UMAP projections often do not faithfully reflect true distances between clusters, practitioners frequently use them to investigate inter-cluster relationships. In this paper, we bring this issue to the surface and comprehensively investigate why such misuse occurs and how to prevent it. We conduct a literature review of 114 papers to verify the prevalence of the misuse and analyze the reasoning behind it. We then execute an interview study to uncover practitioners' implicit motivations for using these techniques -- rationales often undisclosed in the literature. Our findings indicate that misuse of t-SNE and UMAP primarily stems from limited discourse on their appropriate use in visual analytics. We conclude by proposing future directions and concrete action items to promote more reasonable use of DR.
Chinese: 本文通过对114篇文献的综述和访谈研究,揭示了t-SNE和UMAP在可视化分析中误用的普遍性及成因,指出误用主要源于对其恰当用法缺乏充分讨论,并提出了促进合理使用降维技术的方向与具体行动建议。
English: Through a review of 114 papers and an interview study, this work shows that the misuse of t-SNE and UMAP in visual analytics is widespread, traces it primarily to limited discourse on their appropriate use, and proposes future directions and concrete action items for more reasonable use of DR.
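The core misuse described above is trusting distances in the projected view. One simple diagnostic (our sketch, not a method from the paper) is to rank-correlate pairwise distances before and after projection; a low correlation warns that inter-cluster distances in the 2-D view should not be read literally.

```python
import numpy as np

def pairwise_dists(X):
    """All pairwise Euclidean distances, flattened (upper triangle)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff**2).sum(-1))
    return D[np.triu_indices(len(X), k=1)]

def spearman(a, b):
    """Rank correlation between two distance lists (no tie handling)."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    ra, rb = ra - ra.mean(), rb - rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra**2).sum() * (rb**2).sum()))

rng = np.random.default_rng(0)
X_high = rng.normal(size=(50, 10))          # original high-dimensional data
proj = X_high @ rng.normal(size=(10, 2))    # stand-in for any 2-D projection
rho = spearman(pairwise_dists(X_high), pairwise_dists(proj))
print(-1.0 <= rho <= 1.0)  # True
```

Substituting an actual t-SNE or UMAP embedding for `proj` would give the practitioner-facing check the paper's call for "more reasonable use" points toward.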

Authors:Seokweon Jung, Hyeon Jeon, Jeongmin Rhee, Jinwook Seo
Title: Can VLMs Assess Similarity Between Graph Visualizations?
Abstract:
Graph visualizations have been studied for tasks such as clustering and temporal analysis, but how these visual similarities relate to established graph similarity measures remains unclear. In this paper, we explore the potential of Vision Language Models (VLMs) to approximate human-like perception of graph similarity. We generate graph datasets of various sizes and densities and compare VLM-derived visual similarity scores with feature-based measures. Our findings indicate VLMs can assess graph similarity in a manner similar to feature-based measures, even though differences among the measures exist. In future work, we plan to extend our research by conducting experiments on human visual graph perception.
中文摘要:本研究探索了视觉语言模型在图形相似性评估中模拟人类视觉感知的能力,发现尽管存在差异,但模型得出的视觉相似度与基于特征的度量方法具有一致性。
English Summary: This study investigates how Vision Language Models (VLMs) can mimic human-like visual perception of graph similarity, finding that VLM-derived scores align with feature-based measures despite some variations.
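One concrete example of a "feature-based measure" of the kind the study compares against VLM scores is below. The choice of feature (degree histograms compared by cosine similarity) is ours for illustration; the paper's actual measures may differ.

```python
import numpy as np

def degree_hist(adj, max_deg):
    """Histogram of node degrees for an adjacency matrix."""
    degrees = adj.sum(axis=1).astype(int)
    return np.bincount(degrees, minlength=max_deg + 1).astype(float)

def degree_similarity(adj_a, adj_b):
    """Cosine similarity between the two graphs' degree histograms."""
    m = max(len(adj_a), len(adj_b))
    ha, hb = degree_hist(adj_a, m), degree_hist(adj_b, m)
    return float(ha @ hb / (np.linalg.norm(ha) * np.linalg.norm(hb)))

rng = np.random.default_rng(0)
def random_graph(n, p):
    """Symmetric 0/1 adjacency matrix of an Erdos-Renyi-style graph."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

a, b = random_graph(20, 0.3), random_graph(20, 0.3)
print(0.0 <= degree_similarity(a, b) <= 1.0)  # True
```

The study's question is then whether a VLM shown renderings of `a` and `b` produces similarity judgments that track scores like this one.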

Authors:Ziwei Wang, Weizhi Chen, Leyang Yang, Sheng Zhou, Shengchu Zhao, Hanbei Zhan, Jiongchao Jin, Liangcheng Li, Zirui Shao, Jiajun Bu
Title: MP-GUI: Modality Perception with MLLMs for GUI Understanding
Abstract:
Graphical user interface (GUI) has become integral to modern society, making it crucial to be understood for human-centric systems. However, unlike natural images or documents, GUIs comprise artificially designed graphical elements arranged to convey specific semantic meanings. Current multi-modal large language models (MLLMs), although already proficient in processing graphical and textual components, face hurdles in GUI understanding due to the lack of explicit spatial structure modeling. Moreover, obtaining high-quality spatial structure data is challenging due to privacy issues and noisy environments. To address these challenges, we present MP-GUI, a specially designed MLLM for GUI understanding. MP-GUI features three precisely specialized perceivers to extract graphical, textual, and spatial modalities from the screen as GUI-tailored visual clues, refined with a spatial structure refinement strategy and adaptively combined via a fusion gate to meet the specific preferences of different GUI understanding tasks. To cope with the scarcity of training data, we also introduce a pipeline for automatic data collection. Extensive experiments demonstrate that MP-GUI achieves impressive results on various GUI understanding tasks with limited data.
中文摘要:MP-GUI作为一种专为图形界面理解设计的特殊多模态大语言模型,通过专门设计的感知器提取图形、文本和空间模态信息,并采用自适应融合机制,在有限数据条件下实现了优异的界面理解性能。
English Summary: MP-GUI is a specialized multi-modal language model designed to overcome challenges in GUI understanding by integrating graphical, textual, and spatial information through tailored perceivers and adaptive fusion, achieving strong performance with limited training data.
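The fusion-gate idea, combining the three modality features with input-dependent weights, can be sketched abstractly. This is a generic gated fusion, not MP-GUI's actual architecture; the gating weights here are random stand-ins for learned parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse(feats, W_gate):
    """feats: (3, d) modality features; W_gate: (3, 3*d) toy gating weights.
    Returns one weight per modality and the weighted sum of the features."""
    gate = softmax(W_gate @ feats.reshape(-1))
    return gate, (gate[:, None] * feats).sum(axis=0)

rng = np.random.default_rng(0)
d = 16
feats = rng.normal(size=(3, d))               # graphical, textual, spatial clues
W_gate = rng.normal(0, 0.1, size=(3, 3 * d))  # would be learned in practice
gate, fused = fuse(feats, W_gate)
print(round(float(gate.sum()), 6), fused.shape)  # 1.0 (16,)
```

Because the gate is computed from the input itself, a text-heavy screen can up-weight the textual perceiver while a layout-heavy one leans on the spatial clues, which is the "specific preferences of different tasks" behavior the abstract describes.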

Authors:Hyeon Jeon, Hyunwook Lee, Yun-Hsin Kuo, Taehyun Yang, Daniel Archambault, Sungahn Ko, Takanori Fujiwara, Kwan-Liu Ma, Jinwook Seo
Title: Unveiling High-dimensional Backstage: A Survey for Reliable Visual Analytics with Dimensionality Reduction
Abstract:
Dimensionality reduction (DR) techniques are essential for visually analyzing high-dimensional data. However, visual analytics using DR often face unreliability, stemming from factors such as inherent distortions in DR projections. This unreliability can lead to analytic insights that misrepresent the underlying data, potentially resulting in misguided decisions. To tackle these reliability challenges, we review 133 papers that address the unreliability of visual analytics using DR. Through this review, we contribute (1) a workflow model that describes the interaction between analysts and machines in visual analytics using DR, and (2) a taxonomy that identifies where and why reliability issues arise within the workflow, along with existing solutions for addressing them. Our review reveals ongoing challenges in the field, whose significance and urgency are validated by five expert researchers. This review also finds that the current research landscape is skewed toward developing new DR techniques rather than their interpretation or evaluation, where we discuss how the HCI community can contribute to broadening this focus.
中文: 本文通过综述133篇论文,针对降维技术用于可视化分析时的不可靠性提出了工作流模型和分类法,以识别并解决可靠性问题,同时指出当前研究过于偏重技术开发而忽视解释评估的现状。
English: This review of 133 papers addresses the unreliability in visual analytics using dimensionality reduction by proposing a workflow model and taxonomy to identify and solve reliability issues, highlighting the field's current overemphasis on technique development over interpretation.

Authors:Sarah Seifi, Tobias Sukianto, Cecilia Carbonelli, Lorenzo Servadei, Robert Wille
Title: Learning Interpretable Rules from Neural Networks: Neurosymbolic AI for Radar Hand Gesture Recognition
Abstract:
Rule-based models offer interpretability but struggle with complex data, while deep neural networks excel in performance yet lack transparency. This work investigates a neuro-symbolic rule learning neural network named RL-Net that learns interpretable rule lists through neural optimization, applied for the first time to radar-based hand gesture recognition (HGR). We benchmark RL-Net against a fully transparent rule-based system (MIRA) and an explainable black-box model (XentricAI), evaluating accuracy, interpretability, and user adaptability via transfer learning. Our results show that RL-Net achieves a favorable trade-off, maintaining strong performance (93.03% F1) while significantly reducing rule complexity. We identify optimization challenges specific to rule pruning and hierarchy bias and propose stability-enhancing modifications. Compared to MIRA and XentricAI, RL-Net emerges as a practical middle ground between transparency and performance. This study highlights the real-world feasibility of neuro-symbolic models for interpretable HGR and offers insights for extending explainable AI to edge-deployable sensing systems.
中文摘要:RL-Net是一种神经符号模型,在手势识别中实现了可解释性与性能的平衡,相比透明规则系统和黑箱模型,在保持93.03% F1分数的同时显著降低了规则复杂度。
English Summary: RL-Net is a neuro-symbolic model that balances interpretability and performance in hand gesture recognition, achieving 93.03% F1 score while reducing rule complexity compared to transparent and black-box alternatives.
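What an "interpretable rule list" of the kind RL-Net learns looks like can be shown with a toy example. The rules, feature names, and labels below are invented for illustration, not learned by or taken from RL-Net.

```python
# A rule list is evaluated in order; the first rule whose condition fires
# decides the class, and a default label covers everything else. This
# ordered structure is what makes the model's decisions readable.

RULES = [
    (lambda g: g["range_var"] > 0.8 and g["duration"] < 0.3, "swipe"),
    (lambda g: g["range_var"] > 0.8,                         "wave"),
    (lambda g: g["doppler_peak"] > 0.5,                      "push"),
]
DEFAULT = "idle"

def predict(gesture_features):
    for condition, label in RULES:
        if condition(gesture_features):
            return label
    return DEFAULT

print(predict({"range_var": 0.9, "duration": 0.2, "doppler_peak": 0.1}))  # swipe
print(predict({"range_var": 0.1, "duration": 0.9, "doppler_peak": 0.0}))  # idle
```

RL-Net's contribution is learning such ordered rules through neural optimization rather than hand-writing them, while keeping the final model in this directly inspectable form.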

Authors:Shuning Zhang, Jingruo Chen, Zhiqi Gao, Jiajing Gao, Xin Yi, Hewu Li
Title: Characterizing Unintended Consequences in Human-GUI Agent Collaboration for Web Browsing
Abstract:
The proliferation of Large Language Model (LLM)-based Graphical User Interface (GUI) agents in web browsing scenarios presents complex unintended consequences (UCs). This paper characterizes UCs from three perspectives: phenomena, influence and mitigation, drawing on social media analysis (N=221 posts) and semi-structured interviews (N=14). Key phenomena include agents' deficiencies in comprehending instructions and planning tasks, challenges in executing accurate GUI interactions and adapting to dynamic interfaces, the generation of unreliable or misaligned outputs, and shortcomings in error handling and feedback processing. These phenomena manifest as influences ranging from unanticipated actions and user frustration, to privacy violations and security vulnerabilities, and further to eroded trust and wider ethical concerns. Our analysis also identifies user-initiated mitigation, such as technical adjustments and manual oversight, and provides implications for designing future LLM-based GUI agents that are robust, user-centric, and transparent, fostering a crucial balance between automation and human oversight.
中文: 本文揭示了基于大语言模型的图形界面代理在网络浏览中的意外后果,包括操作缺陷和伦理风险,并提出了用户主导的缓解措施及未来代理的设计改进方向。
English: This paper identifies unintended consequences of LLM-based GUI agents in web browsing, including operational deficiencies and ethical risks, and suggests user-led mitigations and design improvements for future agents.

Authors:Jason Phang, Michael Lampe, Lama Ahmad, Sandhini Agarwal, Cathy Mengying Fang, Auren R. Liu, Valdemar Danry, Eunhae Lee, Samantha W. T. Chan, Pat Pataranutaporn, Pattie Maes
Title: Investigating Affective Use and Emotional Well-being on ChatGPT
Abstract:
As AI chatbots see increased adoption and integration into everyday life, questions have been raised about the potential impact of human-like or anthropomorphic AI on users. In this work, we investigate the extent to which interactions with ChatGPT (with a focus on Advanced Voice Mode) may impact users' emotional well-being, behaviors and experiences through two parallel studies. To study the affective use of AI chatbots, we perform large-scale automated analysis of ChatGPT platform usage in a privacy-preserving manner, analyzing over 3 million conversations for affective cues and surveying over 4,000 users on their perceptions of ChatGPT. To investigate whether there is a relationship between model usage and emotional well-being, we conduct an Institutional Review Board (IRB)-approved randomized controlled trial (RCT) on close to 1,000 participants over 28 days, examining changes in their emotional well-being as they interact with ChatGPT under different experimental settings. In both on-platform data analysis and the RCT, we observe that very high usage correlates with increased self-reported indicators of dependence. From our RCT, we find the impact of voice-based interactions on emotional well-being to be highly nuanced, and influenced by factors such as the user's initial emotional state and total usage duration. Overall, our analysis reveals that a small number of users are responsible for a disproportionate share of the most affective cues.
中文摘要:本研究通过大规模数据分析及受控实验,发现过度使用ChatGPT(特别是语音模式)可能引发用户依赖,且其对情绪健康的影响因用户初始状态和使用时长呈现复杂差异。
English Summary: This study examines how interactions with ChatGPT, especially its voice mode, affect users' emotional well-being through large-scale data analysis and controlled trials, revealing nuanced impacts including potential dependence from high usage.

Authors:Cathy Mengying Fang, Auren R. Liu, Valdemar Danry, Eunhae Lee, Samantha W. T. Chan, Pat Pataranutaporn, Pattie Maes, Jason Phang, Michael Lampe, Lama Ahmad, Sandhini Agarwal
Title: How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Randomized Controlled Study
Abstract:
AI chatbots, especially those with voice capabilities, have become increasingly human-like, with more users seeking emotional support and companionship from them. Concerns are rising about how such interactions might impact users' loneliness and socialization with real people. We conducted a four-week randomized, controlled, IRB-approved experiment (n=981, >300K messages) to investigate how AI chatbot interaction modes (text, neutral voice, and engaging voice) and conversation types (open-ended, non-personal, and personal) influence psychosocial outcomes such as loneliness, social interaction with real people, emotional dependence on AI and problematic AI usage. Results showed that while voice-based chatbots initially appeared beneficial in mitigating loneliness and dependence compared with text-based chatbots, these advantages diminished at high usage levels, especially with a neutral-voice chatbot. Conversation type also shaped outcomes: personal topics slightly increased loneliness but tended to lower emotional dependence compared with open-ended conversations, whereas non-personal topics were associated with greater dependence among heavy users. Overall, higher daily usage - across all modalities and conversation types - correlated with higher loneliness, dependence, and problematic use, and lower socialization. Exploratory analyses revealed that those with stronger emotional attachment tendencies and higher trust in the AI chatbot tended to experience greater loneliness and emotional dependence, respectively. These findings underscore the complex interplay between chatbot design choices (e.g., voice expressiveness) and user behaviors (e.g., conversation content, usage frequency). We highlight the need for further research on whether chatbots' ability to manage emotional content without fostering dependence or replacing human relationships benefits overall well-being.
中文: 研究发现,语音聊天机器人最初在缓解孤独感和情感依赖方面优于文字聊天机器人,但在高使用量下这些优势消失;总体而言,各种模式和对话类型下,日使用量越高,孤独感、情感依赖和问题性使用越严重,社交越少。
English: This four-week randomized controlled study (n=981) found that voice-based chatbots initially mitigated loneliness and emotional dependence better than text-based ones, but these advantages faded at high usage levels; across all modalities and conversation types, higher daily usage correlated with greater loneliness, dependence, and problematic use, and lower socialization.

Authors:Sarah Seifi, Tobias Sukianto, Cecilia Carbonelli, Lorenzo Servadei, Robert Wille
Title: Complying with the EU AI Act: Innovations in Explainable and User-Centric Hand Gesture Recognition
Abstract:
The EU AI Act underscores the importance of transparency, user-centricity, and robustness in AI systems, particularly for high-risk systems. In response, we present advancements in XentricAI, an explainable hand gesture recognition (HGR) system designed to meet these regulatory requirements. XentricAI addresses fundamental challenges in HGR, such as the opacity of black-box models, using explainable AI methods, and handles distributional shifts in real-world data through transfer learning techniques. We extend an existing radar-based HGR dataset by adding 28,000 new gestures, with contributions from multiple users across varied locations, including 24,000 out-of-distribution gestures. Leveraging this real-world dataset, we enhance XentricAI's capabilities by integrating a variational autoencoder module for improved gesture anomaly detection, incorporating user-specific thresholding. This integration enables the identification of 11.50% more anomalous gestures. Our extensive evaluations demonstrate a 97.5% success rate in characterizing these anomalies, significantly improving system explainability. Furthermore, the implementation of transfer learning techniques has shown a substantial increase in user adaptability, with an average improvement of at least 15.17%. This work contributes to the development of trustworthy AI systems by providing both technical advancements and regulatory compliance, offering a commercially viable solution that aligns with the EU AI Act requirements.
中文: 欧盟AI法案强调AI系统的透明度和稳健性,XentricAI作为可解释的手势识别系统,通过改进异常检测和用户适应性,满足了法规要求并提供了商业可行的解决方案。
English: The EU AI Act promotes transparency and robustness in AI, leading to the development of XentricAI, an explainable hand gesture recognition system that enhances anomaly detection and user adaptability through advanced methods, ensuring regulatory compliance and commercial viability.

Authors:Alexander Doudkin, Pat Pataranutaporn, Pattie Maes
Title: AI persuading AI vs AI persuading Humans: LLMs' Differential Effectiveness in Promoting Pro-Environmental Behavior
Abstract:
Pro-environmental behavior (PEB) is vital to combat climate change, yet turning awareness into intention and action remains elusive. We explore large language models (LLMs) as tools to promote PEB, comparing their impact across 3,200 participants: real humans (n=1,200), simulated humans based on actual participant data (n=1,200), and fully synthetic personas (n=1,200). All three participant groups faced personalized or standard chatbots, or static statements, employing four persuasion strategies (moral foundations, future self-continuity, action orientation, or "freestyle" chosen by the LLM). Results reveal a "synthetic persuasion paradox": synthetic and simulated agents significantly change their post-intervention PEB stance, while human responses barely shift. Simulated participants better approximate human trends but still overestimate effects. This disconnect underscores LLMs' potential for pre-evaluating PEB interventions but warns of their limits in predicting real-world behavior. We call for refined synthetic modeling and sustained, extended human trials to align conversational AI's promise with tangible sustainability outcomes.
中文摘要:大型语言模型在模拟测试中显示出评估环保行为干预措施的潜力,但会高估其实际效果,因此需要改进合成建模并加强人类试验。
English Summary: Large language models show promise for evaluating pro-environmental behavior interventions through simulated testing but overestimate their real-world impact, highlighting the need for improved synthetic modeling and human trials.

Authors:Shuning Zhang, Xin Yi, Shixuan Li, Chuye Hong, Gujun Chen, Jiarui Liu, Xueyang Wang, Yongquan Hu, Yuntao Wang, Hewu Li
Title: Actual Achieved Gain and Optimal Perceived Gain: Modeling Human Take-over Decisions Towards Automated Vehicles' Suggestions
Abstract:
Driver decision quality in take-overs is critical for effective human-Autonomous Driving System (ADS) collaboration. However, current research lacks detailed analysis of its variations. This paper introduces two metrics, Actual Achieved Gain (AAG) and Optimal Perceived Gain (OPG), to assess decision quality, with OPG representing optimal decisions and AAG reflecting actual outcomes. Both are calculated as weighted averages of perceived gains and losses, influenced by ADS accuracy. Study 1 (N=315) used a 21-point Thurstone scale to measure perceived gains and losses (key components of AAG and OPG) across typical tasks: route selection, overtaking, and collision avoidance. Studies 2 (N=54) and 3 (N=54) modeled decision quality under varying ADS accuracy and decision time. Results show that with sufficient time (>3.5s), AAG converges towards OPG, indicating rational decision-making, while limited time leads to intuitive and deterministic choices. Study 3 also linked AAG-OPG deviations to irrational behaviors. An intervention study (N=8) and a pilot (N=4) employing voice alarms and multi-modal alarms based on these deviations demonstrated AAG's potential to improve decision quality.
中文摘要:本文提出AAG和OPG指标评估接管过程中的驾驶员决策质量,研究表明充足决策时间可实现理性选择而时间压力导致直觉决策,基于这些指标的干预措施显示出提升决策质量的潜力。
English Summary: This paper introduces AAG and OPG metrics to evaluate driver decision quality during take-overs, revealing that sufficient decision time enables rational choices while time constraints lead to intuitive decisions, with interventions using these metrics showing potential for improvement.
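The abstract's weighted-average construction can be sketched in code. This is an illustrative reading only: the payoff values, the binary accept/reject choice set, and the use of ADS accuracy as the weight are assumptions, not the paper's exact formulation.

```python
def expected_gain(p_correct: float, gain: float, loss: float) -> float:
    """Weighted average of the perceived gain if the ADS suggestion is
    correct and the perceived loss if it is wrong, weighted by accuracy."""
    return p_correct * gain + (1.0 - p_correct) * loss

def optimal_perceived_gain(p_correct: float,
                           options: list[tuple[float, float]]) -> float:
    """OPG: the expected value of the best available choice."""
    return max(expected_gain(p_correct, g, l) for g, l in options)

def actual_achieved_gain(p_correct: float,
                         chosen: tuple[float, float]) -> float:
    """AAG: the expected value of the choice the driver actually made."""
    g, l = chosen
    return expected_gain(p_correct, g, l)
```

Under this reading, with 90% ADS accuracy, accepting a suggestion perceived as +10 on success and -5 on failure is worth 8.5; AAG converging to OPG with sufficient decision time then corresponds to drivers reliably picking the highest-expected-value option.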

Authors:Pat Pataranutaporn, Alexander Doudkin, Pattie Maes
Title: OceanChat: The Effect of Virtual Conversational AI Agents on Sustainable Attitude and Behavior Change
Abstract:
Marine ecosystems face unprecedented threats from climate change and plastic pollution, yet traditional environmental education often struggles to translate awareness into sustained behavioral change. This paper presents OceanChat, an interactive system leveraging large language models to create conversational AI agents represented as animated marine creatures -- specifically a beluga whale, a jellyfish, and a seahorse -- designed to promote environmental behavior (PEB) and foster awareness through personalized dialogue. Through a between-subjects experiment (N=900), we compared three conditions: (1) Static Scientific Information, providing conventional environmental education through text and images; (2) Static Character Narrative, featuring first-person storytelling from 3D-rendered marine creatures; and (3) Conversational Character Narrative, enabling real-time dialogue with AI-powered marine characters. Our analysis revealed that the Conversational Character Narrative condition significantly increased behavioral intentions and sustainable choice preferences compared to static approaches. The beluga whale character demonstrated consistently stronger emotional engagement across multiple measures, including perceived anthropomorphism and empathy. However, impacts on deeper measures like climate policy support and psychological distance were limited, highlighting the complexity of shifting entrenched beliefs. Our work extends research on sustainability interfaces facilitating PEB and offers design principles for creating emotionally resonant, context-aware AI characters. By balancing anthropomorphism with species authenticity, OceanChat demonstrates how interactive narratives can bridge the gap between environmental knowledge and real-world behavior change.
中文摘要:OceanChat是一款利用海洋生物角色进行对话的交互式AI系统,通过个性化交流显著提升了环保行为意愿,但对深层信念体系的影响仍显不足。
English Summary: OceanChat is an interactive AI system using conversational marine creature characters that significantly boosts environmental behavioral intentions through personalized dialogue, though its impact on deeper belief systems remains limited.

Authors:Jionghao Lin, Jiarui Rao, Yiyang Zhao, Yuting Wang, Ashish Gurung, Amanda Barany, Jaclyn Ocumpaugh, Ryan S. Baker, Kenneth R. Koedinger
Title: Automatic Large Language Models Creation of Interactive Learning Lessons
Abstract:
We explore the automatic generation of interactive, scenario-based lessons designed to train novice human tutors who teach middle school mathematics online. Employing prompt engineering through a Retrieval-Augmented Generation approach with GPT-4o, we developed a system capable of creating structured tutor training lessons. Our study generated lessons in English for three key topics: Encouraging Students' Independence, Encouraging Help-Seeking Behavior, and Turning on Cameras, using a task decomposition prompting strategy that breaks lesson generation into sub-tasks. The generated lessons were evaluated by two human evaluators, who provided both quantitative and qualitative evaluations using a comprehensive rubric informed by lesson design research. Results demonstrate that the task decomposition strategy led to higher-rated lessons compared to single-step generation. Human evaluators identified several strengths in the LLM-generated lessons, including well-structured content and time-saving potential, while also noting limitations such as generic feedback and a lack of clarity in some instructional sections. These findings underscore the potential of hybrid human-AI approaches for generating effective lessons in tutor training.
中文摘要:本研究利用GPT-4o和任务分解提示技术开发了自动生成互动式导师培训课程的系统,人工评估显示生成课程结构清晰且节省时间,但也存在反馈模板化等局限性。
English Summary: This study develops an AI system using GPT-4o and task decomposition prompting to automatically generate interactive tutor training lessons, with human evaluation showing structured content and time efficiency despite some generic feedback limitations.

Authors:Sam Earle, Ahmed Khalifa, Muhammad Umair Nasir, Zehua Jiang, Graham Todd, Andrzej Banburski-Fahey, Julian Togelius
Title: ScriptDoctor: Automatic Generation of PuzzleScript Games via Large Language Models and Tree Search
Abstract:
There is much interest in using large pre-trained models in Automatic Game Design (AGD), whether via the generation of code, assets, or more abstract conceptualization of design ideas. But so far this interest largely stems from the ad hoc use of such generative models under persistent human supervision. Much work remains to show how these tools can be integrated into longer-time-horizon AGD pipelines, in which systems interface with game engines to test generated content autonomously. To this end, we introduce ScriptDoctor, a Large Language Model (LLM)-driven system for automatically generating and testing games in PuzzleScript, an expressive but highly constrained description language for turn-based puzzle games over 2D gridworlds. ScriptDoctor generates and tests game design ideas in an iterative loop, where human-authored examples are used to ground the system's output, compilation errors from the PuzzleScript engine are used to elicit functional code, and search-based agents play-test generated games. ScriptDoctor serves as a concrete example of the potential of automated, open-ended LLM-based workflows in generating novel game content.
中文:目前大型预训练模型在自动游戏设计中的应用多依赖人工监督,而ScriptDoctor作为一个基于大语言模型的系统,能在PuzzleScript中通过迭代生成与测试游戏,展现了自动化生成新颖游戏内容的潜力。
English: There is growing interest in using large pre-trained models for Automatic Game Design, but current applications rely heavily on human supervision, whereas ScriptDoctor demonstrates an automated LLM-driven system that iteratively generates and tests games in PuzzleScript, showcasing the potential for open-ended content creation.

Authors:Yichi Zhang, Xin Luna Dong, Zhaojiang Lin, Andrea Madotto, Anuj Kumar, Babak Damavandi, Joyce Chai, Seungwhan Moon
Title: Proactive Assistant Dialogue Generation from Streaming Egocentric Videos
Abstract:
Recent advances in conversational AI have been substantial, but developing real-time systems for perceptual task guidance remains challenging. These systems must provide interactive, proactive assistance based on streaming visual inputs, yet their development is constrained by the costly and labor-intensive process of data collection and system evaluation. To address these limitations, we present a comprehensive framework with three key contributions. First, we introduce a novel data curation pipeline that synthesizes dialogues from annotated egocentric videos, resulting in \dataset, a large-scale synthetic dialogue dataset spanning multiple domains. Second, we develop a suite of automatic evaluation metrics, validated through extensive human studies. Third, we propose an end-to-end model that processes streaming video inputs to generate contextually appropriate responses, incorporating novel techniques for handling data imbalance and long-duration videos. This work lays the foundation for developing real-time, proactive AI assistants capable of guiding users through diverse tasks. Project page: https://pro-assist.github.io/

Authors:Ziyi Wang, Yuxuan Lu, Wenbo Li, Amirali Amini, Bo Sun, Yakov Bart, Weimin Lyu, Jiri Gesi, Tian Wang, Jing Huang, Yu Su, Upol Ehsan, Malihe Alikhani, Toby Jia-Jun Li, Lydia Chilton, Dakuo Wang
Title: OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation
Abstract:
Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating "believable" human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of an actual human user. To address this gap, we introduce OPERA, a novel dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping sessions. OPERA is the first public dataset that comprehensively captures: user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We developed both an online questionnaire and a custom browser plugin to gather this dataset with high fidelity. Using OPERA, we establish the first benchmark to evaluate how well current LLMs can predict a specific user's next action and rationale with a given persona and history. This dataset lays the groundwork for future research into LLM agents that aim to act as personalized digital twins for humans.
中文摘要:本研究推出OPERA数据集,通过记录真实用户在网购中的行为来评估大型语言模型模拟个体用户网络操作及思维过程的准确性。
English Summary: This study introduces OPERA, a novel dataset capturing real user behaviors during online shopping to evaluate how accurately large language models can simulate individual users' next web actions and reasoning.

Authors:Xihuai Wang, Ziyi Zhao, Siyu Ren, Shao Zhang, Song Li, Xiaoyu Li, Ziwen Wang, Lin Qiu, Guanglu Wan, Xuezhi Cao, Xunliang Cai, Weinan Zhang
Title: Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese
Abstract:
Recent advances in large language models (LLMs) have significantly improved text-to-speech (TTS) systems, enhancing control over speech style, naturalness, and emotional expression, which brings TTS systems closer to human-level performance. Although the Mean Opinion Score (MOS) remains the standard for TTS system evaluation, it suffers from subjectivity, environmental inconsistencies, and limited interpretability. Existing evaluation datasets also lack a multi-dimensional design, often neglecting factors such as speaking styles, context diversity, and trap utterances, which is particularly evident in Chinese TTS evaluation. To address these challenges, we introduce the Audio Turing Test (ATT), a multi-dimensional Chinese corpus dataset ATT-Corpus paired with a simple, Turing-Test-inspired evaluation protocol. Instead of relying on complex MOS scales or direct model comparisons, ATT asks evaluators to judge whether a voice sounds human. This simplification reduces rating bias and improves evaluation robustness. To further support rapid model development, we also finetune Qwen2-Audio-Instruct with human judgment data as Auto-ATT for automatic evaluation. Experimental results show that ATT effectively differentiates models across specific capability dimensions using its multi-dimensional design. Auto-ATT also demonstrates strong alignment with human evaluations, confirming its value as a fast and reliable assessment tool. The white-box ATT-Corpus and Auto-ATT can be found in ATT Hugging Face Collection (https://huggingface.co/collections/meituan/audio-turing-test-682446320368164faeaf38a4).

Authors:Jie Cao, Chloe Qianhui Zhao, Xian Chen, Shuman Wang, Christian Schunn, Kenneth R. Koedinger, Jionghao Lin
Title: From First Draft to Final Insight: A Multi-Agent Approach for Feedback Generation
Abstract:
Producing large volumes of high-quality, timely feedback poses significant challenges to instructors. To address this issue, automation technologies, particularly Large Language Models (LLMs), show great potential. However, current LLM-based research still shows room for improvement in terms of feedback quality. Our study proposed a multi-agent approach performing "generation, evaluation, and regeneration" (G-E-RG) to further enhance feedback quality. In the first-generation phase, six methods were adopted, combining three feedback theoretical frameworks and two prompt methods: zero-shot and retrieval-augmented generation with chain-of-thought (RAG_CoT). The results indicated that, compared to first-round feedback, G-E-RG significantly improved final feedback across six methods for most dimensions. Specifically: (1) Evaluation accuracy for six methods increased by 3.36% to 12.98% (p<0.001); (2) The proportion of feedback containing four effective components rose from an average of 27.72% to an average of 98.49% among six methods; the sub-dimensions of providing critiques, highlighting strengths, encouraging agency, and cultivating dialogue also showed substantial enhancement (p<0.001); (3) There was a significant improvement in most of the feature values (p<0.001), although some sub-dimensions (e.g., strengthening the teacher-student relationship) still require further enhancement; (4) The simplicity of feedback was effectively enhanced (p<0.001) for three methods.
中文摘要:本研究提出的多智能体G-E-RG方法通过整合理论框架与提示策略,在六个反馈维度上显著提升了评估准确性、有效反馈成分占比及简洁性,有效解决了自动化反馈的质量优化问题。
English Summary: This study introduces a multi-agent G-E-RG approach that significantly enhances feedback quality by integrating theoretical frameworks and prompt methods, showing marked improvements in evaluation accuracy, feedback components, and simplicity across multiple dimensions.
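The G-E-RG pipeline described above can be sketched as a small control loop. A minimal sketch, assuming `generate` and `evaluate` are callables wrapping LLM calls (e.g., zero-shot or RAG_CoT prompting); the score threshold, round limit, and critique-in-prompt pattern are illustrative choices, not the paper's exact protocol.

```python
from typing import Callable, Tuple

def g_e_rg(generate: Callable[[str], str],
           evaluate: Callable[[str], Tuple[float, str]],
           task: str,
           threshold: float = 0.8,
           max_rounds: int = 3) -> str:
    """Generation-Evaluation-ReGeneration: draft feedback, score it,
    and regenerate with the evaluator's critique until it passes."""
    draft = generate(task)
    for _ in range(max_rounds):
        score, critique = evaluate(draft)
        if score >= threshold:
            break
        # Feed the critique back into the next generation round.
        draft = generate(task + "\nRevise using this critique: " + critique)
    return draft
```

Separating the evaluator from the generator is what lets the loop catch missing components (critique, strengths, agency, dialogue) that a single-pass generation tends to omit.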

Authors:Chloe Qianhui Zhao, Jie Cao, Eason Chen, Kenneth R. Koedinger, Jionghao Lin
Title: SlideItRight: Using AI to Find Relevant Slides and Provide Feedback for Open-Ended Questions
Abstract:
Feedback is important in supporting student learning. While various automated feedback systems have been implemented to make the feedback scalable, many existing solutions only focus on generating text-based feedback. As is indicated in the multimedia learning principle, learning with more modalities could help utilize more separate channels, reduce the cognitive load and facilitate students' learning. Hence, it is important to explore the potential of Artificial Intelligence (AI) in feedback generation from and to different modalities. Our study leverages Large Language Models (LLMs) for textual feedback with supplementary guidance from another modality: relevant lecture slides retrieved from a slide hub. Through an online crowdsourcing study (N=91), this study investigates learning gains and student perceptions using a 2x2 design (i.e., human feedback vs. AI feedback and with vs. without relevant slide), evaluating the clarity, engagement, perceived effectiveness, and reliability of AI-facilitated multimodal feedback. We observed significant pre-to-post learning gains across all conditions. However, the differences in these gains were not statistically significant between conditions. The post-survey revealed that students found the slide feedback helpful in their learning process, though they reported difficulty in understanding it. Regarding the AI-generated open-ended feedback, students considered it personalized and relevant to their responses, but they expressed lower trust in the AI feedback compared to human-generated feedback.
中文摘要:本研究利用大型语言模型和补充幻灯片探索人工智能生成的多模态反馈,发现尽管学生在所有条件下均取得学习进步,但相较于人工反馈,他们对AI反馈的信任度较低,尽管认为其具有相关性和个性化特点。
English Summary: This study explores AI-generated multimodal feedback using LLMs and lecture slides, finding that while students achieved learning gains across all conditions, they trusted AI feedback less than human feedback despite its perceived relevance and personalization.

Authors:Megan Gu, Chloe Qianhui Zhao, Claire Liu, Nikhil Patel, Jahnvi Shah, Jionghao Lin, Kenneth R. Koedinger
Title: Toward Automated Qualitative Analysis: Leveraging Large Language Models for Tutoring Dialogue Evaluation
Abstract:
Our study introduces an automated system leveraging large language models (LLMs) to assess the effectiveness of five key tutoring strategies: 1. giving effective praise, 2. reacting to errors, 3. determining what students know, 4. helping students manage inequity, and 5. responding to negative self-talk. Using a public dataset from the Teacher-Student Chatroom Corpus, our system classifies each tutoring strategy as either being employed as desired or undesired. Our study utilizes GPT-3.5 with few-shot prompting to assess the use of these strategies and analyze tutoring dialogues. The results show that for the five tutoring strategies, True Negative Rates (TNR) range from 0.655 to 0.738, and Recall ranges from 0.327 to 0.432, indicating that the model is effective at excluding incorrect classifications but struggles to consistently identify the correct strategy. The strategy "helping students manage inequity" showed the highest performance with a TNR of 0.738 and Recall of 0.432. The study highlights the potential of LLMs in tutoring strategy analysis and outlines directions for future improvements, including incorporating more advanced models for more nuanced feedback.
中文: 本研究利用GPT-3.5自动评估五种关键辅导策略,结果显示模型虽能有效排除错误分类,但在准确识别正确策略方面一致性不足,其中帮助学生应对不平等策略表现最佳。
English: Our study employs GPT-3.5 to automatically evaluate five key tutoring strategies, showing strong capability in avoiding incorrect classifications but limited consistency in accurately identifying correct strategies, with the highest performance observed in helping students manage inequity.
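The TNR and Recall figures quoted above follow the standard confusion-matrix definitions. A minimal sketch, assuming a hypothetical binary coding where 1 means the strategy was employed as desired and 0 means undesired:

```python
def tnr_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """True Negative Rate (specificity) and Recall (sensitivity)
    for a binary strategy classifier."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return tnr, recall
```

High TNR with low Recall, as reported, means the classifier rarely flags undesired use as desired but misses many genuinely desired uses.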

Authors:Zhen Wen, Luoxuan Weng, Yinghao Tang, Runjin Zhang, Yuxin Liu, Bo Pan, Minfeng Zhu, Wei Chen
Title: Exploring Multimodal Prompt for Visualization Authoring with Large Language Models
Abstract:
Recent advances in large language models (LLMs) have shown great potential in automating the process of visualization authoring through simple natural language utterances. However, instructing LLMs using natural language is limited in precision and expressiveness for conveying visualization intent, leading to misinterpretation and time-consuming iterations. To address these limitations, we conduct an empirical study to understand how LLMs interpret ambiguous or incomplete text prompts in the context of visualization authoring, and the conditions making LLMs misinterpret user intent. Informed by the findings, we introduce visual prompts as a complementary input modality to text prompts, which help clarify user intent and improve LLMs' interpretation abilities. To explore the potential of multimodal prompting in visualization authoring, we design VisPilot, which enables users to easily create visualizations using multimodal prompts, including text, sketches, and direct manipulations on existing visualizations. Through two case studies and a controlled user study, we demonstrate that VisPilot provides a more intuitive way to create visualizations without affecting the overall task efficiency compared to text-only prompting approaches. Furthermore, we analyze the impact of text and visual prompts in different visualization tasks. Our findings highlight the importance of multimodal prompting in improving the usability of LLMs for visualization authoring. We discuss design implications for future visualization systems and provide insights into how multimodal prompts can enhance human-AI collaboration in creative visualization tasks. All materials are available at https://OSF.IO/2QRAK.
中文: 大型语言模型在通过自然语言自动化创建可视化方面展现出潜力,但存在精度和表达力不足的问题,易导致误解和耗时迭代;为此,研究引入视觉提示作为补充输入方式,设计VisPilot系统支持多模态提示,实现更直观的可视化创作,在不影响效率的同时提升可用性。
English: Recent advances in large language models (LLMs) show potential for automating visualization authoring through natural language, but face limitations in precision and expressiveness, leading to misinterpretations and time-consuming iterations; to address this, the study introduces visual prompts as a complementary input modality, designing VisPilot to enable intuitive multimodal prompting that improves usability without sacrificing efficiency.

Authors:Zi Haur Pang, Yahui Fu, Divesh Lala, Mikey Elmers, Koji Inoue, Tatsuya Kawahara
Title: Does the Appearance of Autonomous Conversational Robots Affect User Spoken Behaviors in Real-World Conference Interactions?
Abstract:
We investigate the impact of robot appearance on users' spoken behavior during real-world interactions by comparing a human-like android, ERICA, with a less anthropomorphic humanoid, TELECO. Analyzing data from 42 participants at SIGDIAL 2024, we extracted linguistic features such as disfluencies and syntactic complexity from conversation transcripts. The results showed moderate effect sizes, suggesting that participants produced fewer disfluencies and employed more complex syntax when interacting with ERICA. Further analysis involving training classification models like Naïve Bayes, which achieved an F1-score of 71.60%, and conducting feature importance analysis, highlighted the significant role of disfluencies and syntactic complexity in interactions with robots of varying human-like appearances. Discussing these findings within the frameworks of cognitive load and Communication Accommodation Theory, we conclude that designing robots to elicit more structured and fluent user speech can enhance their communicative alignment with humans.
中文: 研究发现,与类人程度较低的机器人TELECO相比,参与者与高度仿真的机器人ERICA互动时言语更流畅、句法更复杂,表明机器人外观通过影响用户言语模式可促进人机沟通协调。
English: This study found that interacting with the more human-like robot ERICA led users to produce fewer disfluencies and more complex syntax compared to the less anthropomorphic TELECO, suggesting that robot appearance influences speech patterns and can enhance human-robot communicative alignment.
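Transcript features of this kind can be approximated with simple counts. A minimal sketch, assuming a hypothetical filler-word list for disfluencies and mean sentence length as a crude syntactic-complexity proxy; the paper's actual feature set is certainly richer (e.g., parse-based complexity measures):

```python
import re

# Hypothetical filler-word list; real disfluency annotation is richer.
FILLERS = {"uh", "um", "er", "hmm"}

def transcript_features(text: str) -> dict:
    """Disfluency rate and mean words per sentence from a raw transcript."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    disfluencies = sum(1 for w in words if w in FILLERS)
    return {
        "disfluency_rate": disfluencies / len(words) if words else 0.0,
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }
```

Feature vectors like this are what a Naïve Bayes classifier would consume to predict which robot a transcript came from.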

Authors:Chentianye Xu, Jionghao Lin, Tongshuang Wu, Vincent Aleven, Kenneth R. Koedinger
Title: Improving Automated Feedback Systems for Tutor Training in Low-Resource Scenarios through Data Augmentation
Abstract:
Tutoring is an effective instructional method for enhancing student learning, yet its success relies on the skill and experience of the tutors. This reliance presents challenges for the widespread implementation of tutoring, particularly in training novice tutors. To support tutor training programs, real-time automated feedback systems are essential for efficiently training large numbers of tutors. Lin et al.'s previous study employed Generative Pre-Trained Transformers (GPT) for sequence labeling to identify desirable and undesirable praise components in a tutor training dataset, providing explanatory feedback. However, this approach requires a significant amount of labeled data for fine-tuning, which is both labor-intensive and dependent on expert input. To address the challenges associated with extensive data labeling, the current study explores the use of prompting more advanced GPT models like GPT-4o to generate synthetic datasets for augmenting labeled response data, followed by fine-tuning a GPT-3.5 model. Our results demonstrate that our data augmentation approach generalizes effectively to identify other types of praise, compared to the same model fine-tuned without augmentation. These findings suggest that for data-intensive tasks, synthetic data generated through GPT model prompting can substantially enhance fine-tuned model performance in low-resource scenarios.
Chinese: 利用GPT模型生成合成数据,可在低资源环境下通过增强微调模型性能来提升导师培训效果,从而克服对大量标注数据的依赖。
English: Prompting GPT-4o to generate synthetic training data for augmenting labeled responses substantially improves the performance of a fine-tuned GPT-3.5 feedback model in low-resource tutor-training scenarios, reducing the reliance on extensive expert labeling.

Authors:Ehsan Latif, Ying Chen, Xiaoming Zhai, Yue Yin
Title: Human-Centered Design for AI-based Automatically Generated Assessment Reports: A Systematic Review
Abstract:
This paper provides a comprehensive review of the design and implementation of automatically generated assessment reports (AutoRs) for formative use in K-12 Science, Technology, Engineering, and Mathematics (STEM) classrooms. With the increasing adoption of technology-enhanced assessments, there is a critical need for human-computer interactive tools that efficiently support the interpretation and application of assessment data by teachers. AutoRs are designed to provide synthesized, interpretable, and actionable insights into students' performance, learning progress, and areas for improvement. Guided by cognitive load theory, this study emphasizes the importance of reducing teachers' cognitive demands through user-centered and intuitive designs. It highlights the potential of diverse information presentation formats such as text, visual aids, and plots and advanced functionalities such as live and interactive features to enhance usability. However, the findings also reveal that many existing AutoRs fail to fully utilize these approaches, leading to high initial cognitive demands and limited engagement. This paper proposes a conceptual framework to inform the design, implementation, and evaluation of AutoRs, balancing the trade-offs between usability and functionality. The framework aims to address challenges in engaging teachers with technology-enhanced assessment results, facilitating data-driven decision-making, and providing personalized feedback to improve the teaching and learning process.
中文: 本文综述了K-12 STEM教育中自动生成评估报告的设计,提出了一个概念框架,旨在通过直观交互功能降低教师认知负荷,提升报告可用性与功能性,促进数据驱动的教学决策。
English: This paper reviews the design of automatically generated assessment reports for K-12 STEM education, proposing a conceptual framework to enhance their usability and functionality while reducing teachers' cognitive load through intuitive, interactive features.

Authors:Guy Laban, Micol Spitale, Minja Axelsson, Nida Itrat Abbasi, Hatice Gunes
Title: Critical Insights about Robots for Mental Wellbeing
Abstract:
Social robots are increasingly being explored as tools to support emotional wellbeing, particularly in non-clinical settings. Drawing on a range of empirical studies and practical deployments, this paper outlines six key insights that highlight both the opportunities and challenges in using robots to promote mental wellbeing. These include (1) the lack of a single, objective measure of wellbeing, (2) the fact that robots don't need to act as companions to be effective, (3) the growing potential of virtual interactions, (4) the importance of involving clinicians in the design process, (5) the difference between one-off and long-term interactions, and (6) the idea that adaptation and personalization are not always necessary for positive outcomes. Rather than positioning robots as replacements for human therapists, we argue that they are best understood as supportive tools that must be designed with care, grounded in evidence, and shaped by ethical and psychological considerations. Our aim is to inform future research and guide responsible, effective use of robots in mental health and wellbeing contexts.
中文: 本文概述了利用社交机器人支持心理健康的六个关键见解,强调其应作为辅助工具而非人类治疗师的替代品,并指出设计需基于证据并符合伦理考量。
English: This paper outlines six key insights on using social robots to support mental wellbeing, emphasizing their role as supportive tools rather than replacements for human therapists, and stressing the need for evidence-based, ethically-informed design.

Authors:Stefano Scanzio, Gabriele Formis, Pietro Chiavassa, Lukasz Wisniewski, Gianluca Cena
Title: Compression of executable QR codes or sQRy for Industry: an example for Wi-Fi access points
Abstract:
Executable QR codes, or sQRy, are a technology introduced in 2022 that permits embedding a runnable program inside a QR code, enabling interaction with the user even in the absence of an Internet connection. sQRy are enablers for different practical applications, including network equipment configuration, diagnostics, and enhanced smart manuals in industrial contexts. Many other non-industry-related fields can also benefit from this technology. Regardless of where sQRy are used, text strings are among the most commonly embedded data. However, due to strict limitations on the available payload, the space occupied by strings limits the length of the programs that can be embedded. In this work, we propose a simple yet effective strategy that reduces the space taken by strings, thereby broadening sQRy applicability.
Chinese: 2022年问世的sQRy技术可在二维码中嵌入可执行程序实现离线交互,但字符串数据因有效载荷限制影响程序长度,为此提出一种节省空间的策略以拓宽其应用范围。
English: The 2022-developed sQRy technology embeds executable programs in QR codes for offline use, but string data limits program length due to payload constraints, prompting a new space-saving strategy to enhance its applicability.
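The abstract does not detail the authors' specific compression strategy, but the payload motivation can be illustrated generically: repetitive configuration strings of the kind embedded in sQRy programs shrink substantially under any standard dictionary-based compressor. The snippet below uses zlib purely as an illustrative stand-in, not as the paper's method, and the sample string is invented:

```python
import zlib

# Hypothetical repetitive configuration text, the kind an sQRy program
# for a Wi-Fi access point might embed (illustrative, not from the paper).
text = ("SSID=factory-ap-01;PASS=changeme;MODE=WPA2;" * 10).encode("ascii")

compressed = zlib.compress(text, level=9)

# Repetitive strings compress well, freeing scarce QR payload space.
assert len(compressed) < len(text) // 2
assert zlib.decompress(compressed) == text
```

Whatever encoding is actually used, the same trade-off applies: bytes saved on strings translate directly into longer programs fitting within a QR code's fixed capacity.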

Authors:Tianbao Xie, Jiaqi Deng, Xiaochuan Li, Junlin Yang, Haoyuan Wu, Jixuan Chen, Wenjing Hu, Xinyuan Wang, Yuhui Xu, Zekun Wang, Yiheng Xu, Junli Wang, Doyen Sahoo, Tao Yu, Caiming Xiong
Title: Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
Abstract:
Graphical user interface (GUI) grounding, the ability to map natural language instructions to specific actions on graphical user interfaces, remains a critical bottleneck in computer use agent development. Current benchmarks oversimplify grounding tasks as short referring expressions, failing to capture the complexity of real-world interactions that require software commonsense, layout understanding, and fine-grained manipulation capabilities. To address these limitations, we introduce OSWorld-G, a comprehensive benchmark comprising 564 finely annotated samples across diverse task types including text matching, element recognition, layout understanding, and precise manipulation. Additionally, we synthesize and release the largest computer use grounding dataset Jedi, which contains 4 million examples through multi-perspective decoupling of tasks. Our multi-scale models trained on Jedi demonstrate its effectiveness by outperforming existing approaches on ScreenSpot-v2, ScreenSpot-Pro, and our OSWorld-G. Furthermore, we demonstrate that improved grounding with Jedi directly enhances agentic capabilities of general foundation models on complex computer tasks, improving from 5% to 27% on OSWorld. Through detailed ablation studies, we identify key factors contributing to grounding performance and verify that combining specialized data for different interface elements enables compositional generalization to novel interfaces. All benchmark, data, checkpoints, and code are open-sourced and available at https://osworld-grounding.github.io.
中文:OSWorld-G是一个解决图形用户界面定位局限性的新基准,包含564个标注样本和400万条数据的Jedi数据集,显著提升了模型在复杂计算机任务上的性能和智能体能力。
English: OSWorld-G is a new benchmark addressing GUI grounding limitations with 564 annotated samples and the Jedi dataset of 4 million examples, significantly improving model performance and agent capabilities on complex computer tasks.

Authors:Efe Bozkir, Christian Kosel, Tina Seidel, Enkelejda Kasneci
Title: Automated Visual Attention Detection using Mobile Eye Tracking in Behavioral Classroom Studies
Abstract:
Teachers' visual attention and its distribution across the students in classrooms can have important implications for student engagement, achievement, and professional teacher training. Despite that, inferring where and on which students teachers focus is not trivial. Mobile eye tracking can provide vital help to solve this issue; however, using mobile eye tracking alone requires a significant amount of manual annotation. To address this limitation, we present an automated processing pipeline concept that requires minimal manually annotated data to recognize which student the teachers focus on. To this end, we utilize state-of-the-art face detection models and face recognition feature embeddings to train face recognition models with transfer learning in the classroom context and combine these models with the teachers' gaze from mobile eye trackers. We evaluated our approach with data collected from four different classrooms, and our results show that while it is possible to estimate the visually focused students with reasonable performance in all of our classroom setups, U-shaped and small classrooms led to the best results, with accuracies of approximately 0.7 and 0.9, respectively. While we focused on the validity of the technical approach rather than on teacher-student interactions, our methodology requires little manually annotated data and offers a non-intrusive way of capturing teachers' visual attention; it could therefore help improve instructional strategies, enhance classroom management, and provide feedback for professional teacher development.
中文: 本研究提出一种自动化处理流程,结合人脸检测、识别技术及移动眼动仪数据,以最少人工标注识别教师课堂视觉关注对象,在U形和小型教室中准确率分别达约0.7和0.9,有望优化教学策略与教师专业发展。
English: The study introduces an automated pipeline using face detection and recognition with minimal manual data to identify which students teachers focus on via mobile eye tracking, showing promising accuracy especially in U-shaped and small classrooms, which could enhance teaching strategies and professional development.
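The combination of recognized faces with eye-tracker gaze described above can be pictured as a simple geometric lookup: map the gaze point in the scene-camera frame to the recognized face bounding box that contains it. The interface below (box format, student IDs) is a hypothetical sketch of that step, not the authors' pipeline:

```python
def focused_student(gaze_xy, face_boxes):
    """Return the ID of the student whose face bounding box contains the
    gaze point, or None if the gaze falls outside all boxes.
    `face_boxes` maps student IDs to (x_min, y_min, x_max, y_max) boxes,
    e.g. from a face detection + recognition model; both the interface
    and the IDs are illustrative."""
    gx, gy = gaze_xy
    for student_id, (x0, y0, x1, y1) in face_boxes.items():
        if x0 <= gx <= x1 and y0 <= gy <= y1:
            return student_id
    return None

# Toy scene: two recognized faces in the teacher's scene camera.
boxes = {"A": (0, 0, 10, 10), "B": (20, 0, 30, 10)}
assert focused_student((5, 5), boxes) == "A"
assert focused_student((15, 5), boxes) is None
```

A real pipeline would additionally smooth gaze over fixations and handle overlapping or missed detections, which this sketch omits.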

Authors:Suleyman Ozdel, Can Sarpkaya, Efe Bozkir, Hong Gao, Enkelejda Kasneci
Title: Examining the Role of LLM-Driven Interactions on Attention and Cognitive Engagement in Virtual Classrooms
Abstract:
Transforming educational technologies through the integration of large language models (LLMs) and virtual reality (VR) offers the potential for immersive and interactive learning experiences. However, the effects of LLMs on user engagement and attention in educational environments remain open questions. In this study, we utilized a virtual learning environment in which both peers and teachers were LLM-driven to examine how students behaved in such settings. Specifically, we investigated how peer question-asking behaviors influenced student engagement, attention, cognitive load, and learning outcomes, and found that, in conditions where LLM-driven peer learners asked questions, students exhibited more targeted visual scanpaths, with their attention directed toward the learning content, particularly in complex subjects. Our results suggest that peer questions did not directly introduce extraneous cognitive load, as cognitive load was strongly correlated with increased attention to the learning material. Considering these findings, we provide design recommendations for optimizing VR learning spaces.
中文摘要:通过将大型语言模型与虚拟现实结合于教育中,同伴提问能引导学生注意力聚焦于学习内容,增强参与度且不增加额外认知负担,从而优化沉浸式学习效果。
English Summary: Integrating large language models with virtual reality in education enhances student engagement by directing attention to learning content through peer-driven questions, without increasing cognitive load, leading to more effective immersive learning experiences.

Authors:Faisal Haque Bappy, EunJeong Cheon, Tariqul Islam
Title: Centralized Trust in Decentralized Systems: Unveiling Hidden Contradictions in Blockchain and Cryptocurrency
Abstract:
Blockchain technology promises to democratize finance and promote social equity through decentralization, but questions remain about whether current implementations advance or hinder these goals. Through a mixed-methods study combining semi-structured interviews with 13 diverse blockchain stakeholders and analysis of over 3,000 cryptocurrency discussions on Reddit, we examine how trust manifests in cryptocurrency ecosystems despite their decentralized architecture. Our findings uncover that users actively seek out and create centralized trust anchors, such as established exchanges, prominent community figures, and recognized development teams, contradicting blockchain's fundamental promise of trustless interactions. We identify how this contradiction arises from users' mental need for accountability and their reluctance to shoulder the full responsibility of self-custody. The study also reveals how these centralized trust patterns disproportionately impact different user groups, with newer and less technical users showing stronger preferences for centralized intermediaries. This work contributes to our understanding of the inherent tensions between theoretical decentralization and practical implementation in cryptocurrency systems, highlighting the persistent role of centralized trust in supposedly trustless environments.
中文摘要:尽管区块链技术承诺去中心化,但用户在实践中反而依赖交易所和意见领袖等中心化信任锚,揭示了理论上的无需信任理念与实际对责任主体的需求之间存在根本矛盾。
English Summary: Despite blockchain's promise of decentralization, users paradoxically create centralized trust anchors like exchanges and influencers, revealing a contradiction between theoretical trustless ideals and practical reliance on accountability.

Authors:Süleyman Özdel, Kadir Burak Buldu, Enkelejda Kasneci, Efe Bozkir
Title: Exploring Context-aware and LLM-driven Locomotion for Immersive Virtual Reality
Abstract:
Locomotion plays a crucial role in shaping the user experience within virtual reality environments. In particular, hands-free locomotion offers a valuable alternative by supporting accessibility and freeing users from reliance on handheld controllers. However, traditional speech-based methods often depend on rigid command sets, limiting the naturalness and flexibility of interaction. In this study, we propose a novel locomotion technique powered by large language models (LLMs), which allows users to navigate virtual environments using natural language with contextual awareness. We evaluate three locomotion methods: controller-based teleportation, voice-based steering, and our language model-driven approach. Our evaluation combines eye-tracking data analysis, including explainable machine learning through SHAP analysis, with standardized questionnaires for usability, presence, cybersickness, and cognitive load to examine user attention and engagement. Our findings indicate that LLM-driven locomotion achieves usability, presence, and cybersickness scores comparable to established methods like teleportation, demonstrating its potential as a comfortable, natural language-based, hands-free alternative. In addition, it enhances user attention within the virtual environment, suggesting greater engagement. Complementary to these findings, SHAP analysis revealed that fixation, saccade, and pupil-related features vary across techniques, indicating distinct patterns of visual attention and cognitive processing. Overall, our method can facilitate hands-free locomotion in virtual spaces, especially in supporting accessibility.
中文: 本研究提出了一种基于大语言模型的新型虚拟现实无手柄移动技术,通过自然语言导航实现,与传统方法相比具有相当的可用性并提升了用户参与度。
English: This study introduces a novel hands-free locomotion technique for virtual reality using large language models, which enables natural language navigation and demonstrates comparable usability and enhanced user engagement compared to traditional methods.

Authors:Minqian Liu, Zhiyang Xu, Xinyi Zhang, Heajun An, Sarvech Qadir, Qi Zhang, Pamela J. Wisniewski, Jin-Hee Cho, Sang Won Lee, Ruoxi Jia, Lifu Huang
Title: LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models
Abstract:
Recent advancements in Large Language Models (LLMs) have enabled them to approach human-level persuasion capabilities. However, such potential also raises concerns about the safety risks of LLM-driven persuasion, particularly their potential for unethical influence through manipulation, deception, exploitation of vulnerabilities, and many other harmful tactics. In this work, we present a systematic investigation of LLM persuasion safety through two critical aspects: (1) whether LLMs appropriately reject unethical persuasion tasks and avoid unethical strategies during execution, including cases where the initial persuasion goal appears ethically neutral, and (2) how influencing factors like personality traits and external pressures affect their behavior. To this end, we introduce PersuSafety, the first comprehensive framework for the assessment of persuasion safety which consists of three stages, i.e., persuasion scene creation, persuasive conversation simulation, and persuasion safety assessment. PersuSafety covers 6 diverse unethical persuasion topics and 15 common unethical strategies. Through extensive experiments across 8 widely used LLMs, we observe significant safety concerns in most LLMs, including failing to identify harmful persuasion tasks and leveraging various unethical persuasion strategies. Our study calls for more attention to improve safety alignment in progressive and goal-driven conversations such as persuasion.
Chinese: 大型语言模型存在显著的劝说安全风险,无法有效拒绝不道德任务并频繁使用有害策略,这一发现基于覆盖多种不道德主题和手法的综合评估框架。
English: Large language models exhibit concerning persuasion safety risks by failing to reject unethical tasks and employing harmful strategies, as revealed through a comprehensive assessment framework covering multiple unethical topics and tactics.

Authors:Micol Spitale, Srikar Babu, Serhan Cakmak, Jiaee Cheong, Hatice Gunes
Title: Exploring Causality for HRI: A Case Study on Robotic Mental Well-being Coaching
Abstract:
One of the primary goals of Human-Robot Interaction (HRI) research is to develop robots that can interpret human behavior and adapt their responses accordingly. Adaptive learning models, such as continual and reinforcement learning, play a crucial role in improving robots' ability to interact effectively in real-world settings. However, these models face significant challenges due to the limited availability of real-world data, particularly in sensitive domains like healthcare and well-being. This data scarcity can hinder a robot's ability to adapt to new situations. To address these challenges, causality provides a structured framework for understanding and modeling the underlying relationships between actions, events, and outcomes. By moving beyond mere pattern recognition, causality enables robots to make more explainable and generalizable decisions. This paper presents an exploratory causality-based analysis through a case study of an adaptive robotic coach delivering positive psychology exercises over four weeks in a workplace setting. The robotic coach autonomously adapts to multimodal human behaviors, such as facial valence and speech duration. By conducting both macro- and micro-level causal analyses, this study aims to gain deeper insights into how adaptability can enhance well-being during interactions. Ultimately, this research seeks to advance our understanding of how causality can help overcome challenges in HRI, particularly in real-world applications.
中文: 人机交互研究通过因果分析开发自适应机器人,以解决现实数据稀缺问题并提升决策能力,如自主机器人教练案例所示,旨在增进交互中的幸福感。
English: Human-Robot Interaction research aims to develop adaptive robots using causal analysis to overcome data scarcity and enhance decision-making in sensitive domains like well-being, as demonstrated through a case study of an autonomous robotic coach.

Authors:Mengdi Wang, Efe Bozkir, Enkelejda Kasneci
Title: Iris Style Transfer: Enhancing Iris Recognition with Style Features and Privacy Preservation through Neural Style Transfer
Abstract:
Iris texture is widely regarded as a gold standard biometric modality for authentication and identification. The demand for robust iris recognition methods, coupled with growing security and privacy concerns regarding iris attacks, has escalated recently. Inspired by neural style transfer, an advanced technique that leverages neural networks to separate content and style features, we hypothesize that iris texture's style features provide a reliable foundation for recognition and are more resilient to variations like rotation and perspective shifts than traditional approaches. Our experimental results support this hypothesis, showing a significantly higher classification accuracy compared to conventional features. Further, we propose using neural style transfer to obfuscate the identifiable iris style features, ensuring the protection of sensitive biometric information while maintaining the utility of eye images for tasks like eye segmentation and gaze estimation. This work opens new avenues for iris-oriented, secure, and privacy-aware biometric systems.
中文: 该研究通过神经风格迁移提取虹膜纹理的风格特征,不仅提升了识别精度和对变化的鲁棒性,还能在保护敏感生物信息隐私的同时,维持图像在眼部分割和视线估计等任务中的实用性。
English: The study demonstrates that using neural style transfer to extract style features from iris textures enhances recognition accuracy and resilience to variations, while also enabling privacy protection by obfuscating sensitive biometric data without compromising utility for other tasks.
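In neural style transfer, style features are conventionally captured as Gram matrices of CNN feature maps, i.e., correlations between channels. The sketch below shows that construction on a toy feature map; it is a generic illustration of the style-feature idea the abstract invokes, not the paper's iris recognition pipeline (which would use real CNN activations):

```python
def gram_matrix(features):
    """Compute the Gram matrix G[i][j] = sum_k F[i][k] * F[j][k] for a
    feature map given as a list of C channels, each flattened to a list
    of H*W activations. In style transfer, G summarizes channel
    correlations ("style") independently of spatial layout."""
    c = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(c)]
            for i in range(c)]

# Toy 2-channel "feature map", each channel flattened to 4 activations
# (stand-ins for CNN activations; values are invented).
fmap = [[1.0, 2.0, 0.0, 1.0],
        [0.0, 1.0, 1.0, 2.0]]
G = gram_matrix(fmap)
assert G == [[6.0, 4.0], [4.0, 6.0]]  # symmetric channel correlations
```

Because the Gram matrix discards spatial arrangement, style features of this kind are plausibly more robust to rotation and perspective shifts than spatially anchored content features, which is the intuition the abstract builds on.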

Authors:Artin Saberpour Abadian, Yi-Chi Liao, Ata Otaran, Rishabh Dabral, Marie Muehlhaus, Christian Theobalt, Martin Schmitz, Jürgen Steimle
Title: 3HANDS Dataset: Learning from Humans for Generating Naturalistic Handovers with Supernumerary Robotic Limbs
Abstract:
Supernumerary robotic limbs (SRLs) are robotic structures integrated closely with the user's body, which augment human physical capabilities and necessitate seamless, naturalistic human-machine interaction. For effective assistance in physical tasks, enabling SRLs to hand over objects to humans is crucial. Yet, designing heuristic-based policies for robots is time-consuming, difficult to generalize across tasks, and results in less human-like motion. When trained with proper datasets, generative models are powerful alternatives for creating naturalistic handover motions. We introduce 3HANDS, a novel dataset of object handover interactions between a participant performing a daily activity and another participant enacting a hip-mounted SRL in a naturalistic manner. 3HANDS captures the unique characteristics of SRL interactions: operating in intimate personal space with asymmetric object origins, implicit motion synchronization, and the user's engagement in a primary task during the handover. To demonstrate the effectiveness of our dataset, we present three models: one that generates naturalistic handover trajectories, another that determines the appropriate handover endpoints, and a third that predicts the moment to initiate a handover. In a user study (N=10), we compare the handover interactions performed with our method against a baseline. The findings show that our method was perceived as significantly more natural, less physically demanding, and more comfortable.
中文: 3HANDS数据集支持生成模型为超限机器人肢体创建自然的交接动作,用户评价显示该方法比基线方法显著更自然、舒适且体力消耗更低。
English: The 3HANDS dataset enables generative models to create naturalistic handover motions for supernumerary robotic limbs, which users rated as significantly more natural, comfortable, and less physically demanding than baseline methods.

Authors:Leixian Shen, Haotian Li, Yifang Wang, Xing Xie, Huamin Qu
Title: Prompting Generative AI with Interaction-Augmented Instructions
Abstract:
The emergence of generative AI (GenAI) models, including large language models and text-to-image models, has significantly advanced the synergy between humans and AI, owing not only to their outstanding capabilities but, more importantly, to the intuitive communication they afford through text prompts. Though intuitive, text-based instructions suffer from the ambiguity and redundancy of natural language. To address the issue, researchers have explored augmenting text-based instructions with interactions that facilitate precise and effective expression of human intent, such as direct manipulation. However, the design strategy of interaction-augmented instructions lacks systematic investigation, hindering our understanding and application. To provide a panorama of interaction-augmented instructions, we propose a framework that analyzes related tools by why, when, who, what, and how interactions are applied to augment text-based instructions. Notably, we identify four purposes for applying interactions: restricting, expanding, organizing, and refining text instructions. The design paradigms for each purpose are also summarized to benefit future researchers and practitioners.
中文摘要:生成式AI模型通过文本提示实现直观的人机交互,但存在语言模糊性问题,研究者通过结合直接操作等交互方式增强指令表达,提出了分析交互目的与设计范式的框架以指导未来研究。
English Summary: Generative AI models enable intuitive human-AI communication through text prompts, but face ambiguity issues that researchers address by augmenting instructions with interactive methods like direct manipulation, leading to a proposed framework analyzing interaction purposes and design paradigms.

Authors:Yunfan Zhou, Xiwen Cai, Qiming Shi, Yanwei Huang, Haotian Li, Huamin Qu, Di Weng, Yingcai Wu
Title: Xavier: Toward Better Coding Assistance in Authoring Tabular Data Wrangling Scripts
Abstract:
Data analysts frequently employ code completion tools in writing custom scripts to tackle complex tabular data wrangling tasks. However, existing tools do not sufficiently link data contexts, such as schemas and values, with the code being edited. This not only leads to poor code suggestions, but also to frequent interruptions in the coding process, as users need additional code to locate and understand relevant data. We introduce Xavier, a tool designed to enhance the authoring of data wrangling scripts in computational notebooks. Xavier maintains users' awareness of data contexts while providing data-aware code suggestions. It automatically highlights the most relevant data based on the user's code, integrates both code and data contexts for more accurate suggestions, and instantly previews data transformation results for easy verification. To evaluate the effectiveness and usability of Xavier, we conducted a user study with 16 data analysts, showing its potential to streamline the authoring of data wrangling scripts.
Chinese: Xavier是一种数据感知的代码补全工具,通过整合数据上下文提供精准建议、自动高亮相关数据并即时预览转换结果,经16位分析师用户研究验证,可有效提升计算笔记本中数据整理脚本的编写效率。
English: Xavier is a data-aware code completion tool that enhances data wrangling in computational notebooks by integrating data contexts for accurate suggestions, automatic highlighting of relevant data, and instant previews of transformations, as validated by a user study with 16 analysts.

Authors:Haotian Li, Yun Wang, Huamin Qu
Title: Reflection on Data Storytelling Tools in the Generative AI Era from the Human-AI Collaboration Perspective
Abstract:
Human-AI collaborative tools attract attention from the data storytelling community for lowering the barrier of expertise and streamlining the workflow. The recent advance in large-scale generative AI techniques, e.g., large language models (LLMs) and text-to-image models, has the potential to enhance data storytelling with their power in visual and narration generation. Two years after these techniques became publicly available, it is important to reflect on our progress in applying them and to look ahead to future opportunities. To achieve this goal, we compare the collaboration patterns of the latest tools with those of earlier ones, using a dedicated framework for understanding human-AI collaboration in data storytelling. Through this comparison, we identify persistent collaboration patterns, e.g., human-creator + AI-assistant, and emerging ones, e.g., AI-creator + human-reviewer. The benefits of these AI techniques and other implications for human-AI collaboration are also revealed. We further propose future directions to hopefully ignite innovation.
中文: 人机协作工具利用生成式AI提升数据叙事能力,通过比较新旧协作模式揭示了如AI主导创作与人类审核等新兴趋势,并提出了推动创新的未来研究方向。
English: Human-AI collaborative tools are advancing data storytelling by leveraging generative AI to streamline workflows and introduce new partnership models, such as AI as creator with human oversight, while outlining future research directions.

Authors:Yuying Tang, Haotian Li, Minghe Lan, Xiaojuan Ma, Huamin Qu
Title: Understanding Screenwriters' Practices, Attitudes, and Future Expectations in Human-AI Co-Creation
Abstract:
With the rise of AI technologies and their growing influence in the screenwriting field, understanding the opportunities and concerns related to AI's role in screenwriting is essential for enhancing human-AI co-creation. Through semi-structured interviews with 23 screenwriters, we explored their creative practices, attitudes, and expectations in collaborating with AI for screenwriting. Based on participants' responses, we identified the key stages in which they commonly integrated AI, including story structure & plot development, screenplay text, goal & idea generation, and dialogue. Then, we examined how different attitudes toward AI integration influence screenwriters' practices across various workflow stages and their broader impact on the industry. Additionally, we categorized their expected assistance using four distinct roles of AI: actor, audience, expert, and executor. Our findings provide insights into AI's impact on screenwriting practices and offer suggestions on how AI can benefit the future of screenwriting.
中文: 本研究通过采访23位编剧,探讨了AI在剧本创作中的整合应用,识别出关键合作阶段及AI作为演员、观众、专家和执行者四种角色,以促进人机协同创作的发展。
English: This study investigates AI's integration in screenwriting through interviews with 23 professionals, identifying key collaboration stages and AI's roles as actor, audience, expert, and executor to enhance human-AI co-creation.

Authors:Yanna Lin, Leni Yang, Haotian Li, Huamin Qu, Dominik Moritz
Title: InterLink: Linking Text with Code and Output in Computational Notebooks
Abstract:
Computational notebooks, widely used for ad-hoc analysis and often shared with others, can be difficult to understand because the standard linear layout is not optimized for reading. In particular, related text, code, and outputs may be spread across the UI making it difficult to draw connections. In response, we introduce InterLink, a plugin designed to present the relationships between text, code, and outputs, thereby making notebooks easier to understand. In a formative study, we identify pain points and derive design requirements for identifying and navigating relationships among various pieces of information within notebooks. Based on these requirements, InterLink features a new layout that separates text from code and outputs into two columns. It uses visual links to signal relationships between text and associated code and outputs and offers interactions for navigating related pieces of information. In a user study with 12 participants, those using InterLink were 13.6% more accurate at finding and integrating information from complex analyses in computational notebooks. These results show the potential of notebook layouts that make them easier to understand.
Chinese: InterLink插件通过双栏布局和视觉链接将相关文本、代码和输出关联起来,解决了计算笔记本难以理解的问题,使用户在查找和整合信息时的准确率提高了13.6%。
English: The InterLink plugin addresses the difficulty in understanding computational notebooks by introducing a two-column layout with visual links to connect related text, code, and outputs, which improved users' accuracy in finding and integrating information by 13.6%.

Authors:Leixian Shen, Haotian Li, Yun Wang, Huamin Qu
Title: Reflecting on Design Paradigms of Animated Data Video Tools
Abstract:
Animated data videos have gained significant popularity in recent years. However, authoring data videos remains challenging due to the complexity of creating and coordinating diverse components (e.g., visualization, animation, audio, etc.). Although numerous tools have been developed to streamline the process, there is a lack of comprehensive understanding and reflection of their design paradigms to inform future development. To address this gap, we propose a framework for understanding data video creation tools along two dimensions: what data video components to create and coordinate, including visual, motion, narrative, and audio components, and how to support the creation and coordination. By applying the framework to analyze 46 existing tools, we summarized key design paradigms of creating and coordinating each component based on the varying work distribution for humans and AI in these tools. Finally, we share our detailed reflections, highlight gaps from a holistic view, and discuss future directions to address them.
Chinese: 本文提出一个分析数据视频创作工具的框架,从处理组件和协调方式两个维度出发,通过研究46种工具总结了关键设计模式,并指出了未来发展的方向与改进空间。
English: This paper introduces a framework to analyze data video creation tools by examining what components they handle and how they support their coordination, identifying design paradigms through a review of 46 tools and suggesting future improvements.

Authors:Haotian Li, Lu Ying, Leixian Shen, Yun Wang, Yingcai Wu, Huamin Qu
Title: Composing Data Stories with Meta Relations
Abstract:
To facilitate the creation of compelling and engaging data stories, AI-powered tools have been introduced to automate the three stages in the workflow: analyzing data, organizing findings, and creating visuals. However, these tools rely on data-level information to derive inflexible relations between findings and therefore often create one-size-fits-all data stories. In contrast, our formative study reveals that humans rely heavily on meta relations between these findings, drawn from diverse domain knowledge and narrative intent beyond the datasets, to compose their findings into stylized data stories. Such a gap indicates the importance of introducing meta relations to elevate AI-created stories to a satisfactory level. Though necessary, it remains unclear where and how AI should be involved in working with humans on meta relations. To answer this question, we conducted an exploratory user study with Remex, an AI-powered data storytelling tool that suggests meta relations in the analysis stage and applies them for data story organization. The user study reveals various findings about introducing AI for meta relations into the storytelling workflow, such as the benefit of considering meta relations and their diverse expected usage scenarios. Finally, the paper concludes with lessons and suggestions about applying meta relations to compose data stories, hopefully inspiring future research.
中文摘要:当前AI工具仅依赖数据层面的关系生成通用数据故事,而人类讲故事者则利用领域知识和叙事意图中的元关系来创作风格化故事,这表明AI需引入元关系以提升数据故事的质量。
English Summary: AI tools currently create generic data stories by relying solely on data-level relations, but human storytellers use meta relations from domain knowledge and narrative intent to craft stylized stories, highlighting the need for AI to incorporate these meta relations for improved storytelling.

Authors:Qixuan Liu, Shi Qiu, Yinqiao Wang, Xiwen Wu, Kenneth Siu Ho Chok, Chi-Wing Fu, Pheng-Ann Heng
Title: Coordinated 2D-3D Visualization of Volumetric Medical Data in XR with Multimodal Interactions
Abstract:
Volumetric medical imaging technologies produce detailed 3D representations of anatomical structures. However, effective medical data visualization and exploration pose significant challenges, especially for individuals with limited medical expertise. We introduce a novel XR-based system with two key innovations: (1) a coordinated visualization module integrating Multi-layered Multi-planar Reconstruction with 3D mesh models and (2) a multimodal interaction framework combining hand gestures with LLM-enabled voice commands. We conduct preliminary evaluations, including a 15-participant user study and expert interviews, to demonstrate the system's abilities to enhance spatial understanding and reduce cognitive load. Experimental results show notable improvements in task completion times, usability metrics, and interaction effectiveness enhanced by LLM-driven voice control. While identifying areas for future refinement, our findings highlight the potential of this immersive visualization system to advance medical training and clinical practice. Our demo application and supplemental materials are available for download at: https://osf.io/bpjq5/.
中文: 本文介绍了一种创新的XR系统,通过整合多层可视化与手势及LLM语音控制的多模态交互,用户研究证明其能有效提升医学应用中的空间理解能力并降低认知负荷。
English: This paper presents an innovative XR system that integrates multi-layered visualization with multimodal interaction using hand gestures and LLM-powered voice commands, demonstrating improved spatial understanding and reduced cognitive load in medical applications through user studies.

Authors:Ziyi Zhang, Zhen Sun, Zongmin Zhang, Zifan Peng, Yuemeng Zhao, Zichun Wang, Zeren Luo, Ruiting Zuo, Xinlei He
Title: "I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
Abstract:
The population of visually impaired people, especially those with severe impairments, is large, and daily activities pose significant challenges for them. Although many studies use large language and vision-language models to assist the blind, most focus on static content and fail to meet real-time perception needs in dynamic and complex environments, such as daily activities. To provide them with more effective intelligent assistance, it is imperative to incorporate advanced visual understanding technologies. Although VideoLLMs with real-time vision and speech interaction demonstrate strong real-time visual understanding, no prior work has systematically evaluated their effectiveness in assisting visually impaired individuals. In this work, we conduct the first such evaluation. First, we construct a benchmark dataset (VisAssistDaily), covering three categories of assistive tasks for visually impaired individuals: Basic Skills, Home Life Tasks, and Social Life Tasks. The results show that GPT-4o achieves the highest task success rate. Next, we conduct a user study to evaluate the models in both closed-world and open-world scenarios, further exploring the practical challenges of applying VideoLLMs in assistive contexts. One key issue we identify is the difficulty current models face in perceiving potential hazards in dynamic environments. To address this, we build an environment-awareness dataset named SafeVid and introduce a polling mechanism that enables the model to proactively detect environmental risks. We hope this work provides valuable insights and inspiration for future research in this field.
中文: 本研究评估了实时视觉语言模型在辅助视障人士方面的有效性,发现其在动态环境中感知潜在危险存在困难,并通过构建新数据集和引入轮询机制提出解决方案,以增强环境安全意识。
English: This study evaluates the effectiveness of real-time vision-language models in assisting visually impaired individuals, identifying challenges in dynamic hazard perception and proposing solutions through a new dataset and polling mechanism to enhance environmental safety awareness.
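The polling mechanism above can be sketched as a loop that proactively queries the model on each incoming frame, rather than waiting for the user to ask a question. This is a minimal illustration only: the function name `query_videollm`, the frame representation, and the hazard strings are all invented stand-ins, not the paper's actual SafeVid pipeline.

```python
import time

def query_videollm(frame):
    """Stand-in for a real-time VideoLLM call: returns a free-form
    hazard description for a frame, or None if no risk is detected."""
    hazards = {"frame_03": "obstacle ahead on the sidewalk"}
    return hazards.get(frame)

def poll_for_hazards(frames, interval_s=0.0):
    """Poll the model on every frame so risks are surfaced proactively,
    in the spirit of the paper's polling mechanism."""
    alerts = []
    for frame in frames:
        hazard = query_videollm(frame)
        if hazard is not None:
            alerts.append((frame, hazard))  # would trigger a spoken alert
        time.sleep(interval_s)
    return alerts

print(poll_for_hazards(["frame_01", "frame_02", "frame_03"]))
# → [('frame_03', 'obstacle ahead on the sidewalk')]
```

The key design point is that the model is queried on a schedule driven by the system, not by the user, which is what lets hazards be reported before the user thinks to ask.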

Authors:Wenhan Dong, Yuemeng Zhao, Zhen Sun, Yule Liu, Zifan Peng, Jingyi Zheng, Zongmin Zhang, Ziyi Zhang, Jun Wu, Ruiming Wang, Shengmin Xu, Xinyi Huang, Xinlei He
Title: Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications
Abstract:
As large language models (LLMs) are increasingly used in human-centered tasks, assessing their psychological traits is crucial for understanding their social impact and ensuring trustworthy AI alignment. While existing reviews have covered some aspects of related research, several important areas have not been systematically discussed, including detailed discussions of diverse psychological tests, LLM-specific psychological datasets, and the applications of LLMs with psychological traits. To address this gap, we systematically review six key dimensions of applying psychological theories to LLMs: (1) assessment tools; (2) LLM-specific datasets; (3) evaluation metrics (consistency and stability); (4) empirical findings; (5) personality simulation methods; and (6) LLM-based behavior simulation. Our analysis highlights both the strengths and limitations of current methods. While some LLMs exhibit reproducible personality patterns under specific prompting schemes, significant variability remains across tasks and settings. Recognizing methodological challenges such as mismatches between psychological tools and LLMs' capabilities, as well as inconsistencies in evaluation practices, this study aims to propose future directions for developing more interpretable, robust, and generalizable psychological assessment frameworks for LLMs.
中文: 本文系统梳理了心理学理论在大型语言模型中的六个应用维度,既肯定了特定条件下模型人格特征的可复现性,也指出了当前评估方法存在工具适配性与评估一致性等关键挑战。
English: This review systematically examines six key dimensions of applying psychological theories to large language models, highlighting both their reproducible personality patterns under specific conditions and the methodological challenges in current assessment frameworks.

Authors:Chandan Kumar Sah, Xiaoli Lian, Tony Xu, Li Zhang
Title: FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness
Abstract:
Recent advances in Large Language Models (LLMs) have enabled their application to recommender systems (RecLLMs), yet concerns remain regarding fairness across demographic and psychological user dimensions. We introduce FairEval, a novel evaluation framework to systematically assess fairness in LLM-based recommendations. FairEval integrates personality traits with eight sensitive demographic attributes, including gender, race, and age, enabling a comprehensive assessment of user-level bias. We evaluate models, including ChatGPT 4o and Gemini 1.5 Flash, on music and movie recommendations. FairEval's fairness metric, PAFS, achieves scores up to 0.9969 for ChatGPT 4o and 0.9997 for Gemini 1.5 Flash, with disparities reaching 34.79 percent. These results highlight the importance of robustness to prompt sensitivity and support the development of more inclusive recommendation systems.

Authors:Yue Qiu, Yuqi Tong, Yu Zhang, Qixuan Liu, Jialun Pei, Shi Qiu, Pheng-Ann Heng, Chi-Wing Fu
Title: CvhSlicer 2.0: Immersive and Interactive Visualization of Chinese Visible Human Data in XR Environments
Abstract:
The study of human anatomy through advanced visualization techniques is crucial for medical research and education. In this work, we introduce CvhSlicer 2.0, an innovative XR system designed for immersive and interactive visualization of the Chinese Visible Human (CVH) dataset. Particularly, our proposed system operates entirely on a commercial XR headset, offering a range of visualization and interaction tools for dynamic 2D and 3D data exploration. By conducting comprehensive evaluations, our CvhSlicer 2.0 demonstrates strong capabilities in visualizing anatomical data, enhancing user engagement and improving educational effectiveness. A demo video is available at https://youtu.be/CfR72S_0N-4
中文: CvhSlicer 2.0 是一款创新的扩展现实系统,能在商用头显设备上实现对中国人可视化数据集的沉浸式可视化与交互操作,显著提升了人体解剖学研究的参与度和教学效果。
English: CvhSlicer 2.0 is an innovative XR system that provides immersive visualization and interactive tools for exploring the Chinese Visible Human dataset on commercial headsets, enhancing both user engagement and educational outcomes in anatomy studies.

Authors:Kazuhiro Sasabuchi, Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Katsushi Ikeuchi
Title: Agreeing to Interact in Human-Robot Interaction using Large Language Models and Vision Language Models
Abstract:
In human-robot interaction (HRI), the beginning of an interaction is often complex. Whether the robot should communicate with the human depends on several situational factors (e.g., the current human's activity, urgency of the interaction, etc.). We test whether large language models (LLMs) and vision language models (VLMs) can provide solutions to this problem. We compare four different system-design patterns using LLMs and VLMs, and evaluate them on a test set containing 84 human-robot situations. The test set mixes several publicly available datasets and also includes situations where the appropriate action to take is open-ended. Our results using the GPT-4o and Phi-3 Vision models indicate that LLMs and VLMs are capable of handling interaction beginnings when the desired actions are clear; however, challenges remain in open-ended situations where the model must balance between the human's and the robot's situation.
中文摘要:大型语言模型和视觉语言模型(如GPT-4o和Phi-3 Vision)能有效处理目标明确的人机交互启动,但在需要平衡人类与机器人情境的开放式场景中仍面临挑战。
English Summary: Large language and vision models like GPT-4o and Phi-3 Vision can effectively initiate human-robot interactions in clear scenarios but struggle with open-ended situations requiring nuanced balance between human and robot contexts.

Authors:Jasper Roe, Mike Perkins, Klaire Somoray, Dan Miller, Leon Furze
Title: To Deepfake or Not to Deepfake: Higher Education Stakeholders' Perceptions and Intentions towards Synthetic Media
Abstract:
Advances in deepfake technologies, which use generative artificial intelligence (GenAI) to mimic a person's likeness or voice, have led to growing interest in their use in educational contexts. However, little is known about how key stakeholders perceive and intend to use these tools. This study investigated higher education stakeholder perceptions and intentions regarding deepfakes through the lens of the Unified Theory of Acceptance and Use of Technology 2 (UTAUT2). Using a mixed-methods approach combining survey data (n=174) with qualitative interviews, we found that academic stakeholders demonstrated a relatively low intention to adopt these technologies (M=41.55, SD=34.14) and held complex views about their implementation. Quantitative analysis revealed adoption intentions were primarily driven by hedonic motivation, with a gender-specific interaction in price-value evaluations. Qualitative findings highlighted potential benefits of enhanced student engagement, improved accessibility, and reduced workload in content creation, but concerns regarding the exploitation of academic labour, institutional cost-cutting leading to automation, degradation of relationships in education, and broader societal impacts. Based on these findings, we propose a framework for implementing deepfake technologies in higher education that addresses institutional policies, professional development, and equitable resource allocation to thoughtfully integrate AI while maintaining academic integrity and professional autonomy.
中文摘要:本研究揭示了高等教育利益相关者对深度伪造技术采纳意愿较低且持有复杂观点,主要受享乐动机驱动并担忧学术劳动剥削及机构自动化,据此提出了维护学术诚信与专业自主权的伦理整合框架。
English Summary: This study explores higher education stakeholders' low adoption intentions and complex views on deepfake technologies, driven by hedonic motivation and concerns over academic labor exploitation and institutional automation, proposing a framework for ethical integration in academia.

Authors:Hasan Abu-Rasheed, Constance Jumbo, Rashed Al Amin, Christian Weber, Veit Wiese, Roman Obermaisser, Madjid Fathi
Title: LLM-Assisted Knowledge Graph Completion for Curriculum and Domain Modelling in Personalized Higher Education Recommendations
Abstract:
While learning personalization offers great potential for learners, modern practices in higher education require a deeper consideration of domain models and learning contexts to develop effective personalization algorithms. This paper introduces an innovative approach to higher education curriculum modelling that utilizes large language models (LLMs) for knowledge graph (KG) completion, with the goal of creating personalized learning-path recommendations. Our research focuses on modelling university subjects and linking their topics to corresponding domain models, enabling the integration of learning modules from different faculties and institutions in the student's learning path. Central to our approach is a collaborative process, where LLMs assist human experts in extracting high-quality, fine-grained topics from lecture materials. We develop domain, curriculum, and user models for university modules and stakeholders. We implement this model to create the KG from two study modules: Embedded Systems and Development of Embedded Systems Using FPGA. The resulting KG structures the curriculum and links it to the domain models. We evaluate our approach through qualitative expert feedback and quantitative graph quality metrics. Domain experts validated the relevance and accuracy of the model, while the graph quality metrics measured the structural properties of our KG. Our results show that the LLM-assisted graph completion approach enhances the ability to connect related courses across disciplines to personalize the learning experience. Expert feedback also showed high acceptance of the proposed collaborative approach for concept extraction and classification.

Authors:Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Kazuhiro Sasabuchi, Katsushi Ikeuchi
Title: VLM-driven Behavior Tree for Context-aware Task Planning
Abstract:
The use of Large Language Models (LLMs) for generating Behavior Trees (BTs) has recently gained attention in the robotics community, yet remains in its early stages of development. In this paper, we propose a novel framework that leverages Vision-Language Models (VLMs) to interactively generate and edit BTs that address visual conditions, enabling context-aware robot operations in visually complex environments. A key feature of our approach lies in the conditional control through self-prompted visual conditions. Specifically, the VLM generates BTs with visual condition nodes, where conditions are expressed as free-form text. Another VLM process integrates the text into its prompt and evaluates the conditions against real-world images during robot execution. We validated our framework in a real-world cafe scenario, demonstrating both its feasibility and limitations.
中文: 本文提出了一种新颖框架,利用视觉语言模型交互式生成和编辑带有视觉条件节点的行为树,使机器人能在视觉复杂环境中进行情境感知操作,并在真实咖啡馆场景中验证了其可行性。
English: This paper introduces a novel framework that uses Vision-Language Models to interactively generate and edit Behavior Trees with visual condition nodes, enabling context-aware robot operations in visually complex environments, as validated in a real-world cafe scenario.
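The core mechanism above, condition nodes that hold free-form text and are evaluated against the current image by a VLM, can be sketched as a tiny behavior tree. This is a schematic under stated assumptions: `vlm_check` is a trivial stub standing in for the second VLM process the paper describes, and all class and node names are illustrative, not the authors' implementation.

```python
def vlm_check(condition_text, image):
    """Stand-in for a VLM call that judges a free-form text condition
    against a real-world image; here the 'image' is a dict of facts."""
    return condition_text in image.get("facts", [])

class Condition:
    """BT leaf whose condition is free-form text, evaluated by the VLM."""
    def __init__(self, text):
        self.text = text
    def tick(self, image):
        return vlm_check(self.text, image)

class Action:
    """BT leaf representing a robot skill; records whether it ran."""
    def __init__(self, name):
        self.name = name
        self.ran = False
    def tick(self, image):
        self.ran = True
        return True

class Sequence:
    """Ticks children in order and fails at the first failing child,
    so actions only run once their guarding conditions hold."""
    def __init__(self, children):
        self.children = children
    def tick(self, image):
        return all(child.tick(image) for child in self.children)

wipe = Action("wipe_table")
tree = Sequence([Condition("the table is empty"), wipe])
image = {"facts": ["the table is empty"]}
print(tree.tick(image), wipe.ran)  # → True True
```

Because `Sequence.tick` short-circuits, the action node is never ticked while the visual condition is false, which is how the text-conditioned nodes gate execution.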

Authors:Adnan Qidwai, Srija Mukhopadhyay, Prerana Khatiwada, Dan Roth, Vivek Gupta
Title: PRAISE: Enhancing Product Descriptions with LLM-Driven Structured Insights
Abstract:
Accurate and complete product descriptions are crucial for e-commerce, yet seller-provided information often falls short. Customer reviews offer valuable details but are laborious to sift through manually. We present PRAISE: Product Review Attribute Insight Structuring Engine, a novel system that uses Large Language Models (LLMs) to automatically extract, compare, and structure insights from customer reviews and seller descriptions. PRAISE provides users with an intuitive interface to identify missing, contradictory, or partially matching details between these two sources, presenting the discrepancies in a clear, structured format alongside supporting evidence from reviews. This allows sellers to easily enhance their product listings for clarity and persuasiveness, and buyers to better assess product reliability. Our demonstration showcases PRAISE's workflow, its effectiveness in generating actionable structured insights from unstructured reviews, and its potential to significantly improve the quality and trustworthiness of e-commerce product catalogs.
中文: PRAISE系统利用大型语言模型自动从客户评论和卖家描述中提取并结构化信息,帮助用户识别差异,从而提升电商产品列表的清晰度和可信度。
English: PRAISE is a novel system utilizing Large Language Models to automatically extract and structure insights from customer reviews and seller descriptions, enabling users to identify discrepancies and improve e-commerce product listings for enhanced clarity and trustworthiness.
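The comparison step PRAISE performs can be illustrated with a small sketch: given attribute sets extracted (in practice, by an LLM) from the seller description and from reviews, classify each attribute as missing, contradictory, or matching. The attribute names and values below are invented for the example, and string equality stands in for the system's richer matching.

```python
def compare_attributes(seller, reviews):
    """Classify review-derived attributes against the seller listing."""
    report = {"missing": [], "contradictory": [], "matching": []}
    for attr, review_value in reviews.items():
        if attr not in seller:
            report["missing"].append(attr)        # reviews mention it, listing omits it
        elif seller[attr] != review_value:
            report["contradictory"].append(attr)  # the two sources disagree
        else:
            report["matching"].append(attr)
    return report

seller = {"battery_life": "10 hours", "material": "aluminum"}
reviews = {"battery_life": "6 hours", "material": "aluminum", "waterproof": "no"}
print(compare_attributes(seller, reviews))
# → {'missing': ['waterproof'], 'contradictory': ['battery_life'], 'matching': ['material']}
```

In the real system the structured report would also carry supporting evidence quotes from the reviews, which this sketch omits.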

Authors:Taylor Lynn Curtis, Maximilian Puelma Touzel, William Garneau, Manon Gruaz, Mike Pinder, Li Wei Wang, Sukanya Krishna, Luda Cohen, Jean-François Godbout, Reihaneh Rabbany, Kellin Pelrine
Title: Veracity: An Open-Source AI Fact-Checking System
Abstract:
The proliferation of misinformation poses a significant threat to society, exacerbated by the capabilities of generative AI. This demo paper introduces Veracity, an open-source AI system designed to empower individuals to combat misinformation through transparent and accessible fact-checking. Veracity leverages the synergy between Large Language Models (LLMs) and web retrieval agents to analyze user-submitted claims and provide grounded veracity assessments with intuitive explanations. Key features include multilingual support, numerical scoring of claim veracity, and an interactive interface inspired by familiar messaging applications. This paper will showcase Veracity's ability to not only detect misinformation but also explain its reasoning, fostering media literacy and promoting a more informed society.
中文:Veracity是一个开源AI系统,它结合大型语言模型与网络检索技术,通过多语言支持和直观解释来核实信息真伪,旨在打击虚假信息并提升公众媒介素养。
English: Veracity is an open-source AI system that uses large language models and web retrieval to fact-check claims with multilingual support and intuitive explanations, aiming to combat misinformation and enhance media literacy.

Authors:Ruben Weijers, Denton Wu, Hannah Betts, Tamara Jacod, Yuxiang Guan, Vidya Sujaya, Kushal Dev, Toshali Goel, William Delooze, Reihaneh Rabbany, Ying Wu, Jean-François Godbout, Kellin Pelrine
Title: From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions
Abstract:
Generative AI has the potential to transform personalization and accessibility of education. However, it raises serious concerns about accuracy and about helping students become independent critical thinkers. In this study, we designed a helpful AI "Peer" to help students correct fundamental physics misconceptions related to Newtonian mechanics. In contrast to approaches that seek near-perfect accuracy to create an authoritative AI tutor or teacher, we directly inform students that this AI can answer up to 40% of questions incorrectly. In a randomized controlled trial with 165 students, those who engaged in targeted dialogue with the AI Peer achieved post-test scores that were, on average, 10.5 percentage points higher - with over 20 percentage points higher normalized gain - than a control group that discussed physics history. Qualitative feedback indicated that 91% of the treatment group's AI interactions were rated as helpful. Furthermore, by comparing student performance on pre- and post-test questions about the same concept, along with experts' annotations of the AI interactions, we find initial evidence suggesting the improvement in performance does not depend on the correctness of the AI. With further research, the AI Peer paradigm described here could open new possibilities for how we learn, adapt to, and grow with AI.
中文: 本研究设计了一个明确告知存在40%错误率的AI"同伴",尽管不完美,却使学生的物理成绩提高了10.5个百分点,表明学习成效可能不依赖于AI回答的正确性。
English: This study introduces an intentionally imperfect AI "Peer" that improved students' physics scores by 10.5 percentage points despite disclosing its 40% error rate, suggesting learning gains may not depend on AI correctness.

Authors:Stephen Meisenbacher, Alexandra Klymenko, Alexander Karpp, Florian Matthes
Title: Investigating User Perspectives on Differentially Private Text Privatization
Abstract:
Recent literature has seen a considerable uptick in $\textit{Differentially Private Natural Language Processing}$ (DP NLP). This includes DP text privatization, where potentially sensitive input texts are transformed under DP to achieve privatized output texts that ideally mask sensitive information $\textit{and}$ maintain original semantics. Despite continued work to address the open challenges in DP text privatization, there remains a scarcity of work addressing user perceptions of this technology, a crucial aspect which serves as the final barrier to practical adoption. In this work, we conduct a survey study with 721 laypersons around the globe, investigating how the factors of $\textit{scenario}$, $\textit{data sensitivity}$, $\textit{mechanism type}$, and $\textit{reason for data collection}$ impact user preferences for text privatization. We learn that while all these factors play a role in influencing privacy decisions, users are highly sensitive to the utility and coherence of the private output texts. Our findings highlight the socio-technical factors that must be considered in the study of DP NLP, opening the door to further user-based investigations going forward.
Chinese: 近期差分隐私自然语言处理研究显著增多,但用户对该技术的认知仍待探索;一项针对全球721名参与者的调查显示,尽管场景和数据敏感性等因素影响隐私偏好,用户最关注的是隐私化文本输出的实用性和连贯性。
English: Recent research in differentially private natural language processing has increased, yet user perceptions of this technology remain underexplored; a global survey of 721 participants reveals that while factors like scenario and data sensitivity influence privacy preferences, users prioritize the utility and coherence of privatized text outputs.

Authors:Christoph Treude, Raula Gaikovina Kula
Title: Interacting with AI Reasoning Models: Harnessing "Thoughts" for AI-Driven Software Engineering
Abstract:
Recent advances in AI reasoning models provide unprecedented transparency into their decision-making processes, transforming them from traditional black-box systems into models that articulate step-by-step chains of thought rather than producing opaque outputs. This shift has the potential to improve software quality, explainability, and trust in AI-augmented development. However, software engineers rarely have the time or cognitive bandwidth to analyze, verify, and interpret every AI-generated thought in detail. Without an effective interface, this transparency could become a burden rather than a benefit. In this paper, we propose a vision for structuring the interaction between AI reasoning models and software engineers to maximize trust, efficiency, and decision-making power. We argue that simply exposing AI's reasoning is not enough -- software engineers need tools and frameworks that selectively highlight critical insights, filter out noise, and facilitate rapid validation of key assumptions. To illustrate this challenge, we present motivating examples in which AI reasoning models state their assumptions when deciding which external library to use and produce divergent reasoning paths and recommendations about security vulnerabilities, highlighting the need for an interface that prioritizes actionable insights while managing uncertainty and resolving conflicts. We then outline a research roadmap for integrating automated summarization, assumption validation, and multi-model conflict resolution into software engineering workflows. Achieving this vision will unlock the full potential of AI reasoning models to enable software engineers to make faster, more informed decisions without being overwhelmed by unnecessary detail.
Chinese: 人工智能推理模型现在能提供清晰的逐步决策过程,但若缺乏有效界面来突出关键见解并过滤干扰信息,这种透明度反而可能使软件工程师不堪重负而非提升工作效率。
English: AI reasoning models now offer clear step-by-step decision processes, but without effective interfaces to highlight key insights and filter noise, this transparency risks overwhelming software engineers instead of enhancing their work.

Authors:Jenny T. Liang, Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Vu Le, Chris Parnin, Arjun Radhakrishna, Ashish Tiwari, Emerson Murphy-Hill, Gustavo Soares
Title: TableTalk: Scaffolding Spreadsheet Development with a Language Agent
Abstract:
Spreadsheet programming is challenging. Programmers use spreadsheet programming knowledge (e.g., formulas) and problem-solving skills to combine actions into complex tasks. Advancements in large language models have introduced language agents that observe, plan, and perform tasks, showing promise for spreadsheet creation. We present TableTalk, a spreadsheet programming agent embodying three design principles -- scaffolding, flexibility, and incrementality -- derived from studies with seven spreadsheet programmers and 85 Excel templates. TableTalk guides programmers through structured plans based on professional workflows, generating three potential next steps to adapt plans to programmer needs. It uses pre-defined tools to generate spreadsheet components and incrementally build spreadsheets. In a study with 20 programmers, TableTalk produced higher-quality spreadsheets that were 2.3 times more likely to be preferred over the baseline's, and it reduced cognitive load and thinking time by 12.6%. From this, we derive design guidelines for agentic spreadsheet programming tools and discuss implications for spreadsheet programming, end-user programming, AI-assisted programming, and human-agent collaboration.
中文: TableTalk是一种基于人工智能的电子表格编程助手,通过结构化指导和降低认知负荷来改进电子表格制作,为程序员带来更高质量的输出和更高的工作效率。
English: TableTalk is an AI-driven spreadsheet programming agent that enhances spreadsheet creation by providing structured guidance and reducing cognitive load, resulting in higher-quality outputs and increased efficiency for programmers.

Authors:Tomasz Michalski, Adam Wróbel, Andrea Bontempelli, Jakub Luśtyk, Mikolaj Kniejski, Stefano Teso, Andrea Passerini, Bartosz Zieliński, Dawid Rymarczyk
Title: Personalized Interpretability -- Interactive Alignment of Prototypical Parts Networks
Abstract:
Concept-based interpretable neural networks have gained significant attention due to their intuitive and easy-to-understand explanations based on case-based reasoning, such as "this bird looks like those sparrows". However, a major limitation is that these explanations may not always be comprehensible to users due to concept inconsistency, where multiple visual features are inappropriately mixed (e.g., a bird's head and wings treated as a single concept). This inconsistency breaks the alignment between model reasoning and human understanding. Furthermore, users have specific preferences for how concepts should look, yet current approaches provide no mechanism for incorporating their feedback. To address these issues, we introduce YoursProtoP, a novel interactive strategy that enables the personalization of prototypical parts - the visual concepts used by the model - according to user needs. By incorporating user supervision, YoursProtoP adapts and splits concepts used for both prediction and explanation to better match the user's preferences and understanding. Through experiments on both the synthetic FunnyBirds dataset and a real-world scenario using the CUB, CARS, and PETS datasets in a comprehensive user study, we demonstrate the effectiveness of YoursProtoP in achieving concept consistency without compromising the accuracy of the model.
Chinese: YoursProtoP提出了一种交互式策略,通过个性化原型部分来解决概念不一致问题,并整合用户反馈,使模型推理与人类理解保持一致,同时保持准确性。
English: YoursProtoP introduces an interactive strategy to personalize prototypical parts in concept-based neural networks, addressing concept inconsistency and incorporating user feedback to align model reasoning with human understanding while maintaining accuracy.

Authors:Tim Engelbracht, Petar Lukovic, Tjark Behrens, Kai Lascheit, René Zurbrügg, Marc Pollefeys, Hermann Blum, Zuria Bauer
Title: Spot-On: A Mixed Reality Interface for Multi-Robot Cooperation
Abstract:
Recent progress in mixed reality (MR) and robotics is enabling increasingly sophisticated forms of human-robot collaboration. Building on these developments, we introduce a novel MR framework that allows multiple quadruped robots to operate in semantically diverse environments via an MR interface. Our system supports collaborative tasks involving drawers, swing doors, and higher-level infrastructure such as light switches. A comprehensive user study verifies both the design and usability of our app, with participants giving a "good" or "very good" rating in almost all cases. Overall, our approach provides an effective and intuitive framework for MR-based multi-robot collaboration in complex, real-world scenarios.
中文: 本文提出了一种创新的混合现实框架,使多个四足机器人能够在多样化环境中通过混合现实界面协作完成任务,用户研究验证了其良好的可用性。
English: This paper presents a novel mixed reality framework enabling multiple quadruped robots to collaboratively perform tasks in diverse environments, validated by a user study showing high usability ratings.

Authors:Qing Xiao, Rongyi Chen, Jingjia Xiao, Tianyang Fu, Alice Qian Zhang, Xianzhe Fan, Bingbing Zhang, Zhicong Lu, Hong Shen
Title: Institutionalizing Folk Theories of Algorithms: How Multi-Channel Networks (MCNs) Govern Algorithmic Labor in Chinese Live-Streaming Industry
Abstract:
As algorithmic systems increasingly structure platform labor, workers often rely on informal "folk theories", experience-based beliefs about how algorithms work, to navigate opaque and unstable algorithmic environments. Prior research has largely treated these theories as bottom-up, peer-driven strategies for coping with algorithmic opacity and uncertainty. In this study, we shift analytical attention to intermediary organizations and examine how folk theories of algorithms can be institutionally constructed and operationalized by those organizations as tools of labor management. Drawing on nine months of ethnographic fieldwork and 37 interviews with live-streamers and staff at Multi-Channel Networks (MCNs) in China, we show that MCNs develop and circulate dual algorithmic theories: internally, they acknowledge the volatility of platform systems and adopt probabilistic strategies to manage risk; externally, they promote simplified, prescriptive theories portraying the algorithm as transparent, fair, and responsive to individual effort. They further operationalize these folk theories for labor management, encouraging streamers to self-discipline and invest in equipment, training, and routines, while absolving MCNs of accountability. We contribute to CSCW and platform labor literature by demonstrating how informal algorithmic knowledge, once institutionalized, can become infrastructures of soft control -- shaping not only how workers interpret platform algorithms, but also how their labor is structured, moralized and governed.

Authors:Rudrajit Choudhuri, Bianca Trinkenreich, Rahul Pandita, Eirini Kalliamvakou, Igor Steinmacher, Marco Gerosa, Christopher Sanchez, Anita Sarma
Title: What Needs Attention? Prioritizing Drivers of Developers' Trust and Adoption of Generative AI
Abstract:
Generative AI (genAI) tools are advertised as productivity aids. Yet, issues related to miscalibrated trust and usage friction continue to hinder their adoption. Additionally, AI can be exclusionary, failing to support diverse users adequately, further exacerbating these concerns. One such aspect of diversity is cognitive diversity -- variations in users' cognitive styles -- that leads to divergence in interaction styles. When an individual's cognitive styles are unsupported, it creates additional barriers to technology adoption. Thus, to design tools that developers trust, we must first understand which factors affect their trust and their intentions to use these tools in practice. We developed a theoretical model of factors influencing trust and adoption intentions towards genAI through a large-scale survey with developers (N=238) at GitHub and Microsoft. Using Partial Least Squares-Structural Equation Modeling (PLS-SEM), we found that genAI's system/output quality, functional value, and goal maintenance significantly influence developers' trust, which, along with their cognitive styles, affects their intentions to use these tools at work. An Importance-Performance Matrix Analysis (IPMA) identified factors that, despite their strong influence, underperform, revealing specific genAI aspects that need design prioritization. We bolster these findings by qualitatively analyzing developers' perceived challenges and risks of genAI usage to uncover why these gaps persist in development contexts. For genAI to indeed be a true productivity aid rather than a disguised productivity sink, it must align with developers' goals, maintain contextual transparency, reduce cognitive burden, and provide equitable interaction support. We provide practical suggestions to guide future genAI tool design for effective, trustworthy, and inclusive human-genAI interactions.
中文: 生成式AI工具的采用受制于信任失调和使用摩擦,认知多样性是关键因素,开发者调查表明系统质量、功能价值和目标维护影响信任与使用意愿,需通过设计优化实现公平交互支持。
English: Generative AI tools face adoption barriers due to trust miscalibration and usage friction, with cognitive diversity being a key factor, as shown by a developer survey revealing that system quality, functional value, and goal maintenance influence trust and usage intentions, necessitating design improvements for equitable support.

Authors:Yue Wu, Yibo Guo, Yulong Yan, Jiancheng Yang, Xin Zhou, Ching-Yu Cheng, Danli Shi, Mingguang He
Title: AI-powered virtual eye: perspective, challenges and opportunities
Abstract:
We envision the "virtual eye" as a next-generation, AI-powered platform that uses interconnected foundation models to simulate the eye's intricate structure and biological function across all scales. Advances in AI, imaging, and multiomics provide a fertile ground for constructing a universal, high-fidelity digital replica of the human eye. This perspective traces the evolution from early mechanistic and rule-based models to contemporary AI-driven approaches, integrating in a unified model with multimodal, multiscale, dynamic predictive capabilities and embedded feedback mechanisms. We propose a development roadmap emphasizing the roles of large-scale multimodal datasets, generative AI, foundation models, agent-based architectures, and interactive interfaces. Despite challenges in interpretability, ethics, data processing and evaluation, the virtual eye holds the potential to revolutionize personalized ophthalmic care and accelerate research into ocular health and disease.
中文: “虚拟眼”是一个基于人工智能的平台,通过整合多模态、多尺度的数据构建动态的人眼数字模型,有望革新个性化眼科诊疗并加速眼部疾病研究,尽管面临伦理和数据处理的挑战。
English: The "virtual eye" is a proposed AI-driven platform that integrates multimodal, multiscale data to create a dynamic digital replica of the human eye, aiming to transform personalized ophthalmology and accelerate ocular research despite challenges in ethics and data processing.

Authors:Anikait Singh, Sheryl Hsu, Kyle Hsu, Eric Mitchell, Stefano Ermon, Tatsunori Hashimoto, Archit Sharma, Chelsea Finn
Title: FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users
Abstract:
Effective personalization of LLMs is critical for a broad range of user-interfacing applications such as virtual assistants and content curation. Inspired by the strong in-context learning capabilities of LLMs, we propose Few-Shot Preference Optimization (FSPO), which reframes reward modeling as a meta-learning problem. Under this framework, an LLM learns to quickly adapt to a user via a few labeled preferences from that user, constructing a personalized reward function for them. Additionally, since real-world preference data is scarce and challenging to collect at scale, we propose careful design choices to construct synthetic preference datasets for personalization, generating over 1M synthetic personalized preferences using publicly available LLMs. In particular, to successfully transfer from synthetic data to real users, we find it crucial for the data to exhibit both high diversity and coherent, self-consistent structure. We evaluate FSPO on personalized open-ended generation for up to 1,500 synthetic users across three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering, along with a controlled human study. Overall, FSPO achieves an 87% Alpaca Eval winrate on average in generating responses that are personalized to synthetic users and a 72% winrate with real human users in open-ended question answering.
中文: 少样本偏好优化(FSPO)通过少量用户偏好数据使大语言模型快速个性化适配,在合成与真实用户测试中均实现了高胜率的定制化生成效果。
English: Few-Shot Preference Optimization (FSPO) enables LLMs to quickly adapt to individual users through minimal preference data, achieving high personalization success rates in both synthetic and human evaluations.

Authors:Tim Schreiter, Andrey Rudenko, Jens V. Rüppel, Martin Magnusson, Achim J. Lilienthal
Title: Multimodal Interaction and Intention Communication for Industrial Robots
Abstract:
Successful adoption of industrial robots will strongly depend on their ability to safely and efficiently operate in human environments, engage in natural communication, understand their users, and express intentions intuitively while avoiding unnecessary distractions. To achieve this advanced level of Human-Robot Interaction (HRI), robots need to acquire and incorporate knowledge of their users' tasks and environment and adopt multimodal communication approaches with expressive cues that combine speech, movement, gazes, and other modalities. This paper presents several methods to design, enhance, and evaluate expressive HRI systems for non-humanoid industrial robots. We present the concept of a small anthropomorphic robot communicating as a proxy for its non-humanoid host, such as a forklift. We developed a multimodal and LLM-enhanced communication framework for this robot and evaluated it in several lab experiments, using gaze tracking and motion capture to quantify how users perceive the robot and measure the task progress.
中文摘要:工业机器人需通过多模态交互和用户认知实现安全直观的人机协作,本文通过开发具备大语言模型增强功能的拟人代理机器人,并利用视线追踪与动作捕捉进行评估验证。
English Summary: Industrial robots must achieve safe, intuitive human-robot interaction through multimodal communication and user understanding, as demonstrated by this paper's development of an anthropomorphic proxy robot with LLM-enhanced framework evaluated through gaze tracking and motion capture.

Authors:Ahmed Heakl, Abdullah Sohail, Mukul Ranjan, Rania Hossam, Ghazi Shazan Ahmad, Mohamed El-Geish, Omar Maher, Zhiqiang Shen, Fahad Khan, Salman Khan
Title: KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
Abstract:
With the growing adoption of Retrieval-Augmented Generation (RAG) in document processing, robust text recognition has become increasingly critical for knowledge extraction. While OCR (Optical Character Recognition) for English and other languages benefits from large datasets and well-established benchmarks, Arabic OCR faces unique challenges due to its cursive script, right-to-left text flow, and complex typographic and calligraphic features. We present KITAB-Bench, a comprehensive Arabic OCR benchmark that fills the gaps in current evaluation systems. Our benchmark comprises 8,809 samples across 9 major domains and 36 sub-domains, encompassing diverse document types including handwritten text, structured tables, and specialized coverage of 21 chart types for business intelligence. Our findings show that modern vision-language models (such as GPT-4o, Gemini, and Qwen) outperform traditional OCR approaches (like EasyOCR, PaddleOCR, and Surya) by an average of 60% in Character Error Rate (CER). Furthermore, we highlight significant limitations of current Arabic OCR models, particularly in PDF-to-Markdown conversion, where the best model Gemini-2.0-Flash achieves only 65% accuracy. This underscores the challenges in accurately recognizing Arabic text, including issues with complex fonts, numeral recognition errors, word elongation, and table structure detection. This work establishes a rigorous evaluation framework that can drive improvements in Arabic document analysis methods and bridge the performance gap with English OCR technologies.
中文: KITAB-Bench基准通过证明现代视觉语言模型比传统OCR系统准确率高出60%,填补了阿拉伯语OCR评估的关键空白,同时揭示了复杂文本识别和文档转换中持续存在的挑战。
English: The KITAB-Bench benchmark addresses critical gaps in Arabic OCR evaluation by demonstrating that modern vision-language models outperform traditional OCR systems by 60% in accuracy, while revealing persistent challenges in complex text recognition and document conversion.
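The Character Error Rate (CER) that KITAB-Bench reports is the edit distance between a model's output and the reference text, normalized by the reference length. A minimal sketch of the metric (not the benchmark's own code; the sample strings below are illustrative):

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

Lower CER is better, so the 60% average gap cited above means the vision-language models' normalized edit distance was far below that of the traditional OCR engines.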

Authors:Italo Santos, Katia Romero Felizardo, Anita Sarma, Igor Steinmacher, Marco A. Gerosa
Title: OSSDoorway: A Gamified Environment to Scaffold Student Contributions to Open Source Software
Abstract:
Software engineering courses enable practical learning through assignments requiring contributions to open source software (OSS), allowing students to experience real-world projects, collaborate with global communities, and develop skills and competencies required to succeed in the tech industry. Learning software engineering through open source contribution integrates theory with hands-on practice, as students tackle real challenges in collaborative environments. However, students often struggle to contribute to OSS projects and do not understand the contribution process. Research has demonstrated that strategically incorporating game elements can promote student learning and engagement. This paper proposes and evaluates OSSDoorway, a tool designed to guide students contributing to OSS projects. We recruited 29 students and administered a self-efficacy questionnaire before and after their use of OSSDoorway, along with qualitative feedback to assess challenges, interface features, and suggestions for improvement. The results show that OSSDoorway boosts students' self-efficacy and provides a structured, gamified learning experience. Clear instructions, real-time feedback, and the quest-based system helped students navigate tasks like using GitHub features to submit pull requests and collaborating with the community. Our findings suggest that providing students with a supportive gamified environment that uses feedback and structured quests can help them navigate the OSS contribution process.
中文: OSSDoorway 是一款游戏化工具,通过结构化任务和实时反馈增强学生的自我效能感,指导他们完成开源贡献流程,有效应对软件工程教育中的实践挑战。
English: OSSDoorway is a gamified tool that enhances students' self-efficacy and guides them through the open source contribution process with structured quests and real-time feedback, addressing challenges in software engineering education.

Authors:Tim Schreiter, Jens V. Rüppel, Rishi Hazra, Andrey Rudenko, Martin Magnusson, Achim J. Lilienthal
Title: Evaluating Efficiency and Engagement in Scripted and LLM-Enhanced Human-Robot Interactions
Abstract:
To achieve natural and intuitive interaction with people, HRI frameworks combine a wide array of methods for human perception, intention communication, human-aware navigation and collaborative action. In practice, when encountering unpredictable behavior of people or unexpected states of the environment, these frameworks may lack the ability to dynamically recognize such states, adapt and recover to resume the interaction. Large Language Models (LLMs), owing to their advanced reasoning capabilities and context retention, present a promising solution for enhancing robot adaptability. This potential, however, may not directly translate to improved interaction metrics. This paper considers a representative interaction with an industrial robot involving approach, instruction, and object manipulation, implemented in two conditions: (1) fully scripted and (2) including LLM-enhanced responses. We use gaze tracking and questionnaires to measure the participants' task efficiency, engagement, and robot perception. The results indicate higher subjective ratings for the LLM condition, but objective metrics show that the scripted condition performs comparably, particularly in efficiency and focus during simple tasks. We also note that the scripted condition may have an edge over LLM-enhanced responses in terms of response latency and energy consumption, especially for trivial and repetitive interactions.

Authors:Zixuan Feng, Igor Steinmacher, Marco Gerosa, Tyler Menezes, Alexander Serebrenik, Reed Milewicz, Anita Sarma
Title: The Multifaceted Nature of Mentoring in OSS: Strategies, Qualities, and Ideal Outcomes
Abstract:
Mentorship in open source software (OSS) is a vital, multifaceted process that includes onboarding newcomers, fostering skill development, and enhancing community building. This study examines task-focused mentoring strategies that help mentees complete their tasks and the ideal personal qualities and outcomes of good mentorship in OSS communities. We conducted two surveys to gather contributor perceptions: the first survey, with 70 mentors, mapped 17 mentoring challenges to 21 strategies that help support mentees. The second survey, with 85 contributors, assessed the importance of personal qualities and ideal mentorship outcomes. Our findings not only provide actionable strategies to help mentees overcome challenges and become successful contributors but also guide current and future mentors and OSS communities in understanding the personal qualities that are the cornerstone of good mentorship and the outcomes that mentor-mentee pairs should aspire to achieve.
中文: 本研究通过两项调查探讨了开源软件社区中以任务为导向的指导策略及关键个人品质,为帮助受指导者克服挑战和引导导师实现有效成果提供了可行方案。
English: This study explores task-focused mentoring strategies and essential personal qualities in open source software communities, identifying actionable approaches through dual surveys to help mentees overcome challenges and guide mentors toward effective outcomes.

Authors:Santiago Berrezueta-Guzman, Stephan Krusche, Stefan Wagner
Title: From Coders to Critics: Empowering Students through Peer Assessment in the Age of AI Copilots
Abstract:
The rapid adoption of AI-powered coding assistants like ChatGPT and other coding copilots is transforming programming education, raising questions about assessment practices, academic integrity, and skill development. As educators seek alternatives to traditional grading methods susceptible to AI-enabled plagiarism, structured peer assessment could be a promising strategy. This paper presents an empirical study of a rubric-based, anonymized peer review process implemented in a large introductory programming course. Students evaluated each other's final projects (2D game), and their assessments were compared to instructor grades using correlation, mean absolute error, and root mean square error (RMSE). Additionally, reflective surveys from 47 teams captured student perceptions of fairness, grading behavior, and preferences regarding grade aggregation. Results show that peer review can approximate instructor evaluation with moderate accuracy and foster student engagement, evaluative thinking, and interest in providing good feedback to their peers. We discuss these findings for designing scalable, trustworthy peer assessment systems to face the age of AI-assisted coding.
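The agreement statistics named in the abstract (correlation, MAE, RMSE) compare peer-assigned and instructor-assigned grades. A self-contained sketch; the grade vectors here are hypothetical, not the study's data:

```python
import math

def mae(xs, ys):
    """Mean absolute error between two equal-length grade lists."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

def rmse(xs, ys):
    """Root mean square error; penalizes large disagreements more than MAE."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical peer vs. instructor scores for five projects
peer = [85, 90, 78, 92, 70]
instructor = [80, 88, 75, 95, 72]
```

High correlation with low MAE/RMSE indicates peer grades track instructor grades both in ranking and in absolute level, which is the "moderate accuracy" the study reports.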

Authors:Santiago Berrezueta-Guzman, María Dolón-Poza, Stefan Wagner
Title: Supporting Preschool Emotional Development with AI-Powered Robots
Abstract:
This study evaluates the integration of AI-powered robots in early childhood education, focusing on their impact on emotional self-regulation, engagement, and collaborative skills. A ten-week experimental design involving two groups of children assessed the robot's effectiveness through progress assessments, parental surveys, and teacher feedback. Results demonstrated that early exposure to the robot significantly enhanced emotional recognition, while sustained interaction further improved collaborative and social engagement. Parental and teacher feedback highlighted high acceptance levels, emphasizing the robot's ease of integration and positive influence on classroom dynamics. This research underscores the transformative potential of AI and robotics in education. The findings advocate for the broader adoption of AI-powered interventions, carefully examining equitable access, ethical considerations, and sustainable implementation. This work sets a foundation for exploring long-term impacts and expanding applications of AI in inclusive and impactful educational settings.

Authors:Janik Kaden, Maximilian Hilger, Tim Schreiter, Marius Schaab, Thomas Graichen, Andrey Rudenko, Ulrich Heinkel, Achim J. Lilienthal
Title: Collecting Human Motion Data in Large and Occlusion-Prone Environments using Ultra-Wideband Localization
Abstract:
With robots increasingly integrating into human environments, understanding and predicting human motion is essential for safe and efficient interactions. Modern human motion and activity prediction approaches require high quality and quantity of data for training and evaluation, usually collected from motion capture systems, onboard or stationary sensors. Setting up these systems is challenging due to the intricate setup of hardware components, extensive calibration procedures, occlusions, and substantial costs. These constraints make deploying such systems in new and large environments difficult and limit their usability for in-the-wild measurements. In this paper we investigate the possibility of applying the novel Ultra-Wideband (UWB) localization technology as a scalable alternative for human motion capture in crowded and occlusion-prone environments. We include additional sensing modalities such as eye-tracking, onboard robot LiDAR and radar sensors, and record motion capture data as ground truth for evaluation and comparison. The environment imitates a museum setup, with up to four active participants navigating toward random goals in a natural way, and offers more than 130 minutes of multi-modal data. Our investigation provides a step toward scalable and accurate motion data collection beyond vision-based systems, laying a foundation for evaluating sensing modalities like UWB in larger and complex environments like warehouses, airports, or convention centers.

Authors:Pushkar Mishra, Charvi Rastogi, Stephen R. Pfohl, Alicia Parrish, Tian Huey Teh, Roma Patel, Mark Diaz, Ding Wang, Michela Paganini, Vinodkumar Prabhakaran, Lora Aroyo, Verena Rieser
Title: Decoding Safety Feedback from Diverse Raters: A Data-driven Lens on Responsiveness to Severity
Abstract:
Ensuring the safety of Generative AI requires a nuanced understanding of pluralistic viewpoints. In this paper, we introduce a novel data-driven approach for interpreting granular ratings in pluralistic datasets. Specifically, we address the challenge of analyzing nuanced differences in safety feedback from a diverse population expressed via ordinal scales (e.g., a Likert scale). We distill non-parametric responsiveness metrics that quantify the consistency of raters in scoring varying levels of the severity of safety violations. Leveraging a publicly available pluralistic dataset of safety feedback on AI-generated content as our case study, we investigate how raters from different demographic groups (age, gender, ethnicity) use an ordinal scale to express their perceptions of the severity of violations. We apply our metrics across violation types, demonstrating their utility in extracting nuanced insights that are crucial for aligning AI systems reliably in multi-cultural contexts. We show that our approach can inform rater selection and feedback interpretation by capturing nuanced viewpoints across different demographic groups, hence improving the quality of pluralistic data collection and in turn contributing to more robust AI development.
中文摘要:本文提出一种数据驱动方法,用于分析不同人口群体对AI安全性的细粒度评级,通过优化评分者筛选和反馈解读机制,提升多元文化背景下AI系统的对齐可靠性。
English Summary: This paper introduces a data-driven method to analyze nuanced safety ratings across diverse demographic groups, improving AI alignment in multicultural contexts through refined rater selection and feedback interpretation.
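The paper's responsiveness metrics are its own; as a hedged illustration of the general idea, one non-parametric way to check whether a rater's ordinal scores track increasing violation severity is Spearman rank correlation. This is a stand-in for exposition, not the authors' metric, and the severity levels and Likert ratings below are hypothetical:

```python
def rankdata(xs):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of tied positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rho: Pearson correlation of the two rank vectors."""
    rx, ry = rankdata(xs), rankdata(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical: designed severity levels vs. one rater's Likert scores
severity = [1, 2, 3, 4, 5]
ratings = [1, 1, 3, 4, 5]
```

A rho near 1 means the rater's ordinal scale use is responsive to severity; rank-based measures like this are non-parametric in the same spirit as the metrics the paper distills.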

Authors:Bo Wang, Yiqiao Li, Jianlong Zhou, Fang Chen
Title: Can LLM Assist in the Evaluation of the Quality of Machine Learning Explanations?
Abstract:
EXplainable machine learning (XML) has recently emerged to address the mystery mechanisms of machine learning (ML) systems by interpreting their 'black box' results. Despite the development of various explanation methods, determining the most suitable XML method for specific ML contexts remains unclear, highlighting the need for effective evaluation of explanations. The evaluation capabilities of Transformer-based large language models (LLMs) present an opportunity to adopt LLM-as-a-Judge for assessing explanations. In this paper, we propose a workflow that integrates both LLM-based and human judges for evaluating explanations. We examine how LLM-based judges evaluate the quality of various explanation methods and compare their evaluation capabilities to those of human judges within an iris classification scenario, employing both subjective and objective metrics. We conclude that while LLM-based judges effectively assess the quality of explanations using subjective metrics, they are not yet sufficiently developed to replace human judges in this role.
中文摘要:本研究提出了一种结合基于大语言模型的评估与人工评估的工作流程,用于评估可解释机器学习方法,发现尽管大语言模型能有效运用主观指标,但目前尚无法取代人类评估者的角色。
English Summary: This study introduces a workflow combining LLM-based and human evaluation to assess explainable machine learning methods, finding that while LLMs effectively use subjective metrics, they cannot yet replace human judges.

Authors:Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, Khaled Saab, Dan Popovici, Jacob Blum, Fan Zhang, Katherine Chou, Avinatan Hassidim, Burak Gokturk, Amin Vahdat, Pushmeet Kohli, Yossi Matias, Andrew Carroll, Kavita Kulkarni, Nenad Tomasev, Yuan Guan, Vikram Dhillon, Eeshit Dhaval Vaishnav, Byron Lee, Tiago R D Costa, José R Penadés, Gary Peltz, Yunhan Xu, Annalisa Pawlosky, Alan Karthikesalingam, Vivek Natarajan
Title: Towards an AI co-scientist
Abstract:
Scientific discovery relies on scientists generating novel hypotheses that undergo rigorous experimental validation. To augment this process, we introduce an AI co-scientist, a multi-agent system built on Gemini 2.0. The AI co-scientist is intended to help uncover new, original knowledge and to formulate demonstrably novel research hypotheses and proposals, building upon prior evidence and aligned to scientist-provided research objectives and guidance. The system's design incorporates a generate, debate, and evolve approach to hypothesis generation, inspired by the scientific method and accelerated by scaling test-time compute. Key contributions include: (1) a multi-agent architecture with an asynchronous task execution framework for flexible compute scaling; (2) a tournament evolution process for self-improving hypotheses generation. Automated evaluations show continued benefits of test-time compute, improving hypothesis quality. While general purpose, we focus development and validation in three biomedical areas: drug repurposing, novel target discovery, and explaining mechanisms of bacterial evolution and anti-microbial resistance. For drug repurposing, the system proposes candidates with promising validation findings, including candidates for acute myeloid leukemia that show tumor inhibition in vitro at clinically applicable concentrations. For novel target discovery, the AI co-scientist proposed new epigenetic targets for liver fibrosis, validated by anti-fibrotic activity and liver cell regeneration in human hepatic organoids. Finally, the AI co-scientist recapitulated unpublished experimental results via a parallel in silico discovery of a novel gene transfer mechanism in bacterial evolution. These results, detailed in separate, co-timed reports, demonstrate the potential to augment biomedical and scientific discovery and usher in an era of AI-empowered scientists.

Authors:Zekai Shao, Siyu Yuan, Lin Gao, Yixuan He, Deqing Yang, Siming Chen
Title: Unlocking Scientific Concepts: How Effective Are LLM-Generated Analogies for Student Understanding and Classroom Practice?
Abstract:
Teaching scientific concepts is essential but challenging, and analogies help students connect new concepts to familiar ideas. Advancements in large language models (LLMs) enable generating analogies, yet their effectiveness in education remains underexplored. In this paper, we first conducted a two-stage study involving high school students and teachers to assess the effectiveness of LLM-generated analogies in biology and physics through a controlled in-class test and a classroom field study. Test results suggested that LLM-generated analogies could enhance student understanding particularly in biology, but require teachers' guidance to prevent over-reliance and overconfidence. Classroom experiments suggested that teachers could refine LLM-generated analogies to their satisfaction and inspire new analogies from generated ones, encouraged by positive classroom feedback and homework performance boosts. Based on findings, we developed and evaluated a practical system to help teachers generate and refine teaching analogies. We discussed future directions for developing and evaluating LLM-supported teaching and learning by analogy.
中文: 大语言模型生成的类比能有效提升学生对生物学等学科的理解,但需教师指导以避免过度依赖,且教师可基于生成内容优化出满意的教学类比。
English: Large language models can generate educational analogies that enhance student understanding, particularly in biology, but require teacher guidance to prevent over-reliance and can be refined by educators for classroom use.

Authors:Xilin Jiang, Sukru Samet Dindar, Vishal Choudhari, Stephan Bickel, Ashesh Mehta, Guy M McKhann, Daniel Friedman, Adeen Flinker, Nima Mesgarani
Title: AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Abstract:
Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention. By taking a first step toward intention-aware auditory AI, this work explores a new paradigm where listener perception informs machine listening, paving the way for future listener-centered auditory systems. Demo and code available: https://aad-llm.github.io.

Authors:Zhida Sun, Zhenyao Zhang, Yue Zhang, Min Lu, Dani Lischinski, Daniel Cohen-Or, Hui Huang
Title: Creative Blends of Visual Concepts
Abstract:
Visual blends combine elements from two distinct visual concepts into a single, integrated image, with the goal of conveying ideas through imaginative and often thought-provoking visuals. Communicating abstract concepts through visual blends poses a series of conceptual and technical challenges. To address these challenges, we introduce Creative Blends, an AI-assisted design system that leverages metaphors to visually symbolize abstract concepts by blending disparate objects. Our method harnesses commonsense knowledge bases and large language models to align designers' conceptual intent with expressive concrete objects. Additionally, we employ generative text-to-image techniques to blend visual elements through their overlapping attributes. A user study (N=24) demonstrated that our approach reduces participants' cognitive load, fosters creativity, and enhances the metaphorical richness of visual blend ideation. We explore the potential of our method to expand visual blends to include multiple object blending and discuss the insights gained from designing with generative AI.

Authors:Yongjian Fu, Ke Sun, Ruyao Wang, Xinyi Li, Ju Ren, Yaoxue Zhang, Xinyu Zhang
Title: Enabling Cardiac Monitoring using In-ear Ballistocardiogram on COTS Wireless Earbuds
Abstract:
The human ear offers a unique opportunity for cardiac monitoring due to its physiological and practical advantages. However, existing earable solutions require additional hardware and complex processing, posing challenges for commercial True Wireless Stereo (TWS) earbuds which are limited by their form factor and resources. In this paper, we propose TWSCardio, a novel system that repurposes the IMU sensors in TWS earbuds for cardiac monitoring. Our key finding is that these sensors can capture in-ear ballistocardiogram (BCG) signals. TWSCardio reuses the unstable Bluetooth channel to stream the IMU data to a smartphone for BCG processing. It incorporates a signal enhancement framework to address issues related to missing data and low sampling rate, while mitigating motion artifacts by fusing multi-axis information. Furthermore, it employs a region-focused signal reconstruction method to translate the multi-axis in-ear BCG signals into fine-grained seismocardiogram (SCG) signals. We have implemented TWSCardio as an efficient real-time app. Our experiments on 100 subjects verify that TWSCardio can accurately reconstruct cardiac signals while showing resilience to motion artifacts, missing data, and low sampling rates. Our case studies further demonstrate that TWSCardio can support diverse cardiac monitoring applications.

Authors:Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, Emad Barsoum
Title: Agent Laboratory: Using LLM Agents as Research Assistants
Abstract:
Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research quality, we introduce Agent Laboratory, an autonomous LLM-based framework capable of completing the entire research process. This framework accepts a human-provided research idea and progresses through three stages--literature review, experimentation, and report writing--to produce comprehensive research outputs, including a code repository and a research report, while enabling users to provide feedback and guidance at each stage. We deploy Agent Laboratory with various state-of-the-art LLMs and invite multiple researchers to assess its quality by participating in a survey, providing human feedback to guide the research process, and then evaluate the final paper. We found that: (1) Agent Laboratory driven by o1-preview generates the best research outcomes; (2) The generated machine learning code is able to achieve state-of-the-art performance compared to existing methods; (3) Human involvement, providing feedback at each stage, significantly improves the overall quality of research; (4) Agent Laboratory significantly reduces research expenses, achieving an 84% decrease compared to previous autonomous research methods. We hope Agent Laboratory enables researchers to allocate more effort toward creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery.

Authors:Troy Schotter, Saba Kawas, James Prather, Juho Leinonen, Jon Ippolito, Greg L Nelson
Title: SPIRAL integration of generative AI in an undergraduate creative media course: effects on self-efficacy and career outcome expectations
Abstract:
Computing education and computing students are rapidly integrating generative AI, but we know relatively little about how different pedagogical strategies for intentionally integrating generative AI affect students' self-efficacy and career interests. This study investigates a SPIRAL integration of generative AI (Skills Practiced Independently, Revisited with AI Later), implemented in an introductory undergraduate creative media and technology course in Fall 2023 (n=31). Students first developed domain skills for half the semester, then revisited earlier material using generative AI, with explicit instruction on how to use it critically and ethically. We contribute a mixed methods quantitative and qualitative analysis of changes in self-efficacy and career interests over time, including longitudinal qualitative interviews (n=9) and thematic analysis. We found positive changes in both students' creative media self-efficacy and generative AI use self-efficacy, and mixed changes for ethical generative AI use self-efficacy. We also found students experienced demystification, transitioning from initial fear about generative AI taking over their fields and jobs, to doubting AI capability to do so and/or that society will push back against AI, through personal use of AI and observing others' use of AI vicariously. For career interests, our SPIRAL integration of generative AI use appeared to have either a neutral or positive influence on students, including widening their perceived career options, depending on their view of how AI would influence the career itself. These findings suggest that careful pedagogical sequencing can mitigate some potential negative impacts of AI, while promoting ethical and critical AI use that supports or has a neutral effect on students' career formation. To our knowledge, this SPIRAL strategy for generative AI integration is novel.
中文: 本研究探讨了SPIRAL教学策略在生成式AI教育中的应用,发现该策略通过结构化伦理实践能提升学生自我效能感与职业兴趣,同时有效消解其对AI替代的焦虑。
English: This study explores the SPIRAL pedagogical strategy for integrating generative AI in education, finding it enhances students' self-efficacy and career interests while mitigating fears through structured, ethical use.

Authors:Yuheng Lu, Qian Yu, Hongru Wang, Zeming Liu, Wei Su, Yanping Liu, Yuhang Guo, Maocheng Liang, Yunhong Wang, Haifeng Wang
Title: TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments
Abstract:
Graphical User Interface (GUI) agents, which autonomously operate on digital interfaces through natural language instructions, hold transformative potential for accessibility, automation, and user experience. A critical aspect of their functionality is grounding: the ability to map linguistic intents to visual and structural interface elements. However, existing GUI agents often struggle to adapt to the dynamic and interconnected nature of real-world digital environments, where tasks frequently span multiple platforms and applications while also being impacted by version updates. To address this, we introduce TransBench, the first benchmark designed to systematically evaluate and enhance the transferability of GUI agents across three key dimensions: cross-version transferability (adapting to version updates), cross-platform transferability (generalizing across platforms like iOS, Android, and Web), and cross-application transferability (handling tasks spanning functionally distinct apps). TransBench includes 15 app categories with diverse functionalities, capturing essential pages across versions and platforms to enable robust evaluation. Our experiments demonstrate significant improvements in grounding accuracy, showcasing the practical utility of GUI agents in dynamic, real-world environments. Our code and data will be publicly available on GitHub.

Authors:Jiangrong Shen, Yulin Xie, Qi Xu, Gang Pan, Huajin Tang, Badong Chen
Title: Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for Imbalanced Multi-modal Learning
Abstract:
Multimodal spiking neural networks (SNNs) hold significant potential for energy-efficient sensory processing but face critical challenges in modality imbalance and temporal misalignment. Current approaches suffer from uncoordinated convergence speeds across modalities and static fusion mechanisms that ignore time-varying cross-modal interactions. We propose the temporal attention-guided adaptive fusion framework for multimodal SNNs with two synergistic innovations: 1) the Temporal Attention-guided Adaptive Fusion (TAAF) module, which dynamically assigns importance scores to fused spiking features at each timestep, enabling hierarchical integration of temporally heterogeneous spike-based features; 2) the temporal adaptive balanced fusion loss, which modulates learning rates per modality based on the above attention scores, preventing dominant modalities from monopolizing optimization. The proposed framework implements adaptive fusion, especially in the temporal dimension, and alleviates the modality imbalance during multimodal learning, mimicking cortical multisensory integration principles. Evaluations on the CREMA-D, AVE, and EAD datasets demonstrate state-of-the-art performance (77.55%, 70.65%, and 97.5% accuracy, respectively) with energy efficiency. The system resolves temporal misalignment through learnable time-warping operations and coordinates modality convergence faster than baseline SNNs. This work establishes a new paradigm for temporally coherent multimodal learning in neuromorphic systems, bridging the gap between biological sensory processing and efficient machine intelligence.
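The per-timestep attention weighting described in the abstract can be sketched in plain numpy (a minimal illustration under assumed shapes and a mean-activity scoring rule; the paper's TAAF module is a learned SNN component, not this):

```python
import numpy as np

def temporal_attention_fusion(audio, visual):
    """Fuse two modality feature streams with per-timestep attention.

    audio, visual: arrays of shape (T, D) -- T timesteps, D features.
    Returns fused features of shape (T, D) and (T, 2) attention scores.
    """
    stacked = np.stack([audio, visual], axis=1)           # (T, 2, D)
    # Score each modality at each timestep by its mean activity
    # (an illustrative stand-in for a learned scoring function).
    scores = stacked.mean(axis=2)                         # (T, 2)
    # Softmax over the two modalities, independently per timestep.
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = e / e.sum(axis=1, keepdims=True)               # (T, 2)
    # Weighted sum of modality features at each timestep.
    fused = (attn[:, :, None] * stacked).sum(axis=1)      # (T, D)
    return fused, attn
```

The same scores could then reweight per-modality learning rates, as the balanced fusion loss in the abstract suggests.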

Authors:Fan Gao, Xinjie Zhao, Ding Xia, Zhongyi Zhou, Rui Yang, Jinghui Lu, Hang Jiang, Chanjun Park, Irene Li
Title: HealthGenie: Empowering Users with Healthy Dietary Guidance through Knowledge Graph and Large Language Models
Abstract:
Seeking dietary guidance often requires navigating complex professional knowledge while accommodating individual health conditions. Knowledge Graphs (KGs) offer structured and interpretable nutritional information, whereas Large Language Models (LLMs) naturally facilitate conversational recommendation delivery. In this paper, we present HealthGenie, an interactive system that combines the strengths of LLMs and KGs to provide personalized dietary recommendations along with hierarchical information visualization for a quick and intuitive overview. Upon receiving a user query, HealthGenie performs query refinement and retrieves relevant information from a pre-built KG. The system then visualizes and highlights pertinent information, organized by defined categories, while offering detailed, explainable recommendation rationales. Users can further tailor these recommendations by adjusting preferences interactively. Our evaluation, comprising a within-subject comparative experiment and an open-ended discussion, demonstrates that HealthGenie effectively supports users in obtaining personalized dietary guidance based on their health conditions while reducing interaction effort and cognitive load. These findings highlight the potential of LLM-KG integration in supporting decision-making through explainable and visualized information. We examine the system's usefulness and effectiveness with an N=12 within-subject study and provide design considerations for future systems that integrate conversational LLM and KG.

Authors:Donghuo Zeng, Roberto Legaspi, Yuewen Sun, Xinshuai Dong, Kazushi Ikeda, Peter Spirtes, Kun Zhang
Title: Generative Framework for Personalized Persuasion: Inferring Causal, Counterfactual, and Latent Knowledge
Abstract:
We hypothesize that optimal system responses emerge from adaptive strategies grounded in causal and counterfactual knowledge. Counterfactual inference allows us to create hypothetical scenarios to examine the effects of alternative system responses. We enhance this process through causal discovery, which identifies the strategies informed by the underlying causal structure that govern system behaviors. Moreover, we consider the psychological constructs and unobservable noises that might be influencing user-system interactions as latent factors. We show that these factors can be effectively estimated. We employ causal discovery to identify strategy-level causal relationships among user and system utterances, guiding the generation of personalized counterfactual dialogues. We model the user utterance strategies as causal factors, enabling system strategies to be treated as counterfactual actions. Furthermore, we optimize policies for selecting system responses based on counterfactual data. Our results using a real-world dataset on social good demonstrate significant improvements in persuasive system outcomes, with increased cumulative rewards validating the efficacy of causal discovery in guiding personalized counterfactual inference and optimizing dialogue policies for a persuasive dialogue system.

Authors:Donghuo Zeng, Roberto Legaspi, Yuewen Sun, Xinshuai Dong, Kazushi Ikeda, Peter Spirtes, Kun Zhang
Title: Causal Discovery and Counterfactual Reasoning to Optimize Persuasive Dialogue Policies
Abstract:
Tailoring persuasive conversations to users leads to more effective persuasion. However, existing dialogue systems often struggle to adapt to dynamically evolving user states. This paper presents a novel method that leverages causal discovery and counterfactual reasoning for optimizing system persuasion capability and outcomes. We employ the Greedy Relaxation of the Sparsest Permutation (GRaSP) algorithm to identify causal relationships between user and system utterance strategies, treating user strategies as states and system strategies as actions. GRaSP identifies user strategies as causal factors influencing system responses, which inform Bidirectional Conditional Generative Adversarial Networks (BiCoGAN) in generating counterfactual utterances for the system. Subsequently, we use the Dueling Double Deep Q-Network (D3QN) model to utilize counterfactual data to determine the best policy for selecting system utterances. Our experiments with the PersuasionForGood dataset show measurable improvements in persuasion outcomes using our approach over baseline methods. The observed increase in cumulative rewards and Q-values highlights the effectiveness of causal discovery in enhancing counterfactual reasoning and optimizing reinforcement learning policies for online dialogue systems.
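D3QN combines the dueling and double-DQN ideas, both of which have simple closed forms. A sketch of the two standard aggregation rules (textbook formulations, not code from the paper):

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

def double_dqn_target(reward, gamma, q_online_next, q_target_next):
    """Double DQN target: pick the next action with the online network,
    but evaluate it with the target network to reduce overestimation."""
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]
```

In the paper's setting, the counterfactual utterances generated by BiCoGAN would supply additional (state, action, reward) transitions for such a learner.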

Authors:Robert H. Thomson, Quan Nguyen, Essien Ayanam, Matthew Canham, Thomas C. Schmidt, Matthias Wählisch, Eric Osterweil
Title: Combating the Effects of Cyber-Psychosis: Using Object Security to Facilitate Critical Thinking
Abstract:
Humanity is currently facing an existential crisis about the nature of truth and reality driven by the availability of information online which overloads and overwhelms our cognitive capabilities, which we call Cyber-Psychosis. The results of this Cyber-Psychosis include the decline of critical thinking coupled with deceptive influences on the Internet which have become so prolific that they are challenging our ability to form a shared understanding of reality in either the digital or physical world. Fundamental to mending our fractured digital universe is establishing the ability to know where a digital object (i.e. a piece of information like text, audio, or video) came from, whether it was modified, what it is derived from, where it has been circulated, and what (if any) lifetime that information should have. Furthermore, we argue that on-by-default object security for genuine objects will provide the necessary grounding to support critical thinking and rational online behavior, even with the ubiquity of deceptive content. To this end, we propose that the Internet needs an object security service layer. This proposition may not be as distant as it may first seem. Through an examination of several venerable (and new) protocols, we show how pieces of this problem have already been addressed. While interdisciplinary research will be key to properly crafting the architectural changes needed, here we propose an approach for how we can already use fallow protections to begin turning the tide of this emerging Cyber-Psychosis today!

Authors:Subham Agrawal, Nico Ostermann-Myrau, Nils Dengler, Maren Bennewitz
Title: Pedestrians and Robots: A Novel Dataset for Learning Distinct Social Navigation Forces
Abstract:
The increasing use of robots in human-centric public spaces such as shopping malls, sidewalks, and hospitals requires an understanding of how pedestrians respond to their presence. However, existing research lacks comprehensive datasets that capture the full range of pedestrian behaviors, e.g., including avoidance, neutrality, and attraction in the presence of robots. Such datasets can be used to effectively learn models capable of accurately predicting diverse responses of pedestrians to robot presence, which are crucial for advancing robot navigation strategies and optimizing pedestrian-aware motion planning. In this paper, we address these challenges by collecting a novel dataset of pedestrian motion in two outdoor locations under three distinct conditions, i.e., no robot presence, a stationary robot, and a moving robot. Thus, unlike existing datasets, ours explicitly encapsulates variations in pedestrian behavior across the different robot conditions. Using our dataset, we propose a novel Neural Social Robot Force Model (NSRFM), an extension of the traditional Social Force Model that integrates neural networks and robot-induced forces to better predict pedestrian behavior in the presence of robots. We validate the NSRFM by comparing its generated trajectories on different real-world datasets. Furthermore, we implemented it in simulation to enable the learning and benchmarking of robot navigation strategies based on their impact on pedestrian movement. Our results demonstrate the model's effectiveness in replicating real-world pedestrian reactions and its utility in developing, evaluating, and benchmarking social robot navigation algorithms.
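The classical Social Force Model that NSRFM extends has a compact closed form; a toy version with an added robot-repulsion term might look like the following (the constants and the exponential robot term are illustrative stand-ins for the paper's learned, neural forces):

```python
import numpy as np

def social_force(pos, vel, goal, robot_pos,
                 desired_speed=1.3, tau=0.5, A=2.0, B=0.3):
    """Net force on a pedestrian: a Helbing-style goal-driven term plus
    an exponential repulsion from a nearby robot. All parameters here
    are illustrative; NSRFM learns its robot-induced forces from data.
    """
    # Goal-driven force: relax toward the desired velocity.
    direction = (goal - pos) / (np.linalg.norm(goal - pos) + 1e-9)
    f_goal = (desired_speed * direction - vel) / tau
    # Robot-induced repulsion, decaying exponentially with distance.
    diff = pos - robot_pos
    dist = np.linalg.norm(diff) + 1e-9
    f_robot = A * np.exp(-dist / B) * (diff / dist)
    return f_goal + f_robot
```

Replacing the hand-tuned repulsion with a small neural network, conditioned on the robot's state, is the kind of extension the abstract describes.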

Authors:Ruixin Wang, Zhongkai Zhao, Le Fang, Nan Jiang, Yiling Lou, Lin Tan, Tianyi Zhang
Title: Show Me Why It's Correct: Saving 1/3 of Debugging Time in Program Repair with Interactive Runtime Comparison
Abstract:
Automated Program Repair (APR) holds the promise of alleviating the burden of debugging and fixing software bugs. Despite this, developers still need to manually inspect each patch to confirm its correctness, which is tedious and time-consuming. This challenge is exacerbated in the presence of plausible patches, which accidentally pass test cases but may not correctly fix the bug. To address this challenge, we propose an interactive approach called iFix to facilitate patch understanding and comparison based on their runtime difference. iFix performs static analysis to identify runtime variables related to the buggy statement and captures their runtime values during execution for each patch. These values are then aligned across different patch candidates, allowing users to compare and contrast their runtime behavior. To evaluate iFix, we conducted a within-subjects user study with 28 participants. Compared with manual inspection and a state-of-the-art interactive patch filtering technique, iFix reduced participants' task completion time by 36% and 33% while also improving their confidence by 50% and 20%, respectively. Besides, quantitative experiments demonstrate that iFix improves the ranking of correct patches by at least 39% compared with other patch ranking methods and is generalizable to different APR tools.
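The core alignment step, collecting each patch's runtime variable values and surfacing only the variables whose values diverge, can be sketched as follows (a toy illustration; iFix itself derives the relevant variables via static analysis and instruments real executions):

```python
def align_runtime_values(traces):
    """Align variable snapshots across patch candidates for side-by-side
    comparison, in the spirit of iFix's runtime-difference view.

    traces: dict mapping patch name -> dict of variable -> observed value.
    Returns dict variable -> {patch: value}, keeping only variables whose
    values differ across patches (the interesting ones to inspect).
    """
    variables = set().union(*(t.keys() for t in traces.values()))
    aligned = {}
    for var in sorted(variables):
        row = {patch: t.get(var) for patch, t in traces.items()}
        if len(set(map(repr, row.values()))) > 1:  # differs somewhere
            aligned[var] = row
    return aligned
```

A reviewer would then inspect only the diverging rows to decide which patch's runtime behavior matches the intended fix.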

Authors:Shenghui Chen, Yunhao Yang, Kayla Boggess, Seongkook Heo, Lu Feng, Ufuk Topcu
Title: Evaluating Human Trust in LLM-Based Planners: A Preliminary Study
Abstract:
Large Language Models (LLMs) are increasingly used for planning tasks, offering unique capabilities not found in classical planners, such as generating explanations and iterative refinement. However, trust, a critical factor in the adoption of planning systems, remains underexplored in the context of LLM-based planning tasks. This study bridges this gap by comparing human trust in LLM-based planners with classical planners through a user study in a Planning Domain Definition Language (PDDL) domain. Combining subjective measures, such as trust questionnaires, with objective metrics like evaluation accuracy, our findings reveal that correctness is the primary driver of trust and performance. Explanations provided by the LLM improved evaluation accuracy but had limited impact on trust, while plan refinement showed potential for increasing trust without significantly enhancing evaluation accuracy.

Authors:Mert İnan, Anthony Sicilia, Suvodip Dey, Vardhan Dongre, Tejas Srinivasan, Jesse Thomason, Gökhan Tür, Dilek Hakkani-Tür, Malihe Alikhani
Title: Better Slow than Sorry: Introducing Positive Friction for Reliable Dialogue Systems
Abstract:
While theories of discourse and cognitive science have long recognized the value of unhurried pacing, recent dialogue research tends to minimize friction in conversational systems. Yet, frictionless dialogue risks fostering uncritical reliance on AI outputs, which can obscure implicit assumptions and lead to unintended consequences. To meet this challenge, we propose integrating positive friction into conversational AI, which promotes user reflection on goals, critical thinking on system response, and subsequent re-conditioning of AI systems. We hypothesize systems can improve goal alignment, modeling of user mental states, and task success by deliberately slowing down conversations in strategic moments to ask questions, reveal assumptions, or pause. We present an ontology of positive friction and collect expert human annotations on multi-domain and embodied goal-oriented corpora. Experiments on these corpora, along with simulated interactions using state-of-the-art systems, suggest incorporating friction not only fosters accountable decision-making, but also enhances machine understanding of user beliefs and goals, and increases task success rates.

Authors:Amirreza Payandeh, Daeun Song, Mohammad Nazeri, Jing Liang, Praneel Mukherjee, Amir Hossain Raj, Yangzhe Kong, Dinesh Manocha, Xuesu Xiao
Title: Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces
Abstract:
Most existing social robot navigation techniques either leverage hand-crafted rules or human demonstrations to connect robot perception to socially compliant actions. However, there remains a significant gap in effectively translating perception into socially compliant actions, much like how human reasoning naturally occurs in dynamic environments. Considering the recent success of Vision-Language Models (VLMs), we propose using language to bridge the gap in human-like reasoning between perception and socially aware robot actions. We create a vision-language dataset, Social robot Navigation via Explainable Interactions (SNEI), featuring 40K human-annotated Visual Question Answers (VQAs) based on 2K human-robot social interactions in unstructured, crowded public spaces, spanning perception, prediction, chain-of-thought reasoning, action, and explanation. We fine-tune a VLM, Social-LLaVA, using SNEI to demonstrate the practical application of our dataset. Social-LLaVA outperforms state-of-the-art models like GPT-4V and Gemini, based on the average of fifteen different human-judge scores across 50 VQAs. Deployed onboard a mobile robot, Social-LLaVA enables human-like reasoning, marking a promising step toward socially compliant robot navigation in dynamic public spaces through language reasoning.

Authors:Olga Kolesnikova, Moein Shahiki Tash, Zahra Ahani, Ameeta Agrawal, Raul Monroy, Grigori Sidorov
Title: Advanced Machine Learning Techniques for Social Support Detection on Social Media
Abstract:
The widespread use of social media highlights the need to understand its impact, particularly the role of online social support. This study uses a dataset focused on online social support, which includes binary and multiclass classifications of social support content on social media. The classification of social support is divided into three tasks. The first task focuses on distinguishing between supportive and non-supportive content. The second task aims to identify whether the support is directed toward an individual or a group. The third task categorizes the specific type of social support, grouping it into categories such as Nation, LGBTQ, Black people, Women, Religion, and Other (if it does not fit into the previously mentioned categories). To address data imbalances in these tasks, we employed K-means clustering for balancing the dataset and compared the results with the original unbalanced data. Using advanced machine learning techniques, including transformers and zero-shot learning approaches with GPT-3, GPT-4, and GPT-4o, we predict social support levels in various contexts. The effectiveness of the dataset is evaluated using baseline models across different learning approaches, with transformer-based methods demonstrating superior performance. Additionally, we achieved a 0.4% increase in the macro F1 score for the second task and a 0.7% increase for the third task, compared to previous work utilizing traditional machine learning with psycholinguistic and unigram-based TF-IDF values.
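Cluster-based undersampling with K-means is a standard balancing trick: cluster the majority class and keep representatives near each centroid so the subsample stays diverse. A plain-numpy sketch (the paper's exact balancing procedure may differ):

```python
import numpy as np

def kmeans_undersample(X_major, n_keep, k=None, iters=10, seed=0):
    """Cluster the majority class with k-means (Lloyd's algorithm) and
    keep the point nearest each centroid, preserving diversity while
    shrinking the class. Illustrative sketch, not the paper's code.
    """
    rng = np.random.default_rng(seed)
    k = k or n_keep
    # Initialize centroids from random data points.
    centers = X_major[rng.choice(len(X_major), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(X_major[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X_major[labels == j].mean(axis=0)
    # Keep the single point closest to each final centroid.
    d = np.linalg.norm(X_major[:, None] - centers[None], axis=2)
    keep = np.unique(d.argmin(axis=0))[:n_keep]
    return X_major[keep]
```

Setting `k` to the size of the minority class yields a roughly balanced training set.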

Authors:Weiyin Xie, Chunxi Huang, Jiyao Wang, Dengbo He
Title: Do Electric Vehicles Induce More Motion Sickness Than Fuel Vehicles? A Survey Study in China
Abstract:
Electric vehicles (EVs) are a promising alternative to fuel vehicles (FVs), given some unique characteristics of EVs, for example, the low air pollution and maintenance cost. However, the increasing prevalence of EVs is accompanied by widespread complaints regarding the high likelihood of motion sickness (MS) induction, especially when compared to FVs, which has become one of the major obstacles to the acceptance and popularity of EVs. Despite the prevalence of such complaints online and among EV users, the association between vehicle type (i.e., EV versus FV) and MS prevalence and severity has not been quantified. Thus, this study aims to investigate the existence of EV-induced MS and explore the potential factors leading to it. A survey study was conducted to collect passengers' MS experience in EVs and FVs over the past year. In total, 639 valid responses were collected from mainland China. The results show that FVs were associated with a higher frequency of MS, while EVs were found to induce more severe MS symptoms. Further, we found that passengers' MS severity was associated with individual differences (i.e., age, gender, sleep habits, susceptibility to motion-induced MS), in-vehicle activities (i.e., chatting with others and watching in-vehicle displays), and road conditions (i.e., congestion and slope), while the MS frequency was associated with vehicle ownership and riding frequency. The results from this study can guide the directions of future empirical studies that aim to quantify the inducers of MS in EVs and FVs, as well as the optimization of EVs to reduce MS.
中文: 电动汽车比燃油车引发更严重的晕车症状,其严重程度受个体差异、车内活动和路况影响,而晕车频率则与车辆拥有情况和乘坐频率相关。
English: Electric vehicles are linked to more severe motion sickness symptoms than fuel vehicles, with severity influenced by individual traits, in-vehicle activities, and road conditions, while frequency relates to vehicle ownership and riding habits.

Authors:Abdallah Lakhdari, Jiajie Li, Amani Abusafia, Athman Bouguettaya
Title: Privacy-aware IoT Fall Detection Services For Aging in Place
Abstract:
Fall detection is critical to support the growing elderly population, projected to reach 2.1 billion by 2050. However, existing methods often face data scarcity challenges or compromise privacy. We propose a novel IoT-based Fall Detection as a Service (FDaaS) framework to assist the elderly in living independently and safely by accurately detecting falls. We design a service-oriented architecture that leverages Ultra-wideband (UWB) radar sensors as an IoT health-sensing service, ensuring privacy and minimal intrusion. We address the challenges of data scarcity by utilizing a Fall Detection Generative Pre-trained Transformer (FD-GPT) that uses augmentation techniques. We developed a protocol to collect a comprehensive dataset of the elderly's daily activities and fall events. This resulted in a real dataset that carefully mimics the elderly's routine. We rigorously evaluate and compare various models using this dataset. Experimental results show our approach achieves 90.72% accuracy and 89.33% precision in distinguishing between fall events and regular activities of daily living.
中文: 提出的物联网跌倒检测即服务框架采用保护隐私的超宽带雷达传感器和生成模型解决数据稀缺问题,在老年人跌倒检测中实现了超过90%的准确率。
English: The proposed IoT-based Fall Detection as a Service (FDaaS) framework uses privacy-preserving UWB radar sensors and a generative model to overcome data scarcity, achieving over 90% accuracy in detecting falls among the elderly.
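The abstract does not spell out FD-GPT's augmentation details; generic time-series augmentations such as jitter and magnitude scaling, sketched below, are common stand-ins for sensor data (illustrative only, not necessarily what FD-GPT uses):

```python
import numpy as np

def augment_radar(signal, rng, sigma=0.02, scale_range=(0.9, 1.1)):
    """Jitter + magnitude-scaling augmentation for a 1-D sensor sequence.

    signal: 1-D array of radar samples.
    rng: numpy Generator, so augmentations are reproducible.
    """
    noise = rng.normal(0.0, sigma, size=signal.shape)  # small additive jitter
    scale = rng.uniform(*scale_range)                  # random amplitude scaling
    return scale * (signal + noise)
```

Applied repeatedly with different seeds, this yields many plausible variants of each recorded activity or fall event.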

Authors:Gionnieve Lim, Bryan Chen Zhengyu Tan, Kellie Yu Hui Sim, Weiyan Shi, Ming Hui Chew, Ming Shan Hee, Roy Ka-Wei Lee, Simon T. Perrault, Kenny Tsu Wei Choo
Title: Sword and Shield: Uses and Strategies of LLMs in Navigating Disinformation
Abstract:
The emergence of Large Language Models (LLMs) presents a dual challenge in the fight against disinformation. These powerful tools, capable of generating human-like text at scale, can be weaponised to produce sophisticated and persuasive disinformation, yet they also hold promise for enhancing detection and mitigation strategies. This paper investigates the complex dynamics between LLMs and disinformation through a communication game that simulates online forums, inspired by the game Werewolf, with 25 participants. We analyse how Disinformers, Moderators, and Users leverage LLMs to advance their goals, revealing both the potential for misuse and the potential to combat disinformation. Our findings highlight the varying uses of LLMs depending on the participants' roles and strategies, underscoring the importance of understanding their effectiveness in this context. We conclude by discussing implications for future LLM development and online platform design, advocating for a balanced approach that empowers users and fosters trust while mitigating the risks of LLM-assisted disinformation.

Authors:Quan Shi, Carlos E. Jimenez, Shunyu Yao, Nick Haber, Diyi Yang, Karthik Narasimhan
Title: When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
Abstract:
Recent advancements in AI reasoning have driven substantial improvements across diverse tasks. A critical open question is whether these improvements also yield better knowledge transfer: the ability of models to communicate reasoning in ways humans can understand, apply, and learn from. To investigate this, we introduce Knowledge Integration and Transfer Evaluation (KITE), a conceptual and experimental framework for Human-AI knowledge transfer, and conduct the first large-scale human study (N=118) explicitly designed to measure it. In our two-phase setup, humans first ideate with an AI on problem-solving strategies, then independently implement solutions, isolating the influence of model explanations on human understanding. Our findings reveal that although model benchmark performance correlates with collaborative outcomes, this relationship is notably inconsistent, featuring significant outliers, indicating that knowledge transfer requires dedicated optimization. Our analysis identifies behavioral and strategic factors mediating successful knowledge transfer. We release our code, dataset, and evaluation framework to support future work on communicatively aligned models.
中文摘要:近期人工智能推理能力的进步与人类知识传递效果存在不一致性,为此提出的KITE评估框架首次通过大规模实验证明:模型基准性能并不能可靠预测人类从AI解释中学习的效果。
English Summary: Recent AI reasoning advances show inconsistent knowledge transfer to humans, prompting the development of the KITE framework which reveals that benchmark performance doesn't reliably predict human learning from AI explanations.

Authors:Eason Chen, Xinyi Tang, Aprille Xi, Chenyu Lin, Conrad Borchers, Shivang Gupta, Jionghao Lin, Kenneth R Koedinger
Title: VTutor for High-Impact Tutoring at Scale: Managing Engagement and Real-Time Multi-Screen Monitoring with P2P Connections
Abstract:
Hybrid tutoring, where a human tutor supports multiple students in learning with educational technology, is an increasingly common application to deliver high-impact tutoring at scale. However, past hybrid tutoring applications are limited in guiding tutor attention to students that require support. Specifically, existing conferencing tools, commonly used in hybrid tutoring, do not allow tutors to monitor multiple students' screens while directly communicating and attending to multiple students simultaneously. To address this issue, this paper introduces VTutor, a web-based platform leveraging peer-to-peer screen sharing and virtual avatars to deliver real-time, context-aware tutoring feedback at scale. By integrating a multi-student monitoring dashboard with AI-powered avatar prompts, VTutor empowers a single educator or tutor to rapidly detect off-task or struggling students and intervene proactively, thus enhancing the benefits of one-on-one interactions in classroom contexts with several students. Drawing on insight from the learning sciences and past research on animated pedagogical agents, we demonstrate how stylized avatars can potentially sustain student engagement while accommodating varying infrastructure constraints. Finally, we address open questions on refining large-scale, AI-driven tutoring solutions for improved learner outcomes, and how VTutor could help interpret real-time learner interactions to support remote tutors at scale. The VTutor platform can be accessed at https://ls2025.vtutor.ai. The system demo video is at https://ls2025.vtutor.ai/video.

Authors:Eason Chen, Chenyu Lin, Yu-Kai Huang, Xinyi Tang, Aprille Xi, Jionghao Lin, Kenneth Koedinger
Title: VTutor: An Animated Pedagogical Agent SDK that Provides Real-Time Multi-Model Feedback
Abstract:
Pedagogical Agents (PAs) show significant potential for boosting student engagement and learning outcomes by providing adaptive, on-demand support in educational contexts. However, existing PA solutions are often hampered by pre-scripted dialogue, unnatural animations, uncanny visual realism, and high development costs. To address these gaps, we introduce VTutor, an open-source SDK leveraging lightweight WebGL, Unity, and JavaScript frameworks. VTutor receives text outputs from a large language model (LLM), converts them into audio via text-to-speech, and then renders a real-time, lip-synced pedagogical agent (PA) for immediate, large-scale deployment on web-based learning platforms. By providing on-demand, personalized feedback, VTutor strengthens students' motivation and deepens their engagement with instructional material. Using an anime-like aesthetic, VTutor alleviates the uncanny valley effect, allowing learners to engage with expressive yet comfortably stylized characters. Our evaluation with 50 participants revealed that VTutor significantly outperforms existing talking-head approaches (e.g., SadTalker) on perceived synchronization accuracy, naturalness, emotional expressiveness, and overall preference. As an open-source project, VTutor welcomes community-driven contributions, from novel character designs to specialized showcases of pedagogical agent applications, that fuel ongoing innovation in AI-enhanced education. By providing an accessible, customizable, and learner-centered PA solution, VTutor aims to elevate the human-AI interaction experience in education, ultimately broadening the impact of AI in learning contexts. The demo link to VTutor is at https://vtutor-aied25.vercel.app.

Authors:Zheng Wei, Junxiang Liao, Lik-Hang Lee, Huamin Qu, Xian Xu
Title: Towards Enhanced Learning through Presence: A Systematic Review of Presence in Virtual Reality Across Tasks and Disciplines
Abstract:
The rising interest in Virtual Reality (VR) technology has sparked a desire to create immersive learning platforms capable of handling various tasks across environments. Through immersive interfaces, users can engage deeply with virtual environments, enhancing both learning outcomes and task performance. In fields such as education, engineering, and collaboration, presence has emerged as a critical factor influencing user engagement, motivation, and skill mastery. This review provides a comprehensive examination of the role of presence across different tasks and disciplines, exploring how its design impacts learning outcomes. Using a systematic search strategy based on the PRISMA method, we screened 2,793 articles and included 78 studies that met our inclusion criteria. We conducted a detailed classification and analysis of different types of presence in VR environments, including spatial presence, social presence, co-presence, self-presence, and cognitive presence. This review emphasizes how these varied types of presence affect learning outcomes across tasks and fields, and examines how design elements and interaction techniques shape presence and subsequently impact learning outcomes. We also summarize trends and future directions, identifying research gaps and opportunities to improve learning outcomes by enhancing presence in VR environments, thus offering guidance and insight for future research on VR presence and learning effectiveness.

Authors:Lucio La Cava, Andrea Tagarelli
Title: OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution
Abstract:
Open Large Language Models (OLLMs) are increasingly leveraged in generative AI applications, posing new challenges for detecting their outputs. We propose OpenTuringBench, a new benchmark based on OLLMs, designed to train and evaluate machine-generated text detectors on the Turing Test and Authorship Attribution problems. OpenTuringBench focuses on a representative set of OLLMs, and features a number of challenging evaluation tasks, including human/machine-manipulated texts, out-of-domain texts, and texts from previously unseen models. We also provide OTBDetector, a contrastive learning framework to detect and attribute OLLM-based machine-generated texts. Results highlight the relevance and varying degrees of difficulty of the OpenTuringBench tasks, with our detector achieving remarkable capabilities across the various tasks and outperforming most existing detectors. Resources are available on the OpenTuringBench Hugging Face repository at https://huggingface.co/datasets/MLNTeam-Unical/OpenTuringBench

Authors:Kentaro Takahira, Wong Kam-Kwai, Leni Yang, Xian Xu, Takanori Fujiwara, Huamin Qu
Title: TangibleNet: Synchronous Network Data Storytelling through Tangible Interactions in Augmented Reality
Abstract:
Synchronous data-driven storytelling with network visualizations presents significant challenges due to the complexity of real-time manipulation of network components. While existing research addresses asynchronous scenarios, there is a lack of effective tools for live presentations. To address this gap, we developed TangibleNet, a projector-based AR prototype that allows presenters to interact with node-link diagrams using double-sided magnets during live presentations. The design process was informed by interviews with professionals experienced in synchronous data storytelling and workshops with 14 HCI/VIS researchers. Insights from the interviews helped identify key design considerations for integrating physical objects as interactive tools in presentation contexts. The workshops contributed to the development of a design space mapping user actions to interaction commands for node-link diagrams. Evaluation with 12 participants confirmed that TangibleNet supports intuitive interactions and enhances presenter autonomy, demonstrating its effectiveness for synchronous network-based data storytelling.

Authors:Mingyue Yuan, Jieshan Chen, Zhenchang Xing, Gelareh Mohammadi, Aaron Quigley
Title: A Case Study of Scalable Content Annotation Using Multi-LLM Consensus and Human Review
Abstract:
Content annotation at scale remains challenging, requiring substantial human expertise and effort. This paper presents a case study in code documentation analysis, where we explore the balance between automation efficiency and annotation accuracy. We present MCHR (Multi-LLM Consensus with Human Review), a novel semi-automated framework that enhances annotation scalability through the systematic integration of multiple LLMs and targeted human review. Our framework introduces a structured consensus-building mechanism among LLMs and an adaptive review protocol that strategically engages human expertise. Through our case study, we demonstrate that MCHR reduces annotation time by 32% to 100% compared to manual annotation while maintaining high accuracy (85.5% to 98%) across different difficulty levels, from basic binary classification to challenging open-set scenarios.
中文摘要:本文提出MCHR半自动化框架,通过整合多LLM共识机制与定向人工审核,在代码文档标注任务中将标注时间减少32%-100%,同时在不同难度场景下保持85.5%-98%的准确率。
English Summary: This paper introduces MCHR, a semi-automated framework combining multiple LLMs with strategic human review to significantly reduce annotation time by 32-100% while maintaining 85.5-98% accuracy across various code documentation tasks.
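The abstract does not specify how the LLM consensus is built; below is a minimal sketch of one plausible rule, in which labels from several LLMs are merged by majority vote and any disagreement is escalated to human review (the function name and agreement threshold are illustrative, not from the paper):

```python
from collections import Counter

def merge_annotations(labels, agreement_threshold=1.0):
    """Merge labels from several LLM annotators; flag for human review
    when the top label's share falls below the required agreement fraction."""
    top_label, count = Counter(labels).most_common(1)[0]
    needs_review = count / len(labels) < agreement_threshold
    return top_label, needs_review
```

With three annotators and the default threshold, a unanimous vote skips review while any split is routed to a human, matching the framework's idea of spending human effort only where the models disagree.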

Authors:He Zhang, Xinyi Fu, John M. Carroll
Title: Augmenting Image Annotation: A Human-LMM Collaborative Framework for Efficient Object Selection and Label Generation
Abstract:
Traditional image annotation tasks rely heavily on human effort for object selection and label assignment, making the process time-consuming and prone to decreased efficiency as annotators experience fatigue after extensive work. This paper introduces a novel framework that leverages the visual understanding capabilities of large multimodal models (LMMs), particularly GPT, to assist annotation workflows. In our proposed approach, human annotators focus on selecting objects via bounding boxes, while the LMM autonomously generates relevant labels. This human-AI collaborative framework enhances annotation efficiency by reducing the cognitive and time burden on human annotators. By analyzing the system's performance across various types of annotation tasks, we demonstrate its ability to generalize to tasks such as object recognition, scene description, and fine-grained categorization. Our proposed framework highlights the potential of this approach to redefine annotation workflows, offering a scalable and efficient solution for large-scale data labeling in computer vision. Finally, we discuss how integrating LMMs into the annotation pipeline can advance bidirectional human-AI alignment, as well as the challenges of alleviating the "endless annotation" burden in the face of information overload by shifting some of the work to AI.

Authors:Jingyi Xie, Rui Yu, He Zhang, Syed Masum Billah, Sooyeon Lee, John M. Carroll
Title: Beyond Visual Perception: Insights from Smartphone Interaction of Visually Impaired Users with Large Multimodal Models
Abstract:
Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users of Be My AI (an LMM-based application) and analysis of its image descriptions from both study participants and social media platforms, we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.

Authors:Eason Chen, Chenyu Lin, Xinyi Tang, Aprille Xi, Canwen Wang, Jionghao Lin, Kenneth R Koedinger
Title: VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output
Abstract:
The rapid evolution of large language models (LLMs) has transformed human-computer interaction (HCI), but the interaction with LLMs is currently mainly focused on text-based interactions, while other multimodal approaches remain under-explored. This paper introduces VTutor, an open-source Software Development Kit (SDK) that combines generative AI with advanced animation technologies to create engaging, adaptable, and realistic animated pedagogical agents (APAs) for human-AI multi-media interactions. VTutor leverages LLMs for real-time personalized feedback, advanced lip synchronization for natural speech alignment, and WebGL rendering for seamless web integration. Supporting various 2D and 3D character models, VTutor enables researchers and developers to design emotionally resonant, contextually adaptive learning agents. This toolkit enhances learner engagement, feedback receptivity, and human-AI interaction while promoting trustworthy AI principles in education. VTutor sets a new standard for next-generation APAs, offering an accessible, scalable solution for fostering meaningful and immersive human-AI interaction experiences. The VTutor project is open-sourced and welcomes community-driven contributions and showcases.

Authors:Jiyao Wang, Youyu Sheng, Qihang He, Haolong Hu, Shuwen Liu, Feiqi Gu, Yumei Jing, Dengbo He
Title: Enhancing Psychotherapeutic Alliance in College: When and How to Integrate Multimodal Large Language Models in Psychotherapy
Abstract:
As mental health issues rise among college students, there is an increasing interest and demand in leveraging Multimodal Large Language Models (MLLMs) to enhance mental support services, yet integrating them into psychotherapy remains theoretical or non-user-centered. This study investigated the opportunities and challenges of using MLLMs within the campus psychotherapy alliance in China. Through three studies involving both therapists and student clients, we argue that the ideal role for MLLMs at this stage is as an auxiliary tool to human therapists. Users widely expect features such as triage matching and real-time emotion recognition. At the same time, for independent therapy by MLLMs, concerns about capabilities and privacy ethics remain prominent, despite high demands for personalized avatars and non-verbal communication. Our findings further indicate that users' sense of social identity and perceived relative status of MLLMs significantly influence their acceptance. This study provides insights for future intelligent campus mental healthcare.

Authors:Zhongzhan Huang, Shanshan Zhong, Pan Zhou, Shanghua Gao, Marinka Zitnik, Liang Lin
Title: A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models
Abstract:
Recently, numerous benchmarks have been developed to evaluate the logical reasoning abilities of large language models (LLMs). However, assessing the equally important creative capabilities of LLMs is challenging due to the subjective, diverse, and data-scarce nature of creativity, especially in multimodal scenarios. In this paper, we consider the comprehensive pipeline for evaluating the creativity of multimodal LLMs, with a focus on suitable evaluation platforms and methodologies. First, we find the Oogiri game, a creativity-driven task requiring humor, associative thinking, and the ability to produce unexpected responses to text, images, or both. This game aligns well with the input-output structure of modern multimodal LLMs and benefits from a rich repository of high-quality, human-annotated creative responses, making it an ideal platform for studying LLM creativity. Next, beyond using the Oogiri game for standard evaluations like ranking and selection, we propose LoTbench, an interactive, causality-aware evaluation framework, to further address some intrinsic risks in standard evaluations, such as information leakage and limited interpretability. The proposed LoTbench not only quantifies LLM creativity more effectively but also visualizes the underlying creative thought processes. Our results show that while most LLMs exhibit constrained creativity, the performance gap between LLMs and humans is not insurmountable. Furthermore, we observe a strong correlation between results from the multimodal cognition benchmark MMMU and LoTbench, but only a weak connection with traditional creativity metrics. This suggests that LoTbench better aligns with human cognitive theories, highlighting cognition as a critical foundation in the early stages of creativity and enabling the bridging of diverse concepts. https://lotbench.github.io

Authors:Christoph Gebhardt, Robin Willardt, Seyedmorteza Sadat, Chih-Wei Ning, Andreas Brombach, Jie Song, Otmar Hilliges, Christian Holz
Title: Regressor-Guided Image Editing Regulates Emotional Response to Reduce Online Engagement
Abstract:
Emotions are known to mediate the relationship between users' content consumption and their online engagement, with heightened emotional intensity leading to increased engagement. Building on this insight, we propose three regressor-guided image editing approaches aimed at diminishing the emotional impact of images. These include (i) a parameter optimization approach based on global image transformations known to influence emotions, (ii) an optimization approach targeting the style latent space of a generative adversarial network, and (iii) a diffusion-based approach employing classifier guidance and classifier-free guidance. Our findings demonstrate that these approaches can effectively alter the emotional properties of images while maintaining high visual quality. Optimization-based methods primarily adjust low-level properties like color hues and brightness, whereas the diffusion-based approach introduces semantic changes, such as altering appearance or facial expressions. Notably, results from a behavioral study reveal that only the diffusion-based approach successfully elicits changes in viewers' emotional responses while preserving high perceived image quality. In future work, we will investigate the impact of these image adaptations on internet user behavior.

Authors:Yangyu Huang, Tianyi Gao, Haoran Xu, Qihao Zhao, Yang Song, Zhipeng Gui, Tengchao Lv, Hao Chen, Lei Cui, Scarlett Li, Furu Wei
Title: PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
Abstract:
The geologic map, a fundamental diagram in geological science, provides critical insights into the structure and composition of Earth's subsurface and surface. These maps are indispensable in various fields, including disaster detection, resource exploration, and civil engineering. Despite their significance, current Multimodal Large Language Models (MLLMs) often fall short in geologic map understanding. This gap is primarily due to the challenging nature of cartographic generalization, which involves handling high-resolution maps, managing multiple associated components, and requiring domain-specific knowledge. To quantify this gap, we construct GeoMap-Bench, the first-ever benchmark for evaluating MLLMs in geologic map understanding, which assesses the full-scale abilities in extracting, referring, grounding, reasoning, and analyzing. To bridge this gap, we introduce GeoMap-Agent, the inaugural agent designed for geologic map understanding, which features three modules: Hierarchical Information Extraction (HIE), Domain Knowledge Injection (DKI), and Prompt-enhanced Question Answering (PEQA). Inspired by the interdisciplinary collaboration among human scientists, an AI expert group acts as consultants, utilizing a diverse tool pool to comprehensively analyze questions. Through comprehensive experiments, GeoMap-Agent achieves an overall score of 0.811 on GeoMap-Bench, significantly outperforming GPT-4o's 0.369. Our work, emPowering gEologic mAp holistiC undErstanding (PEACE) with MLLMs, paves the way for advanced AI applications in geology, enhancing the efficiency and accuracy of geological investigations.

Authors:Yuchen Zhou, Jiamin Wu, Zichen Ren, Zhouheng Yao, Weiheng Lu, Kunyu Peng, Qihao Zheng, Chunfeng Song, Wanli Ouyang, Chao Gou
Title: CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding
Abstract:
Understanding and decoding brain activity from electroencephalography (EEG) signals is a fundamental challenge in neuroscience and AI, with applications in cognition, emotion recognition, diagnosis, and brain-computer interfaces. While recent EEG foundation models advance generalized decoding via unified architectures and large-scale pretraining, they adopt a scale-agnostic dense modeling paradigm inherited from NLP and vision. This design neglects a core property of neural activity: cross-scale spatiotemporal structure. EEG task patterns span a wide range of temporal and spatial scales, from short bursts to slow rhythms, and from localized cortical responses to distributed interactions. Ignoring this diversity leads to suboptimal representations and weak generalization. We propose CSBrain, a Cross-scale Spatiotemporal Brain foundation model for generalized EEG decoding. CSBrain introduces: (i) Cross-scale Spatiotemporal Tokenization (CST), which aggregates multi-scale features from localized temporal windows and anatomical brain regions into compact scale-aware tokens; and (ii) Structured Sparse Attention (SSA), which captures cross-window and cross-region dependencies, enhancing scale diversity while removing spurious correlations. CST and SSA are alternately stacked to progressively integrate multi-scale dependencies. Experiments on 11 EEG tasks across 16 datasets show that CSBrain consistently outperforms task-specific and foundation model baselines. These results establish cross-scale modeling as a key inductive bias and position CSBrain as a robust backbone for future brain-AI research.

Authors:Weiheng Lu, Chunfeng Song, Jiamin Wu, Pengyu Zhu, Yuchen Zhou, Weijian Mai, Qihao Zheng, Wanli Ouyang
Title: UniMind: Unleashing the Power of LLMs for Unified Multi-Task Brain Decoding
Abstract:
Decoding human brain activity from electroencephalography (EEG) signals is a central challenge at the intersection of neuroscience and artificial intelligence, enabling diverse applications in mental state assessment, clinical monitoring, and human-machine interaction. Recent efforts have extensively explored EEG-based brain foundation models for generalized brain decoding, employing large-scale training on multiple datasets. However, most of these attempts struggle with generalizability and fail to achieve satisfactory performance without task-specific tuning due to pronounced inherent heterogeneity among decoding tasks. To address these challenges, we present UniMind, a general-purpose EEG foundation model for unified multi-task brain decoding by uniquely unleashing the power of large language models to comprehend complex neural patterns. UniMind offers several advantages. First, we design a Neuro-Language Connector to bridge the modality gap between neural signals and large language models, distilling and transforming the spatiotemporal neural patterns of EEG data into representations understandable by language models. Second, a Task-aware Query Selection module is proposed to inject task-awareness into the cross-modal alignment by dynamically generating task-adaptive query tokens, enabling learning of task-relevant neural patterns across diverse tasks. Extensive experiments across ten datasets demonstrate that UniMind substantially outperforms state-of-the-art multi-task decoding models, with an average gain of 12 percent, while also offering valuable neuroscientific insights into neural functional correlations across tasks. The code will be made publicly available.

Authors:Fan Wu, Cuiyun Gao, Shuqing Li, Xin-Cheng Wen, Qing Liao
Title: MLLM-Based UI2Code Automation Guided by UI Layout Information
Abstract:
Converting user interfaces into code (UI2Code) is a crucial but time-consuming and labor-intensive step in website development. Automating UI2Code is essential to streamline this task and improve development efficiency. There exist deep learning-based methods for the task; however, they heavily rely on a large amount of labeled training data and struggle with generalizing to real-world, unseen web page designs. The advent of Multimodal Large Language Models (MLLMs) presents potential for alleviating the issue, but they struggle to comprehend the complex layouts in UIs and to generate accurate code that preserves those layouts. To address these issues, we propose LayoutCoder, a novel MLLM-based framework generating UI code from real-world webpage images, which includes three key modules: (1) Element Relation Construction, which aims at capturing UI layout by identifying and grouping components with similar structures; (2) UI Layout Parsing, which aims at generating UI layout trees for guiding the subsequent code generation process; and (3) Layout-Guided Code Fusion, which aims at producing accurate code with the layout preserved. For evaluation, we build Snap2Code, a new benchmark dataset of 350 real-world websites, divided into seen and unseen parts to mitigate data leakage, in addition to the popular Design2Code dataset. Extensive evaluation shows the superior performance of LayoutCoder over the state-of-the-art approaches. Compared with the best-performing baseline, LayoutCoder improves the BLEU score by 10.14% and the CLIP score by 3.95% on average across all datasets.
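The abstract does not detail how the UI layout tree is represented; below is a minimal sketch of a tree structure that could guide layout-preserving code generation, with class and method names that are assumptions rather than the paper's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class LayoutNode:
    """One grouped UI component in a layout tree (tag names illustrative)."""
    tag: str
    children: list = field(default_factory=list)

    def to_html(self, indent=0):
        """Serialize the tree to indented HTML, preserving nesting order."""
        pad = "  " * indent
        if not self.children:
            return f"{pad}<{self.tag}></{self.tag}>"
        inner = "\n".join(c.to_html(indent + 1) for c in self.children)
        return f"{pad}<{self.tag}>\n{inner}\n{pad}</{self.tag}>"
```

Grouping detected components under such nodes (module 1) and then serializing the tree (module 3) is one way the parsed layout could constrain the generated markup's structure.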

Authors:Tobias King, Michael Knierim, Philipp Lepold, Christopher Clarke, Hans Gellersen, Michael Beigl, Tobias Röddiger
Title: earEOG via Periauricular Electrodes to Facilitate Eye Tracking in a Natural Headphone Form Factor
Abstract:
Eye tracking technology is frequently utilized to diagnose eye and neurological disorders, assess sleep and fatigue, study human visual perception, and enable novel gaze-based interaction methods. However, traditional eye tracking methodologies are constrained by bespoke hardware that is often cumbersome to wear, complex to apply, and demands substantial computational resources. To overcome these limitations, we investigated Electrooculography (EOG) eye tracking using 14 electrodes positioned around the ears, integrated into a custom-built headphone form factor device. In a controlled experiment, 16 participants tracked stimuli designed to induce smooth pursuits and saccades. Data analysis identified optimal electrode pairs for vertical and horizontal eye movement tracking, benchmarked against gold-standard EOG and camera-based methods. The electrode montage nearest the eyes yielded the best horizontal results. Horizontal smooth pursuits via earEOG showed high correlation with gold-standard measures ($r_{\mathrm{EOG}} = 0.81, p = 0.01$; $r_{\mathrm{CAM}} = 0.56, p = 0.02$), while vertical pursuits were weakly correlated ($r_{\mathrm{EOG}} = 0.28, p = 0.04$; $r_{\mathrm{CAM}} = 0.35, p = 0.05$). Voltage deflections when performing saccades showed strong correlation in the horizontal direction ($r_{\mathrm{left}} = 0.99, p = 0.0$; $r_{\mathrm{right}} = 0.99, p = 0.0$) but low correlation in the vertical direction ($r_{\mathrm{up}} = 0.6, p = 0.23$; $r_{\mathrm{down}} = 0.19, p = 0.73$). Overall, horizontal earEOG demonstrated strong performance, indicating its potential effectiveness, while vertical earEOG results were poor, suggesting limited feasibility in our current setup.
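The electrode-pair benchmarking above reduces to computing a Pearson correlation between each periauricular channel and the gold-standard trace; a minimal sketch on synthetic data (function and argument names are illustrative):

```python
import numpy as np

def channel_correlation(ear_eog, reference):
    """Pearson r between one earEOG channel and a gold-standard signal
    (e.g. conventional EOG or camera-based gaze), as in the benchmarking."""
    return float(np.corrcoef(ear_eog, reference)[0, 1])
```

Repeating this over all candidate electrode pairs and picking the pair with the highest r against the reference is the selection criterion the abstract describes.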

Authors:Ali Abedi, Charlene H. Chu, Shehroz S. Khan
Title: Benchmarking Early Agitation Prediction in Community-Dwelling People with Dementia Using Multimodal Sensors and Machine Learning
Abstract:
Agitation is one of the most common responsive behaviors in people living with dementia, particularly among those residing in community settings without continuous clinical supervision. Timely prediction of agitation can enable early intervention, reduce caregiver burden, and improve the quality of life for both patients and caregivers. This study aimed to develop and benchmark machine learning approaches for the early prediction of agitation in community-dwelling older adults with dementia using multimodal sensor data. A new set of agitation-related contextual features derived from activity data was introduced and employed for agitation prediction. A wide range of machine learning and deep learning models was evaluated across multiple problem formulations, including binary classification for single-timestamp tabular sensor data and multi-timestamp sequential sensor data, as well as anomaly detection for single-timestamp tabular sensor data. The study utilized the Technology Integrated Health Management (TIHM) dataset, the largest publicly available dataset for remote monitoring of people living with dementia, comprising 2,803 days of in-home activity, physiology, and sleep data. The most effective setting involved binary classification of sensor data using the current 6-hour timestamp to predict agitation at the subsequent timestamp. Incorporating additional information, such as time of day and agitation history, further improved model performance, with the highest AUC-ROC of 0.9720 and AUC-PR of 0.4320 achieved by the light gradient boosting machine. This work presents the first comprehensive benchmarking of state-of-the-art techniques for agitation prediction in community-based dementia care using privacy-preserving sensor data. The approach enables accurate, explainable, and efficient agitation prediction, supporting proactive dementia care and aging in place.
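The best-performing formulation pairs each 6-hour feature window with the agitation label of the *subsequent* window; a minimal sketch of that alignment step (array names are illustrative):

```python
import numpy as np

def next_window_pairs(window_features, agitation_labels):
    """Align each 6-hour feature window with the agitation label of the
    following window; the last window is dropped (it has no future label)."""
    X = np.asarray(window_features)[:-1]
    y = np.asarray(agitation_labels)[1:]
    return X, y
```

The resulting (X, y) pairs can then be fed to any binary classifier; per the abstract, appending time-of-day and agitation-history features to each row further improved results.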

Authors:Zhanxin Hao, Jie Cao, Ruimiao Li, Jifan Yu, Zhiyuan Liu, Yu Zhang
Title: Mapping Student-AI Interaction Dynamics in Multi-Agent Learning Environments: Supporting Personalised Learning and Reducing Performance Gaps
Abstract:
Multi-agent AI systems, which simulate diverse instructional roles such as teachers and peers, offer new possibilities for personalized and interactive learning. Yet, student-AI interaction patterns and their pedagogical implications remain unclear. This study explores how university students engaged with multiple AI agents, and how these interactions influenced cognitive outcomes (learning gains) and non-cognitive factors (motivation, technology acceptance). Based on MAIC, a multi-agent online learning platform, the research involved 305 university students and 19,365 lines of dialogue data. Pre- and post-test scores, self-reported motivation, and technology acceptance were also collected. The study identified two engagement patterns: co-construction of knowledge and co-regulation. Lag sequential analysis revealed that students with lower prior knowledge relied more on co-construction of knowledge sequences, showing higher learning gains and post-course motivation. In contrast, students with higher prior knowledge engaged more in co-regulation behaviors but exhibited limited learning improvement. Technology acceptance increased across all groups. These findings suggest that multi-agent AI systems can adapt to students' varying needs, support differentiated engagement, and reduce performance gaps. Implications for personalized system design and future research directions are discussed.
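At lag 1, the sequential analysis mentioned above amounts to counting how often each coded behavior is immediately followed by another; a minimal sketch of that counting step (the behavior codes shown are illustrative, not the study's coding scheme):

```python
from collections import Counter

def lag1_transitions(coded_sequence):
    """Count lag-1 transitions: how often each behavior code is
    immediately followed by each other code in the dialogue stream."""
    return Counter(zip(coded_sequence, coded_sequence[1:]))
```

Transition counts like these are the raw input to lag sequential analysis, which then tests whether particular sequences (e.g. question followed by elaboration) occur more often than chance.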

Authors:Michael Küttner, Valeria Zitz, Kathrin Gerling, Michael Beigl, Tobias Röddiger
Title: UltrasonicSpheres: Localized, Multi-Channel Sound Spheres Using Off-the-Shelf Speakers and Earables
Abstract:
We present a demo of UltrasonicSpheres, a novel system for location-specific audio delivery using wearable earphones that decode ultrasonic signals into audible sound. Unlike conventional beamforming setups, UltrasonicSpheres relies on single ultrasonic speakers to broadcast localized audio with multiple channels, each encoded on a distinct ultrasonic carrier frequency. Users wearing our acoustically transparent earphones can demodulate their selected stream, such as exhibit narrations in a chosen language, while remaining fully aware of ambient environmental sounds. The experience preserves spatial audio perception, giving the impression that the sound originates directly from the physical location of the source. This enables personalized, localized audio without requiring pairing, tracking, or additional infrastructure. Importantly, visitors not equipped with the earphones are unaffected, as the ultrasonic signals are inaudible to the human ear. Our demo invites participants to explore multiple co-located audio zones and experience how UltrasonicSpheres supports unobtrusive delivery of personalized sound in public spaces.
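The abstract does not state the modulation scheme; assuming simple amplitude modulation onto a distinct ultrasonic carrier per channel, recovering one stream could look like the following numpy sketch (carrier mixing plus a crude moving-average low-pass; all parameters are illustrative):

```python
import numpy as np

def demodulate_channel(signal, carrier_hz, fs, cutoff_hz=4000):
    """Shift one AM channel from its ultrasonic carrier down to baseband,
    then low-pass with a moving average to reject the image band."""
    t = np.arange(len(signal)) / fs
    mixed = signal * np.cos(2 * np.pi * carrier_hz * t)  # heterodyne to DC
    win = max(1, int(fs / cutoff_hz))  # crude FIR low-pass
    return np.convolve(mixed, np.ones(win) / win, mode="same")
```

Assigning each exhibit-narration language its own carrier frequency and selecting which carrier to mix against is one way the per-user channel selection described above could work; a real earphone implementation would use a proper filter rather than a moving average.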

Authors:Valeria Zitz, Michael Küttner, Jonas Hummel, Michael T. Knierim, Michael Beigl, Tobias Röddiger
Title: Heatables: Effects of Infrared-LED-Induced Ear Heating on Thermal Perception, Comfort, and Cognitive Performance
Abstract:
Maintaining thermal comfort in shared indoor environments remains challenging, as centralized HVAC systems are slow to adapt and standardized to group norms. Cold exposure not only reduces subjective comfort but can impair cognitive performance, particularly under moderate to severe cold stress. Personal Comfort Systems (PCS) have shown promise by providing localized heating, yet many designs target distal body parts with low thermosensitivity and often lack portability. In this work, we investigate whether targeted thermal stimulation using in-ear worn devices can manipulate thermal perception and enhance thermal comfort. We present Heatables, a novel in-ear wearable that emits Near-Infrared (NIR) and Infrared (IR) radiation via integrated LEDs to deliver localized optical heating. This approach leverages NIR-IR's ability to penetrate deeper tissues, offering advantages over traditional resistive heating limited to surface warming. In a placebo-controlled study with 24 participants, each exposed for 150 minutes in a cool office environment (approximately 17.5 degrees Celsius) to simulate sustained cold stress during typical sedentary office activities, Heatables significantly increased the perceived ambient temperature by around 1.5 degrees Celsius and delayed cold discomfort. Importantly, thermal benefits extended beyond the ear region, improving both whole-body comfort and thermal acceptability. These findings position in-ear NIR-IR-LED-based stimulation as a promising modality for unobtrusive thermal comfort enhancement in everyday contexts.

Authors:Mina Huh, Zihui Xue, Ujjaini Das, Kumar Ashutosh, Kristen Grauman, Amy Pavel
Title: Vid2Coach: Transforming How-To Videos into Task Assistants
Abstract:
People use videos to learn new recipes, exercises, and crafts. Such videos remain difficult for blind and low vision (BLV) people to follow as they rely on visual comparison. Our observations of visual rehabilitation therapists (VRTs) guiding BLV people to follow how-to videos revealed that VRTs provide both proactive and responsive support including detailed descriptions, non-visual workarounds, and progress feedback. We propose Vid2Coach, a system that transforms how-to videos into wearable camera-based assistants that provide accessible instructions and mixed-initiative feedback. From the video, Vid2Coach generates accessible instructions by augmenting narrated instructions with demonstration details and completion criteria for each step. It then uses retrieval-augmented-generation to extract relevant non-visual workarounds from BLV-specific resources. Vid2Coach then monitors user progress with a camera embedded in commercial smart glasses to provide context-aware instructions, proactive feedback, and answers to user questions. BLV participants (N=8) using Vid2Coach completed cooking tasks with 58.5% fewer errors than when using their typical workflow and wanted to use Vid2Coach in their daily lives. Vid2Coach demonstrates an opportunity for AI visual assistance that strengthens rather than replaces non-visual expertise.

Authors:Cheng Luo, Jianghui Wang, Bing Li, Siyang Song, Bernard Ghanem
Title: OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions
Abstract:
In this paper, we introduce Online Multimodal Conversational Response Generation (OMCRG), a novel task that aims to online generate synchronized verbal and non-verbal listener feedback, conditioned on the speaker's multimodal input. OMCRG reflects natural dyadic interactions and poses new challenges in achieving synchronization between the generated audio and facial responses of the listener. To address these challenges, we innovatively introduce text as an intermediate modality to bridge the audio and facial responses. We hence propose OmniResponse, a Multimodal Large Language Model (MLLM) that autoregressively generates high-quality multi-modal listener responses. OmniResponse leverages a pretrained LLM enhanced with two novel components: Chrono-Text, which temporally anchors generated text tokens, and TempoVoice, a controllable online TTS module that produces speech synchronized with facial reactions. To support further OMCRG research, we present ResponseNet, a new dataset comprising 696 high-quality dyadic interactions featuring synchronized split-screen videos, multichannel audio, transcripts, and facial behavior annotations. Comprehensive evaluations conducted on ResponseNet demonstrate that OmniResponse significantly outperforms baseline models in terms of semantic speech content, audio-visual synchronization, and generation quality.

Authors:Sadaf Safa, Ali Abedi, Shehroz S. Khan
Title: Supervised Contrastive Learning for Ordinal Engagement Measurement
Abstract:
Student engagement plays a crucial role in the successful delivery of educational programs. Automated engagement measurement helps instructors monitor student participation, identify disengagement, and adapt their teaching strategies to enhance learning outcomes effectively. This paper identifies two key challenges in this problem: class imbalance and the need to treat engagement levels as ordered rather than as mere categories. Then, a novel approach to video-based student engagement measurement in virtual learning environments is proposed that utilizes supervised contrastive learning for ordinal classification of engagement. Various affective and behavioral features are extracted from video samples and utilized to train ordinal classifiers within a supervised contrastive learning framework (with a sequential classifier as the encoder). A key step involves the application of diverse time-series data augmentation techniques to these feature vectors, enhancing model training. The effectiveness of the proposed method was evaluated using a publicly available dataset for engagement measurement, DAiSEE, containing videos of students who participated in virtual learning programs. The results demonstrate the proposed method's robust ability to classify engagement levels. This approach promises a significant contribution to understanding and enhancing student engagement in virtual learning environments.
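The supervised contrastive objective pulls together embeddings of samples that share an engagement label and pushes apart the rest; below is a minimal numpy sketch of the standard SupCon loss over a small batch (the paper's ordinal variant, encoder, and augmentations are not reproduced here):

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss: for each anchor, positives are the
    other samples with the same label; all remaining samples are negatives."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature  # cosine similarities, scaled
    labels = np.asarray(labels)
    n = len(labels)
    total, anchors = 0.0, 0
    for i in range(n):
        others = np.arange(n) != i
        logits = sim[i, others]
        positives = labels[others] == labels[i]
        if not positives.any():
            continue  # anchor has no positive pair; skip it
        log_prob = logits - np.log(np.exp(logits).sum())
        total += -log_prob[positives].mean()
        anchors += 1
    return total / max(anchors, 1)
```

As expected, the loss is lower when embeddings cluster by label than when the same embeddings carry shuffled labels, which is the pressure that shapes the encoder during training.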

Authors:Jessica Tang, Ali Abedi, Tracey J. F. Colella, Shehroz S. Khan
Title: Rehabilitation Exercise Quality Assessment and Feedback Generation Using Large Language Models with Prompt Engineering
Abstract:
Exercise-based rehabilitation improves quality of life and reduces morbidity, mortality, and rehospitalization, though transportation constraints and staff shortages lead to high dropout rates from rehabilitation programs. Virtual platforms enable patients to complete prescribed exercises at home, while AI algorithms analyze performance, deliver feedback, and update clinicians. Although many studies have developed machine learning and deep learning models for exercise quality assessment, few have explored the use of large language models (LLMs) for feedback, in part because rehabilitation datasets containing textual feedback are scarce. In this paper, we propose a new method in which exercise-specific features are extracted from the skeletal joints of patients performing rehabilitation exercises and fed into pre-trained LLMs. Using a range of prompting techniques, such as zero-shot, few-shot, chain-of-thought, and role-play prompting, LLMs are leveraged to evaluate exercise quality and provide feedback in natural language to help patients improve their movements. The method was evaluated through extensive experiments on two publicly available rehabilitation exercise assessment datasets (UI-PRMD and REHAB24-6) and showed promising results in exercise assessment, reasoning, and feedback generation. This approach can be integrated into virtual rehabilitation platforms to help patients perform exercises correctly, support recovery, and improve health outcomes.
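The four prompting styles named above can be sketched as a single prompt builder over precomputed skeletal metrics. The feature names, wording, and function signature are hypothetical illustrations, not the paper's actual prompts:

```python
def build_prompt(features, style="zero-shot", examples=None):
    """Assemble an exercise-assessment prompt in one of several styles.
    `features` is a dict of precomputed skeletal-joint metrics (illustrative)."""
    desc = ", ".join(f"{k}={v}" for k, v in sorted(features.items()))
    base = f"A patient performed a rehab exercise with metrics: {desc}.\n"
    task = "Rate the exercise quality and give corrective feedback."
    if style == "zero-shot":
        return base + task
    if style == "few-shot":
        shots = "\n".join(f"Example: {ex}" for ex in (examples or []))
        return shots + "\n" + base + task
    if style == "chain-of-thought":
        return base + "Reason step by step about each metric, then " + task.lower()
    if style == "role-play":
        return "You are a physiotherapist.\n" + base + task
    raise ValueError(f"unknown style: {style}")
```

Each variant feeds the same extracted features to the LLM; only the framing around them changes, which is what the paper's prompt-engineering comparison varies.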

Authors:Simin Yang, Xian Wang, Yang Li, Lik-Hang Lee, Tristan Camille Braud, Pan Hui
Title: A Comprehensive Survey of Electrical Stimulation Haptic Feedback in Human-Computer Interaction
Abstract:
Haptic perception and feedback play a pivotal role in interactive experiences, forming an essential component of human-computer interaction (HCI). In recent years, the field of haptic interaction has witnessed significant advancements, particularly in the area of electrical haptic feedback, driving innovation across various domains. To gain a comprehensive understanding of the current state of research and the latest developments in electrical haptic interaction, this study systematically reviews the literature in this area. Our investigation covers key aspects including haptic devices, haptic perception mechanisms, the comparison and integration of electrical haptic feedback with other feedback modalities, and their diverse applications. Specifically, we conduct a systematic analysis of 110 research papers to explore the forefront of electrical haptic feedback, providing insights into its latest trends, challenges, and future directions.

Authors:Xueyin Li, Xinkai Jiang, Philipp Dahlinger, Gerhard Neumann, Rudolf Lioutikov
Title: Beyond Visuals: Investigating Force Feedback in Extended Reality for Robot Data Collection
Abstract:
This work explores how force feedback affects various aspects of robot data collection in Extended Reality (XR) settings. Force feedback has been shown to enhance the user experience in XR by providing contact-rich information. However, its impact on robot data collection has not received much attention in the robotics community. This paper addresses this shortcoming by conducting an extensive user study on the effects of force feedback during data collection in XR. We extended two XR-based robot control interfaces, Kinesthetic Teaching and Motion Controllers, with haptic feedback features. The user study covers manipulation tasks ranging from simple pick-and-place to complex peg assembly, which requires precise operations. The evaluations show that force feedback enhances task performance and user experience, particularly in tasks requiring high-precision manipulation. These improvements vary depending on the robot control interface and task complexity. This paper provides new insights into how different factors influence the impact of force feedback.
Chinese: 本研究探讨了力反馈在扩展现实中如何提升机器人数据收集,发现它能显著改善任务表现和用户体验,尤其在精密操作中,其效果因控制界面和任务复杂度而异。
English: This study investigates how force feedback enhances robot data collection in Extended Reality, finding it significantly improves task performance and user experience, especially in high-precision operations, with effects varying by control interface and task complexity.

Authors:Yifan Zhang, Chen Huang, Zachary Karas, Dung Thuy Nguyen, Kevin Leach, Yu Huang
Title: Enhancing Code LLM Training with Programmer Attention
Abstract:
Human attention provides valuable yet underexploited signals for code LLM training, offering a perspective beyond purely machine-driven attention. Beyond the complexity and cost of collecting eye-tracking data, there has been limited progress in systematically using these signals for code LLM training. To address both issues, we propose a cohesive pipeline spanning augmentation and reward-based fine-tuning. Specifically, we introduce (1) an eye-tracking path augmentation method to expand programmer attention datasets, (2) a pattern abstraction step that refines raw fixations into learnable attention motifs, and (3) a reward-guided strategy for integrating these insights directly into a CodeT5 supervised fine-tuning process. Our experiments yield +7.16 in CodeBLEU on the CodeXGlue benchmark for code summarization, underscoring how uniting human and machine attention can boost code intelligence. We hope this work encourages broader exploration of human-centric methods in next-generation AI4SE.

Authors:Abe Bohan Hou, Hongru Du, Yichen Wang, Jingyu Zhang, Zixiao Wang, Paul Pu Liang, Daniel Khashabi, Lauren Gardner, Tianxing He
Title: Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy
Abstract:
Can we simulate a sandbox society with generative agents to model human behavior, thereby reducing the over-reliance on real human trials for assessing public policies? In this work, we investigate the feasibility of simulating health-related decision-making, using vaccine hesitancy, defined as the delay in acceptance or refusal of vaccines despite the availability of vaccination services (MacDonald, 2015), as a case study. To this end, we introduce the VacSim framework with 100 generative agents powered by Large Language Models (LLMs). VacSim simulates vaccine policy outcomes with the following steps: 1) instantiate a population of agents with demographics based on census data; 2) connect the agents via a social network and model vaccine attitudes as a function of social dynamics and disease-related information; 3) design and evaluate various public health interventions aimed at mitigating vaccine hesitancy. To align with real-world results, we also introduce simulation warmup and attitude modulation to adjust agents' attitudes. We propose a series of evaluations to assess the reliability of various LLM simulations. Experiments indicate that models like Llama and Qwen can simulate aspects of human behavior but also highlight real-world alignment challenges, such as inconsistent responses with demographic profiles. This early exploration of LLM-driven simulations is not meant to serve as definitive policy guidance; instead, it serves as a call for action to examine social simulation for policy development.
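Step 2 of the VacSim pipeline, modeling vaccine attitudes as a function of social dynamics and disease-related information, can be sketched as a simple synchronous update over a social network. The update rule, weights, and attitude scale are assumptions for illustration, not the framework's actual mechanism (which is driven by LLM agents):

```python
def step_attitudes(attitudes, neighbors, info_signal, social_w=0.5, info_w=0.2):
    """One synchronous round: each agent's vaccine attitude (in [0, 1]) drifts
    toward the mean attitude of its social-network neighbors and toward a
    global disease-information signal, then is clipped to the valid range."""
    new = {}
    for agent, att in attitudes.items():
        nb = neighbors.get(agent, [])
        peer = sum(attitudes[n] for n in nb) / len(nb) if nb else att
        updated = att + social_w * (peer - att) + info_w * (info_signal - att)
        new[agent] = min(1.0, max(0.0, updated))
    return new
```

A public-health intervention would then be modeled as a change to `info_signal` or to the network, with outcomes read off the resulting attitude distribution.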

Authors:Zhanxin Hao, Jianxiao Jiang, Jifan Yu, Zhiyuan Liu, Yu Zhang
Title: Student engagement in collaborative learning with AI agents in an LLM-empowered learning environment: A cluster analysis
Abstract:
Integrating LLMs into educational practice fosters personalized learning by accommodating the diverse behavioral patterns of different learner types. This study aims to explore these learner types within a novel interactive setting, providing a detailed analysis of their distinctive characteristics and interaction dynamics. The research involved 110 students from a university in China, who engaged with multiple LLM agents in an LLM-empowered learning environment, completing coursework across six modules. Data on the students' non-cognitive traits, course engagement, and AI interaction patterns were collected and analyzed. Using hierarchical cluster analysis, the students were classified into three distinct groups: active questioners, responsive navigators, and silent listeners. Epistemic network analysis was then applied to further delineate the interaction profiles and cognitive engagement of different types of learners. The findings underscore how different learner types engage with human-AI interactive learning and offer practical implications for the design of adaptive educational systems.
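Hierarchical cluster analysis, as used to derive the three learner groups, can be sketched as bottom-up single-linkage agglomeration over student feature vectors. This is a generic sketch of the technique, not the study's specific configuration (linkage criterion and features are assumptions):

```python
def agglomerative(points, k):
    """Bottom-up single-linkage clustering: start with singletons and
    repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Cutting the merge tree at k = 3 would yield the kind of three-group partition (e.g., active questioners vs. silent listeners) the study reports.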

Authors:Jane Pan, Ryan Shar, Jacob Pfau, Ameet Talwalkar, He He, Valerie Chen
Title: When Benchmarks Talk: Re-Evaluating Code LLMs with Interactive Feedback
Abstract:
Programming is a fundamentally interactive process, yet coding assistants are often evaluated using static benchmarks that fail to measure how well models collaborate with users. We introduce an interactive evaluation pipeline to examine how LLMs incorporate different types of feedback in a collaborative setting. Specifically, we perturb static coding benchmarks so that the code model must interact with a simulated user to retrieve key information about the problem. We find that interaction significantly affects model performance, as the relative rankings of 10 models across 3 datasets often vary between static and interactive settings, despite models being fairly robust to feedback that contains errors. We also observe that even when different feedback types are equally effective with respect to performance, they can impact model behaviors such as (1) how models respond to higher- vs. lower-quality feedback and (2) whether models prioritize aesthetic vs. functional edits. Our work aims to "re-evaluate" model coding capabilities through an interactive lens toward bridging the gap between existing evaluations and real-world usage.
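The interactive evaluation loop, a model seeing a perturbed problem and querying a simulated user before submitting code, can be sketched as follows. All callables, dict keys, and the turn budget are stand-ins for the paper's pipeline, not its actual interface:

```python
def interactive_eval(problem, model, simulated_user, max_turns=3):
    """Sketch of the interaction loop: the model sees a perturbed
    (information-hidden) prompt and may ask the simulated user clarifying
    questions before submitting a solution."""
    context = [problem["perturbed_prompt"]]
    for _ in range(max_turns):
        reply = model(context)
        if reply["type"] == "submit":
            return reply["code"]
        # Model asked a question; the simulated user answers from the full spec.
        answer = simulated_user(problem["full_spec"], reply["question"])
        context.append(answer)
    # Turn budget exhausted: force a final submission.
    return model(context + ["Submit your best attempt now."])["code"]
```

Comparing pass rates of this loop against the unperturbed static benchmark is what separates a model's coding ability from its ability to elicit missing information.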

Authors:Yuchong Zhang, Bastian Orthmann, Michael C. Welle, Jonne Van Haastregt, Danica Kragic
Title: LLM-Driven Augmented Reality Puppeteer: Controller-Free Voice-Commanded Robot Teleoperation
Abstract:
The integration of robotics and augmented reality (AR) presents transformative opportunities for advancing human-robot interaction (HRI) by improving usability, intuitiveness, and accessibility. This work introduces a controller-free, LLM-driven voice-commanded AR puppeteering system, enabling users to teleoperate a robot by manipulating its virtual counterpart in real time. By leveraging natural language processing (NLP) and AR technologies, our system -- prototyped using Meta Quest 3 -- eliminates the need for physical controllers, enhancing ease of use while minimizing potential safety risks associated with direct robot operation. A preliminary user demonstration successfully validated the system's functionality, demonstrating its potential for safer, more intuitive, and immersive robotic control.

Authors:Yang Liu, Haiwei Dong, Abdulmotaleb El Saddik
Title: Leveraging LLMs to Create a Haptic Devices' Recommendation System
Abstract:
Haptic technology has seen significant growth, yet a lack of awareness of existing haptic device design knowledge hinders development. This paper addresses these limitations by leveraging advancements in Large Language Models (LLMs) to develop a haptic agent, focusing specifically on Grounded Force Feedback (GFF) devices recommendation. Our approach involves automating the creation of a structured haptic device database using information from research papers and product specifications. This database enables the recommendation of relevant GFF devices based on user queries. To ensure precise and contextually relevant recommendations, the system employs a dynamic retrieval method that combines both conditional and semantic searches. Benchmarking against the established UEQ and existing haptic device searching tools, the proposed haptic recommendation agent ranks in the top 10% across all UEQ categories with mean differences favoring the agent in nearly all subscales, and maintains no significant performance bias across different user groups, showcasing superior usability and user satisfaction.

Authors:Nitay Calderon, Roi Reichart, Rotem Dror
Title: The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
Abstract:
The "LLM-as-an-annotator" and "LLM-as-a-judge" paradigms employ Large Language Models (LLMs) as annotators, judges, and evaluators in tasks traditionally performed by humans. LLM annotations are widely used, not only in NLP research but also in fields like medicine, psychology, and social science. Despite their role in shaping study results and insights, there is no standard or rigorous procedure to determine whether LLMs can replace human annotators. In this paper, we propose a novel statistical procedure, the Alternative Annotator Test (alt-test), that requires only a modest subset of annotated examples to justify using LLM annotations. Additionally, we introduce a versatile and interpretable measure for comparing LLM annotators and judges. To demonstrate our procedure, we curated a diverse collection of ten datasets, consisting of language and vision-language tasks, and conducted experiments with six LLMs and four prompting techniques. Our results show that LLMs can sometimes replace humans, with closed-source LLMs (such as GPT-4o) outperforming the open-source LLMs we examine, and that prompting techniques yield judges of varying quality. We hope this study encourages more rigorous and reliable practices.
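The flavor of the comparison underlying the alt-test can be sketched with a simplified leave-one-annotator-out winning rate. This is an illustrative stand-in, not the paper's actual statistical test (which adds significance testing over the agreement differences):

```python
from collections import Counter

def majority(labels):
    """Most common label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]

def llm_winning_rate(llm_labels, human_labels):
    """For each human annotator: leave them out, score both the LLM and that
    human against the majority vote of the remaining annotators, and report
    the fraction of annotators the LLM matches or beats."""
    n_humans = len(human_labels)
    wins = 0
    for h in range(n_humans):
        others = [human_labels[i] for i in range(n_humans) if i != h]
        llm_agree = human_agree = 0
        for ex in range(len(llm_labels)):
            ref = majority([o[ex] for o in others])
            llm_agree += llm_labels[ex] == ref
            human_agree += human_labels[h][ex] == ref
        wins += llm_agree >= human_agree
    return wins / n_humans
```

A winning rate near 1.0 suggests the LLM is at least as consistent with the annotator pool as a typical human, the kind of evidence the alt-test formalizes.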

Authors:Marie Altmann, Kimberly Hegemann, Ali Askari, Vineetha Rallabandi, Max Pascher, Jens Gerken
Title: NoticeLight: Embracing Socio-Technical Asymmetry through Tangible Peripheral Robotic Embodiment in Hybrid Collaboration
Abstract:
Hybrid collaboration has become a fixture in modern workplaces, yet it introduces persistent socio-technical asymmetries, especially disadvantaging remote participants, who struggle with presence disparity, reduced visibility, and limited non-verbal communication. Traditional solutions often seek to erase these asymmetries, but recent research suggests embracing them as productive design constraints. In this context, we introduce NoticeLight: a tangible, peripheral robotic embodiment designed to augment hybrid meetings. NoticeLight transforms remote participants' digital presence into ambient, physical signals -- such as mood dynamics, verbal contribution mosaics, and attention cues -- within the co-located space. By abstracting group states into subtle light patterns, NoticeLight fosters peripheral awareness and balanced participation without disrupting meeting flow or imposing cognitive overload. This approach aligns with emerging perspectives in human-robot synergy, positioning robots as mediators that reshape, rather than replicate, human presence. Our work thereby advances the discourse on how robotic embodiments can empower equitable, dynamic collaboration in the workplace.

Authors:Ruiwei Xiao, Xinying Hou, Runlong Ye, Majeed Kazemitabaar, Nicholas Diana, Michael Liut, John Stamper
Title: Improving Student-AI Interaction Through Pedagogical Prompting: An Example in Computer Science Education
Abstract:
With the proliferation of large language model (LLM) applications since 2022, their use in education has sparked both excitement and concern. Recent studies consistently highlight that students' (mis)use of LLMs can hinder learning outcomes. This work aims to teach students how to effectively prompt LLMs to improve their learning. We first proposed pedagogical prompting, a theoretically grounded new concept to elicit learning-oriented responses from LLMs. To move from concept design to a proof-of-concept learning intervention in real educational settings, we selected early undergraduate CS education (CS1/CS2) as the example context. We began with a formative survey study with instructors (N=36) teaching early-stage undergraduate-level CS courses to inform the instructional design based on classroom needs. Based on their insights, we designed and developed a learning intervention through an interactive system with scenario-based instruction to train pedagogical prompting skills. Finally, we evaluated its instructional effectiveness through a user study with CS novice students (N=22) using pre/post-tests. Through mixed methods analyses, our results indicate significant improvements in learners' LLM-based pedagogical help-seeking skills, along with positive attitudes toward the system and increased willingness to use pedagogical prompts in the future. Our contributions include (1) a theoretical framework of pedagogical prompting; (2) empirical insights into current instructor attitudes toward pedagogical prompting; and (3) a learning intervention design with an interactive learning tool and scenario-based instruction leading to promising results on teaching LLM-based help-seeking. Our approach is scalable for broader implementation in classrooms and has the potential to be integrated into tools like ChatGPT as an on-boarding experience to encourage learning-oriented use of generative AI.

Authors:K. J. Kevin Feng, David W. McDonald, Amy X. Zhang
Title: Levels of Autonomy for AI Agents
Abstract:
Autonomy is a double-edged sword for AI agents, simultaneously unlocking transformative possibilities and serious risks. How can agent developers calibrate the appropriate levels of autonomy at which their agents should operate? We argue that an agent's level of autonomy can be treated as a deliberate design decision, separate from its capability and operational environment. In this work, we define five levels of escalating agent autonomy, characterized by the roles a user can take when interacting with an agent: operator, collaborator, consultant, approver, and observer. Within each level, we describe the ways by which a user can exert control over the agent and open questions for how to design the nature of user-agent interaction. We then highlight a potential application of our framework towards AI autonomy certificates to govern agent behavior in single- and multi-agent systems. We conclude by proposing early ideas for evaluating agents' autonomy. Our work aims to contribute meaningful, practical steps towards responsibly deployed and useful AI agents in the real world.

Authors:Yuhang Zhou, Yimin Xiao, Wei Ai, Ge Gao
Title: The Hidden Language of Harm: Examining the Role of Emojis in Harmful Online Communication and Content Moderation
Abstract:
Social media platforms have become central to modern communication, yet they also harbor offensive content that challenges platform safety and inclusivity. While prior research has primarily focused on textual indicators of offense, the role of emojis, ubiquitous visual elements in online discourse, remains underexplored. Emojis, despite being rarely offensive in isolation, can acquire harmful meanings through symbolic associations, sarcasm, and contextual misuse. In this work, we systematically examine emoji contributions to offensive Twitter messages, analyzing their distribution across offense categories and how users exploit emoji ambiguity. To address this, we propose an LLM-powered, multi-step moderation pipeline that selectively replaces harmful emojis while preserving the tweet's semantic intent. Human evaluations confirm our approach effectively reduces perceived offensiveness without sacrificing meaning. Our analysis also reveals heterogeneous effects across offense types, offering nuanced insights for online communication and emoji moderation.
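The multi-step moderation pipeline, detect an offensive message, then selectively replace the emojis carrying harmful meaning while preserving intent, can be sketched as follows. The classifier hook and the replacement map are hypothetical stand-ins for the paper's LLM-powered components:

```python
def moderate_emojis(text, harmful_map, classify):
    """Sketch of selective emoji moderation: (1) run an offensiveness check
    via the `classify` hook (stand-in for the LLM step), (2) if flagged,
    replace emojis with contextually harmful meanings using `harmful_map`,
    (3) leave benign emojis and all other characters untouched."""
    if not classify(text):
        return text  # not offensive: no rewriting needed
    return "".join(harmful_map.get(ch, ch) for ch in text)
```

Because only mapped characters change, the tweet's surrounding wording, and hence most of its semantic intent, is preserved, matching the goal the human evaluation checks.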

Authors:Ilia Sucholutsky, Katherine M. Collins, Nori Jacoby, Bill D. Thompson, Robert D. Hawkins
Title: Using LLMs to Advance the Cognitive Science of Collectives
Abstract:
LLMs are already transforming the study of individual cognition, but their application to studying collective cognition has been underexplored. We lay out how LLMs may be able to address the complexity that has hindered the study of collectives and raise possible risks that warrant new methods.

Authors:Yusuke Masubuchi, Takefumi Hiraki, Yuichi Hiroi, Masanori Ibara, Kazuki Matsutani, Megumi Zaizen, Junya Morita
Title: Development of Digital Twin Environment through Integration of Commercial Metaverse Platform and IoT Sensors of Smart Building
Abstract:
The digital transformation of smart cities and workplaces requires effective integration of physical and cyber spaces, yet existing digital twin solutions remain limited in supporting real-time, multi-user collaboration. While metaverse platforms enable shared virtual experiences, they have not supported comprehensive integration of IoT sensors on physical spaces, especially for large-scale smart architectural environments. This paper presents a digital twin environment that integrates Kajima Corp.'s smart building facility "The GEAR" in Singapore with a commercial metaverse platform Cluster. Our system consists of three key components: a standardized IoT sensor platform, a real-time data relay system, and an environmental data visualization framework. Quantitative end-to-end latency measurements confirm the feasibility of our approach for real-world applications in large architectural spaces. The proposed framework enables new forms of collaboration that transcend spatial constraints, advancing the development of next-generation interactive environments.

Authors:Zahra Zahedi, Shashank Mehrotra, Teruhisa Misu, Kumar Akash
Title: Toward Informed AV Decision-Making: Computational Model of Well-being and Trust in Mobility
Abstract:
For future human-autonomous vehicle (AV) interactions to be effective and smooth, human-aware systems that analyze and align human needs with automation decisions are essential. Achieving this requires systems that account for human cognitive states. We present a novel computational model in the form of a Dynamic Bayesian Network (DBN) that infers the cognitive states of both AV users and other road users, integrating this information into the AV's decision-making process. Specifically, our model captures the well-being of both an AV user and an interacting road user as cognitive states alongside trust. Our DBN models infer beliefs over the AV user's evolving well-being, trust, and intention states, as well as the possible well-being of other road users, based on observed interaction experiences. Using data collected from an interaction study, we refine the model parameters and empirically assess its performance. Finally, we extend our model into a causal inference model (CIM) framework for AV decision-making, enabling the AV to enhance user well-being and trust while balancing these factors with its own operational costs and the well-being of interacting road users. Our evaluation demonstrates the model's effectiveness in accurately predicting users' states and guiding informed, human-centered AV decisions.
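Belief inference in a DBN of this kind reduces, at each time slice, to a discrete Bayesian filtering step: predict the hidden state (e.g., trust level) through the transition model, then condition on the new observation. The state space, distributions, and observation below are illustrative assumptions, not the paper's fitted model:

```python
def dbn_update(belief, transition, likelihood, observation):
    """One discrete Bayesian filtering step over a hidden cognitive state.
    `belief` maps states to probabilities; `transition[s1][s2]` is
    P(next=s2 | cur=s1); `likelihood[s][o]` is P(obs=o | state=s)."""
    states = list(belief)
    # Predict: propagate the belief through the transition dynamics.
    predicted = {
        s2: sum(belief[s1] * transition[s1][s2] for s1 in states)
        for s2 in states
    }
    # Update: weight by the observation likelihood and renormalize.
    posterior = {s: predicted[s] * likelihood[s][observation] for s in states}
    z = sum(posterior.values())
    return {s: p / z for s, p in posterior.items()}
```

Feeding a stream of interaction observations through repeated calls yields the evolving beliefs over trust and well-being that the AV's decision layer consumes.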

Authors:Sunghyo Chung, Hyeon Jeon, Sungbok Shin, Md Naimul Hoque
Title: Reading.help: Supporting EFL Readers with Proactive and On-Demand Explanation of English Grammar and Semantics
Abstract:
A large portion of texts in the world is written in English, but readers of English as a Foreign Language (EFL) often struggle to read texts written in English accurately and swiftly. In many countries, EFL readers seek help from professional teachers and mentors, which is limited and costly. In this paper, we explore how an intelligent reading tool can assist EFL readers. To support our research agenda, we conducted a case study with EFL readers in South Korea. We first developed an LLM-based reading tool based on prior literature. We then revised the tool based on the feedback from a study with 15 South Korean EFL readers. The final tool, named Reading.help, helps EFL readers comprehend complex sentences and paragraphs with on-demand and proactive explanations. We finally evaluated the tool with 5 EFL readers and 2 EFL education professionals. Our findings suggest Reading.help could potentially help EFL readers self-learn English when they do not have access to external support.

Authors:Jiayi Wang, Ruiwei Xiao, Xinying Hou, Hanqi Li, Ying Jui Tseng, John Stamper, Ken Koedinger
Title: LLMs to Support K-12 Teachers in Culturally Relevant Pedagogy: An AI Literacy Example
Abstract:
Culturally Relevant Pedagogy (CRP) is vital in K-12 education, yet teachers struggle to put CRP into practice due to time, training, and resource gaps. This study explores how Large Language Models (LLMs) can address these barriers by introducing CulturAIEd, an LLM tool that assists teachers in adapting AI literacy curricula to students' cultural contexts. Through an exploratory pilot with four K-12 teachers, we examined CulturAIEd's impact on CRP integration. Results showed CulturAIEd enhanced teachers' confidence in identifying opportunities for cultural responsiveness in learning activities and in making culturally responsive modifications to existing activities. Teachers valued CulturAIEd's streamlined integration of student demographic information and its immediate, actionable feedback, which together enabled highly efficient implementation. This exploration of teacher-AI collaboration highlights how LLMs can help teachers incorporate CRP components into their instructional practices efficiently, especially for global priorities in future-ready education, such as AI literacy.

Authors:Sungbok Shin, Sunghyo Chung, Hyeon Jeon, Hyunwook Lee, Minje Choi, Taehun Kim, Jaehoon Choi, Sungahn Ko, Jaegul Choo
Title: Beyond the Mirror: Personal Analytics through Visual Juxtaposition with Other People's Data
Abstract:
An individual's data can reveal facets of behavior and identity, but its interpretation is context dependent. We can easily identify various self-tracking applications that help people reflect on their lives. However, self-tracking confined to one person's data source may fall short in objectivity and in insights drawn from diverse perspectives. To address this, we examine how interpretations of a person's data can be augmented when the data are juxtaposed with that of others, using anonymized online calendar logs from a schedule management app. We develop CALTREND, a visual analytics system that compares an individual's anonymized online schedule logs with those of other people. Using CALTREND as a probe, we conduct a study with two domain experts, one in information technology and one in Korean herbal medicine. We report our observations on how comparative views help enrich the characterization of an individual based on the experts' comments. We find that juxtaposing personal data with others' can potentially lead to diverse interpretations of one dataset, shaped by domain-specific mental models.

Authors:Sungbok Shin, Hyeon Jeon, Sanghyun Hong, Niklas Elmqvist
Title: Data Therapist: Eliciting Domain Knowledge from Subject Matter Experts Using Large Language Models
Abstract:
Effective data visualization requires not only technical proficiency but also a deep understanding of the domain-specific context in which data exists. This context often includes tacit knowledge about data provenance, quality, and intended use, which is rarely explicit in the dataset itself. We present the Data Therapist, a web-based tool that helps domain experts externalize this implicit knowledge through a mixed-initiative process combining iterative Q&A with interactive annotation. Powered by a large language model, the system analyzes user-supplied datasets, prompts users with targeted questions, and allows annotation at varying levels of granularity. The resulting structured knowledge base can inform both human and automated visualization design. We evaluated the tool in a qualitative study involving expert pairs from Molecular Biology, Accounting, Political Science, and Usable Security. The study revealed recurring patterns in how experts reason about their data and highlights areas where AI support can improve visualization design.

Authors:Yuichi Hiroi, Yuji Hatada, Takefumi Hiraki
Title: Cross-Reality Lifestyle: Integrating Physical and Virtual Lives through Multi-Platform Metaverse
Abstract:
Technological advances are redefining the relationship between physical and virtual space. Traditionally, when users engage in virtual reality (VR), they are completely cut off from the physical space; similarly, they are unable to access virtual experiences while engaged in physical activities. However, modern multi-platform metaverse environments allow simultaneous participation through mobile devices, creating new opportunities for integrated experiences. This study introduces the concept of "cross-reality lifestyles" to examine how users actively combine their physical and virtual activities. We identify three patterns of integration: 1) amplification: one space enhances experiences in the other; 2) complementarity: the spaces offer different but equally valuable alternatives; and 3) emergence: simultaneous engagement creates entirely new experiences. By analyzing commercial platforms, we create a technical framework that addresses content design, platform infrastructure, and device interfaces. This framework guides the development of cross-reality applications while demonstrating how metaverse technologies blur the traditional boundaries between physical and virtual experiences.

Authors:Ryutaro Kurai, Takefumi Hiraki, Yuichi Hiroi, Yutaro Hirao, Monica Perusquía-Hernández, Hideaki Uchiyama, Kiyoshi Kiyokawa
Title: MagicCraft: Natural Language-Driven Generation of Dynamic and Interactive 3D Objects for Commercial Metaverse Platforms
Abstract:
Metaverse platforms are rapidly evolving to provide immersive spaces for user interaction and content creation. However, the generation of dynamic and interactive 3D objects remains challenging due to the need for advanced 3D modeling and programming skills. To address this challenge, we present MagicCraft, a system that generates functional 3D objects from natural language prompts for metaverse platforms. MagicCraft uses generative AI models to manage the entire content creation pipeline: converting user text descriptions into images, transforming images into 3D models, predicting object behavior, and assigning necessary attributes and scripts. It also provides an interactive interface for users to refine generated objects by adjusting features such as orientation, scale, seating positions, and grip points. Implemented on Cluster, a commercial metaverse platform, MagicCraft was evaluated by 7 expert CG designers and 51 general users. Results show that MagicCraft significantly reduces the time and skill required to create 3D objects. Users with no prior experience in 3D modeling or programming successfully created complex, interactive objects and deployed them in the metaverse. Expert feedback highlighted the system's potential to improve content creation workflows and support rapid prototyping. By integrating AI-generated content into metaverse platforms, MagicCraft makes 3D content creation more accessible.

Authors:Raymond Fok, Joseph Chee Chang, Marissa Radensky, Pao Siangliulue, Jonathan Bragg, Amy X. Zhang, Daniel S. Weld
Title: Facets, Taxonomies, and Syntheses: Navigating Structured Representations in LLM-Assisted Literature Review
Abstract:
Comprehensive literature review requires synthesizing vast amounts of research -- a labor-intensive and cognitively demanding process. Most prior work focuses either on helping researchers deeply understand a few papers (e.g., for triaging or reading), or on retrieving from and visualizing a vast corpus. Deep analysis and synthesis of large paper collections (e.g., to produce a survey paper) is largely conducted manually with little support. We present DimInd, an interactive system that scaffolds literature review across large paper collections through LLM-generated structured representations. DimInd scaffolds literature understanding with multiple levels of compression, from papers, to faceted literature comparison tables with information extracted from individual papers, to taxonomies of concepts, to narrative syntheses. Users are guided through these successive information transformations while maintaining provenance to source text. In an evaluation with 23 researchers, DimInd supported participants in extracting information and conceptually organizing papers with less effort compared to a ChatGPT-assisted baseline workflow.

Authors:Chaoran Chen, Zhiping Zhang, Ibrahim Khalilov, Bingcan Guo, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li
Title: Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents
Abstract:
The rise of Large Language Models (LLMs) has revolutionized Graphical User Interface (GUI) automation through LLM-powered GUI agents, yet their ability to process sensitive data with limited human oversight raises significant privacy and security risks. This position paper identifies three key risks of GUI agents and examines how they differ from traditional GUI automation and general autonomous agents. Despite these risks, existing evaluations focus primarily on performance, leaving privacy and security assessments largely unexplored. We review current evaluation metrics for both GUI and general LLM agents and outline five key challenges in integrating human evaluators for GUI agent assessments. To address these gaps, we advocate for a human-centered evaluation framework that incorporates risk assessments, enhances user awareness through in-context consent, and embeds privacy and security considerations into GUI agent design and evaluation.

Authors:Yong-Hao Hu, Sotaro Yokoi, Yuji Hatada, Yuichi Hiroi, Takuji Narumi, Takefumi Hiraki
Title: LUIDA: Large-scale Unified Infrastructure for Digital Assessments based on Commercial Metaverse Platform
Abstract:
Online experiments using metaverse platforms have gained significant traction in Human-Computer Interaction and Virtual Reality (VR) research. However, current research workflows are highly fragmented, as researchers must use separate tools for system implementation, participant recruitment, experiment execution, and data collection, reducing consistency and increasing workload. We present LUIDA (Large-scale Unified Infrastructure for Digital Assessments), a metaverse-based framework that integrates these fragmented processes. LUIDA automatically allocates interconnected virtual environments for parallel experiment execution and provides implementation templates adaptable to various VR research domains, requiring minimal metaverse development expertise. Our evaluation included two studies using a prototype built on Cluster, the commercial metaverse platform. First, VR researchers using LUIDA to develop and run experiments reported high usability scores (SUS: 73.75) and moderate workload (NASA-TLX: 24.11) for overall usage, with interviews confirming streamlined workflows compared to traditional laboratory experiments. Second, we conducted three replicated experiments with public Cluster users, each recruiting approximately 200 participants within one week. These experiments produced results that closely matched the original studies, validating the experimental integrity of LUIDA across research domains. After technical refinements, we plan to release LUIDA as an open platform, providing a standardized protocol to improve research efficiency and experimental reproducibility in VR studies.

Authors:Luis Morales-Navarro, Daniel J. Noh, Yasmin B. Kafai
Title: Building babyGPTs: Youth Engaging in Data Practices and Ethical Considerations through the Construction of Generative Language Models
Abstract:
As generative language models (GLMs) have gained popularity, youth are increasingly using them in their everyday lives. As such, most research has centered on supporting youth as users of GLM-powered systems. However, we know little of how to engage youth in the design of these models. Building on the rich legacy of child-computer interaction research that positions youth as designers of computing systems, we explore how to support young people in designing GLMs. Through a case study of three teenagers (ages 14-15) building a babyGPT screenplay generator, we illustrate how the team developed a model while engaging in artificial intelligence/machine learning-relevant data practices and addressing ethical issues. This paper contributes a case study that demonstrates the feasibility of engaging youth in building GLMs.

Authors:Sitong Wang, Samia Menon, Dingzeyu Li, Xiaojuan Ma, Richard Zemel, Lydia B. Chilton
Title: Schemex: Interactive Structural Abstraction from Examples with Contrastive Refinement
Abstract:
Each type of creative or communicative work is underpinned by an implicit structure. People learn these structures from examples - a process known in cognitive science as schema induction. However, inducing schemas is challenging, as structural patterns are often obscured by surface-level variation. We present Schemex, an interactive visual workflow that scaffolds schema induction through clustering, abstraction, and contrastive refinement. Schemex supports users through visual representations and interactive exploration that connect abstract structures to concrete examples, promoting transparency, adaptability, and effective human-AI collaboration. In our user study, participants reported significantly greater insight and confidence in the schemas developed with Schemex compared to those created using a baseline of an AI reasoning model. We conclude by discussing the broader implications of structural abstraction and contrastive refinement across domains.

Authors:Chaoran Chen, Zhiping Zhang, Bingcan Guo, Shang Ma, Ibrahim Khalilov, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li
Title: The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections
Abstract:
A Large Language Model (LLM) powered GUI agent is a specialized autonomous system that performs tasks on the user's behalf according to high-level instructions. It does so by perceiving and interpreting the graphical user interfaces (GUIs) of relevant apps, often visually, inferring necessary sequences of actions, and then interacting with GUIs by executing the actions such as clicking, typing, and tapping. To complete real-world tasks, such as filling forms or booking services, GUI agents often need to process and act on sensitive user data. However, this autonomy introduces new privacy and security risks. Adversaries can inject malicious content into the GUIs that alters agent behaviors or induces unintended disclosures of private information. These attacks often exploit the discrepancy between visual saliency for agents and human users, or the agent's limited ability to detect violations of contextual integrity in task automation. In this paper, we characterized six types of such attacks, and conducted an experimental study to test these attacks with six state-of-the-art GUI agents, 234 adversarial webpages, and 39 human participants. Our findings suggest that GUI agents are highly vulnerable, particularly to contextually embedded threats. Moreover, human users are also susceptible to many of these attacks, indicating that simple human oversight may not reliably prevent failures. This misalignment highlights the need for privacy-aware agent design. We propose practical defense strategies to inform the development of safer and more reliable GUI agents.

Authors:Amanpreet Kapoor, Marc Diaz, Stephen MacNeil, Leo Porter, Paul Denny
Title: Exploring Student Behaviors and Motivations using AI TAs with Optional Guardrails
Abstract:
AI-powered chatbots and digital teaching assistants (AI TAs) are gaining popularity in programming education, offering students timely and personalized feedback. Despite their potential benefits, concerns about student over-reliance and academic misconduct have prompted the introduction of "guardrails" into AI TAs - features that provide scaffolded support rather than direct solutions. However, overly restrictive guardrails may lead students to bypass these tools and use unconstrained AI models, where interactions are not observable, thus limiting our understanding of students' help-seeking behaviors. To investigate this, we designed and deployed a novel AI TA tool with optional guardrails in one lab of a large introductory programming course. As students completed three code writing and debugging tasks, they had the option to receive guardrailed help or use a "See Solution" feature which disabled the guardrails and generated a verbatim response from the underlying model. We investigate students' motivations and use of this feature and examine the association between usage and their course performance. We found that 50% of the 885 students used the "See Solution" feature for at least one problem and 14% used it for all three problems. Additionally, low-performing students were more likely to use this feature and use it close to the deadline as they started assignments later. The predominant factors that motivated students to disable the guardrails were assistance in solving problems, time pressure, lack of self-regulation, and curiosity. Our work provides insights into students' solution-seeking motivations and behaviors, which has implications for the design of AI TAs that balance pedagogical goals with student preferences.

Authors:Daniel J. Noh, Deborah A. Fields, Luis Morales-Navarro, Alexis Cabrera-Sutch, Yasmin B. Kafai, Danaé Metaxa
Title: Youth as Advisors in Participatory Design: Situating Teens' Expertise in Everyday Algorithm Auditing with Teachers and Researchers
Abstract:
Research on children and youth's participation in different roles in the design of technologies is one of the core contributions in child-computer interaction studies. Building on this work, we situate youth as advisors to a group of high school computer science teacher- and researcher-designers creating learning activities in the context of emerging technologies. Specifically, we explore algorithm auditing as a potential entry point for youth and adults to critically evaluate generative AI algorithmic systems, with the goal of designing classroom lessons. Through a two-hour session where three teenagers (16-18 years) served as advisors, we (1) examine the types of expertise the teens shared and (2) identify back stage design elements that fostered their agency and voice in this advisory role. Our discussion considers opportunities and challenges in situating youth as advisors, providing recommendations for actions that researchers, facilitators, and teachers can take to make this unusual arrangement feasible and productive.

Authors:Lars Krupp, Daniel Geißler, Peter Hevesi, Marco Hirsch, Paul Lukowicz, Jakob Karolus
Title: Talk2X -- An Open-Source Toolkit Facilitating Deployment of LLM-Powered Chatbots on the Web
Abstract:
Integrated into websites, LLM-powered chatbots offer alternative means of navigation and information retrieval, leading to a shift in how users access information on the web. Yet predominantly closed-source solutions limit proliferation among web hosts and suffer from a lack of transparency with regard to implementation details and energy efficiency. In this work, we propose our openly available agent Talk2X, leveraging an adapted retrieval-augmented generation (RAG) approach combined with an automatically generated vector database, benefiting energy efficiency. Talk2X's architecture is generalizable to arbitrary websites, offering developers a ready-to-use tool for integration. Using a mixed-methods approach, we evaluated Talk2X's usability by tasking users to acquire specific assets from an open science repository. Talk2X significantly improved task completion time, correctness, and user experience, supporting users in quickly pinpointing specific information as compared to standard user-website interaction. Our findings contribute technical advancements to an ongoing paradigm shift in how we access information on the web.
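The core of any RAG setup like the one the abstract describes is the retrieval step: site content is embedded into a vector index, and passages closest to the user's query are handed to the language model as context. The sketch below illustrates only that step, using toy bag-of-words vectors in place of a real embedding model; Talk2X's actual embedding model, database, and function names are not specified here, so everything below is illustrative.

```python
# Minimal retrieval sketch for a RAG pipeline: build a vector index
# from site passages, then rank passages by cosine similarity to a
# query. Bag-of-words counts stand in for learned embeddings.
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_index(passages: list[str]) -> list[tuple[str, Counter]]:
    """Automatically generate the vector index from site content."""
    return [(p, embed(p)) for p in passages]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pv: cosine(q, pv[1]), reverse=True)
    return [p for p, _ in ranked[:k]]


site = [
    "Download the climate dataset from the open science repository.",
    "Contact the maintainers via the support form.",
    "The repository hosts datasets on climate and health research.",
]
index = build_index(site)
print(retrieve(index, "where can I find the climate dataset?", k=1))
```

In a full pipeline, the retrieved passages would be concatenated into the LLM prompt; precomputing the index once, as the paper's automatically generated database suggests, avoids re-embedding the site on every query.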

Authors:Matheus Kunzler Maldaner, Wesley Hanwen Deng, Jason Hong, Ken Holstein, Motahhare Eslami
Title: MIRAGE: Multi-model Interface for Reviewing and Auditing Generative Text-to-Image AI
Abstract:
While generative AI systems have gained popularity in diverse applications, their potential to produce harmful outputs limits their trustworthiness and usability. Recent years have seen growing interest in engaging diverse AI users in auditing generative AI that might impact their lives. To this end, we propose MIRAGE, a web-based tool where AI users can compare outputs from multiple AI text-to-image (T2I) models by auditing AI-generated images and reporting their findings in a structured way. We used MIRAGE to conduct a preliminary user study with five participants and found that MIRAGE users could leverage their own lived experiences and identities to surface previously unnoticed details around harmful biases when reviewing multiple T2I models' outputs compared to reviewing only one.

Authors:Jorge de Heuvel, Daniel Marta, Simon Holk, Iolanda Leite, Maren Bennewitz
Title: The Impact of VR and 2D Interfaces on Human Feedback in Preference-Based Robot Learning
Abstract:
Aligning robot navigation with human preferences is essential for ensuring comfortable and predictable robot movement in shared spaces, facilitating seamless human-robot coexistence. While preference-based learning methods, such as reinforcement learning from human feedback (RLHF), enable this alignment, the choice of the preference collection interface may influence the process. Traditional 2D interfaces provide structured views but lack spatial depth, whereas immersive VR offers richer perception, potentially affecting preference articulation. This study systematically examines how the interface modality impacts human preference collection and navigation policy alignment. We introduce a novel dataset of 2,325 human preference queries collected through both VR and 2D interfaces, revealing significant differences in user experience, preference consistency, and policy outcomes. Our findings highlight the trade-offs between immersion, perception, and preference reliability, emphasizing the importance of interface selection in preference-based robot learning. The dataset will be publicly released to support future research.

Authors:Yanwei Huang, Wesley Hanwen Deng, Sijia Xiao, Motahhare Eslami, Jason I. Hong, Adam Perer
Title: Vipera: Towards systematic auditing of generative text-to-image models at scale
Abstract:
Generative text-to-image (T2I) models are known for risks such as bias, offense, and misinformation. Current AI auditing methods face challenges in scalability and thoroughness, and it is even more challenging to enable auditors to explore the auditing space in a structured and effective way. We present Vipera, which employs multiple visual cues, including a scene graph, to facilitate sensemaking over image collections and inspire auditors to explore and hierarchically organize the auditing criteria. Additionally, it leverages LLM-powered suggestions to facilitate exploration of unexplored auditing directions. An observational user study demonstrates Vipera's effectiveness in helping auditors organize their analyses while engaging with diverse criteria.

Authors:Juntong Chen, Jiang Wu, Jiajing Guo, Vikram Mohanty, Xueming Li, Jorge Piazentin Ono, Wenbin He, Liu Ren, Dongyu Liu
Title: InterChat: Enhancing Generative Visual Analytics using Multimodal Interactions
Abstract:
The rise of Large Language Models (LLMs) and generative visual analytics systems has transformed data-driven insights, yet significant challenges persist in accurately interpreting users' analytical and interaction intents. While language inputs offer flexibility, they often lack precision, making the expression of complex intents inefficient, error-prone, and time-intensive. To address these limitations, we investigate the design space of multimodal interactions for generative visual analytics through a literature review and pilot brainstorming sessions. Building on these insights, we introduce a highly extensible workflow that integrates multiple LLM agents for intent inference and visualization generation. We develop InterChat, a generative visual analytics system that combines direct manipulation of visual elements with natural language inputs. This integration enables precise intent communication and supports progressive, visually driven exploratory data analyses. By employing effective prompt engineering and contextual interaction linking, alongside intuitive visualization and interaction designs, InterChat bridges the gap between user interactions and LLM-driven visualizations, enhancing both interpretability and usability. Extensive evaluations, including two usage scenarios, a user study, and expert feedback, demonstrate the effectiveness of InterChat. Results show significant improvements in the accuracy and efficiency of handling complex visual analytics tasks, highlighting the potential of multimodal interactions to redefine user engagement and analytical depth in generative visual analytics.

Authors:Yoonsu Kim, Brandon Chin, Kihoon Son, Seoyoung Kim, Juho Kim
Title: Applying the Gricean Maxims to a Human-LLM Interaction Cycle: Design Insights from a Participatory Approach
Abstract:
While large language models (LLMs) are increasingly used to assist users in various tasks through natural language interactions, these interactions often fall short due to LLMs' limited ability to infer contextual nuances and user intentions, unlike humans. To address this challenge, we draw inspiration from the Gricean Maxims--human communication theory that suggests principles of effective communication--and aim to derive design insights for enhancing human-AI interactions (HAI). Through participatory design workshops with communication experts, designers, and end-users, we identified ways to apply these maxims across the stages of the HAI cycle. Our findings include reinterpreted maxims tailored to human-LLM contexts and nine actionable design considerations categorized by interaction stage. These insights provide a concrete framework for designing more cooperative and user-centered LLM-based systems, bridging theoretical foundations in communication with practical applications in HAI.

Authors:DaEun Choi, Kihoon Son, Hyunjoon Jung, Juho Kim
Title: Expandora: Broadening Design Exploration with Text-to-Image Model
Abstract:
Broad exploration of references is critical in the visual design process. While text-to-image (T2I) models offer efficiency and customization of exploration, they often limit support for divergence in exploration. We conducted a formative study (N=6) to investigate the limitations of current interaction with the T2I model for broad exploration and found that designers struggle to articulate exploratory intentions and manage iterative, non-linear workflows. To address these challenges, we developed Expandora. Users can specify their exploratory intentions and desired diversity levels through structured input, and using an LLM-based pipeline, Expandora generates tailored prompt variations. The results are displayed in a mindmap-like interface that encourages non-linear workflows. A user study (N=8) demonstrated that Expandora significantly increases prompt diversity, the number of prompts users tried within a given time, and user satisfaction compared to the baseline. Nonetheless, its limitations in supporting convergent thinking suggest opportunities for holistically improving creative processes.

Authors:Jaemarie Solyst, Cindy Peng, Wesley Hanwen Deng, Praneetha Pratapa, Jessica Hammer, Amy Ogan, Jason Hong, Motahhare Eslami
Title: Investigating Youth AI Auditing
Abstract:
Youth are active users and stakeholders of artificial intelligence (AI), yet they are often not included in responsible AI (RAI) practices. Emerging efforts in RAI largely focus on adult populations, missing an opportunity to gain the unique perspectives of youth. This study explores the potential of youth (teens under the age of 18) to engage meaningfully in RAI, specifically through AI auditing. In a workshop study with 17 teens, we investigated how youth can actively identify problematic behaviors in youth-relevant ubiquitous AI (text-to-image generative AI, autocompletion in search bars, image search) and the impacts of supporting AI auditing with critical AI literacy scaffolding, guided discussion about AI ethics, and an auditing tool. We found that youth can contribute quality insights, shaped by their expertise (e.g., hobbies and passions), lived experiences (e.g., social identities), and age-related knowledge (e.g., understanding of fast-moving trends). We discuss how empowering youth in AI auditing can result in more responsible AI, support their learning through doing, and lead to implications for including youth in various participatory RAI processes.

Authors:Lars Krupp, Daniel Geißler, Paul Lukowicz, Jakob Karolus
Title: Towards Sustainable Web Agents: A Plea for Transparency and Dedicated Metrics for Energy Consumption
Abstract:
Improvements in the area of large language models have shifted towards the construction of models capable of using external tools and interpreting their outputs. These so-called web agents have the ability to interact autonomously with the internet. This allows them to become powerful daily assistants handling time-consuming, repetitive tasks while supporting users in their daily activities. While web agent research is thriving, the sustainability aspect of this research direction remains largely unexplored. We provide an initial exploration of the energy and CO2 cost associated with web agents. Our results show how different philosophies in web agent creation can severely impact the associated expended energy. We highlight lacking transparency regarding the disclosure of model parameters and processes used for some web agents as a limiting factor when estimating energy consumption. As such, our work advocates a change in thinking when evaluating web agents, warranting dedicated metrics for energy consumption and sustainability.

Authors:Sitong Wang, Lydia B. Chilton
Title: Schemex: Discovering Design Patterns from Examples through Iterative Abstraction and Refinement
Abstract:
Expertise is often built by learning from examples. This process, known as schema induction, helps us identify patterns from examples. Despite its importance, schema induction remains a challenging cognitive task. Recent advances in generative AI reasoning capabilities offer new opportunities to support schema induction through human-AI collaboration. We present Schemex, an AI-powered workflow that enhances human schema induction through three stages: clustering, abstraction, and refinement via contrasting examples. We conducted an initial evaluation of Schemex through two real-world case studies: writing abstracts for HCI papers and creating news TikToks. Qualitative analysis demonstrates the high accuracy and usefulness of the generated schemas. We also discuss future work on developing more flexible methods for workflow construction to help humans focus on high-level thinking.

Authors:Parag Khanna, Nona Rajabi, Sumeyra U. Demir Kanik, Danica Kragic, Mårten Björkman, Christian Smith
Title: Early Detection of Human Handover Intentions in Human-Robot Collaboration: Comparing EEG, Gaze, and Hand Motion
Abstract:
Human-robot collaboration (HRC) relies on accurate and timely recognition of human intentions to ensure seamless interactions. Among common HRC tasks, human-to-robot object handovers have been studied extensively for planning the robot's actions during object reception, assuming the human intention for object handover. However, distinguishing handover intentions from other actions has received limited attention. Most research on handovers has focused on visually detecting motion trajectories, which often results in delays or false detections when trajectories overlap. This paper investigates whether human intentions for object handovers are reflected in non-movement-based physiological signals. We conduct a multimodal analysis comparing three data modalities: electroencephalogram (EEG), gaze, and hand-motion signals. Our study aims to distinguish between handover-intended human motions and non-handover motions in an HRC setting, evaluating each modality's performance in predicting and classifying these actions before and after human movement initiation. We develop and evaluate human intention detectors based on these modalities, comparing their accuracy and timing in identifying handover intentions. To the best of our knowledge, this is the first study to systematically develop and test intention detectors across multiple modalities within the same experimental context of human-robot handovers. Our analysis reveals that handover intention can be detected from all three modalities. Nevertheless, gaze signals are the earliest as well as the most accurate to classify the motion as intended for handover or non-handover.

Authors:Zongyu Chang, Feihong Lu, Ziqin Zhu, Qian Li, Cheng Ji, Zhuo Chen, Hao Peng, Yang Liu, Ruifeng Xu, Yangqiu Song, Shangguang Wang, Jianxin Li
Title: Bridging the Gap Between LLMs and Human Intentions: Progresses and Challenges in Instruction Understanding, Intention Reasoning, and Reliable Generation
Abstract:
Large language models (LLMs) have demonstrated exceptional capabilities in understanding and generation. However, when interacting with human instructions in real-world scenarios, LLMs still face significant challenges, particularly in accurately capturing and comprehending human instructions and intentions. This paper focuses on three challenges in LLM-based text generation tasks: instruction understanding, intention reasoning, and reliable dialog generation. Regarding complex human instructions, LLMs have deficiencies in understanding long contexts and instructions in multi-round conversations. For intention reasoning, LLMs may exhibit inconsistent command reasoning, difficulty reasoning about commands containing incorrect information, difficulty understanding ambiguous user commands, and a weak understanding of user intention in commands. In terms of reliable dialog generation, LLMs may produce unstable or unethical content. To this end, we classify and analyze the performance of LLMs in challenging scenarios and conduct a comprehensive evaluation of existing solutions. Furthermore, we introduce benchmarks and categorize them based on the aforementioned three core challenges. Finally, we explore potential directions for future research to enhance the reliability and adaptability of LLMs in real-world applications.

Authors:Francesco Vona, Julia Schorlemmer, Michael Stern, Navid Ashrafi, Maurizio Vergari, Tanja Kojic, Jan-Niklas Voigt-Antons
Title: Comparing Pass-Through Quality of Mixed Reality Devices: A User Experience Study During Real-World Tasks
Abstract:
In extended reality, pass-through enables users to view their real-world surroundings via cameras on the headset, displaying live video inside the device. This study compared the pass-through quality of three devices: Apple Vision Pro, Meta Quest 3, and Varjo XR3. Thirty-one participants performed two tasks, reading a text and solving a puzzle, while using each headset with the pass-through feature activated. Participants then rated their experiences, focusing on workload and cybersickness. Results showed that the Apple Vision Pro outperformed the Meta Quest 3 and Varjo XR3, receiving the highest ratings for pass-through quality.

Authors:Sitong Wang, Jocelyn McKinnon-Crowley, Tao Long, Kian Loong Lua, Keren Henderson, Kevin Crowston, Jeffrey V. Nickerson, Mark Hansen, Lydia B. Chilton
Title: The Role of Human Creativity in the Presence of AI Creativity Tools at Work: A Case Study on AI-Driven Content Transformation in Journalism
Abstract:
As AI becomes more capable, it is unclear how human creativity will remain essential in jobs that incorporate AI. We conducted a 14-week study of a student newsroom using an AI tool to convert web articles into social media videos. Most creators treated the tool as a creative springboard, not as a completion mechanism. They edited the AI outputs. The tool enabled the team to publish successful content that received over 500,000 views. Human creativity remained essential: after AI produced templated outputs, creators took ownership of the task, injecting their own creativity, especially when AI failed to create appropriate content. AI was initially seen as an authority, due to creators' lack of experience, but they ultimately learned to assert their own authority.

Authors:Francesco Vona, Julia Schorlemmer, Jessica Stemann, Sebastian Fischer, Jan-Niklas Voigt-Antons
Title: Hands vs. Controllers: Comparing User Interactions in Virtual Reality Shopping Environments
Abstract:
Virtual reality enables users to experience real-life situations in immersive environments. Interaction methods significantly shape user experience, particularly in high-fidelity simulations mimicking real-world tasks. This study evaluates two primary VR interaction techniques, hand-based and controller-based, through virtual shopping tasks in a simulated supermarket with 40 participants. Hand-based interaction was preferred for its natural, immersive qualities and alignment with real-world gestures but faced usability challenges, including limited haptic feedback and grasping inefficiencies. In contrast, controller-based interaction offered greater precision and reliability, making it more suitable for tasks requiring fine motor skills.

Authors:Arthur Caetano, Kavya Verma, Atieh Taheri, Radha Kumaran, Zichen Chen, Jiaao Chen, Tobias Höllerer, Misha Sra
Title: Agentic Workflows for Conversational Human-AI Interaction Design
Abstract:
Conversational human-AI interaction (CHAI) has recently driven mainstream adoption of AI. However, CHAI poses two key challenges for designers and researchers: users frequently have ambiguous goals and an incomplete understanding of AI functionalities, and the interactions are brief and transient, limiting opportunities for sustained engagement with users. AI agents can help address these challenges by suggesting contextually relevant prompts, by standing in for users during early design testing, and by helping users better articulate their goals. Guided by research-through-design, we explored agentic AI workflows through the development and testing of a probe over four iterations with 10 users. We present our findings through an annotated portfolio of design artifacts and through thematic analysis of user experiences, offering solutions to the problems of ambiguity and transience in CHAI. Furthermore, we examine the limitations and possibilities of these AI agent workflows, suggesting that similar collaborative approaches between humans and AI could benefit other areas of design.

Authors:Masudul Hasan Masud Bhuiyan, Matteo Varvello, Cristian-Alexandru Staicu, Yasir Zaki
Title: Non-Western Perspectives on Web Inclusivity: A Study of Accessibility Practices in the Global South
Abstract:
The Global South faces unique challenges in achieving digital inclusion due to a heavy reliance on mobile devices for internet access and the prevalence of slow or unreliable networks. While numerous studies have investigated web accessibility within specific sectors such as education, healthcare, and government services, these efforts have been largely constrained to individual countries or narrow contexts, leaving a critical gap in cross-regional, large-scale analysis. This paper addresses this gap by conducting the first large-scale comparative study of mobile web accessibility across the Global South. In this work, we evaluate 100,000 websites from 10 countries in the Global South to provide a comprehensive understanding of accessibility practices in these regions. Our findings reveal that websites from countries with strict accessibility regulations and enforcement tend to adhere better to the Web Content Accessibility Guidelines (WCAG). However, accessibility violations impact different disability groups in varying ways. Blind and low-vision individuals in the Global South are disproportionately affected, as only 40% of the evaluated websites meet critical accessibility guidelines. This significant shortfall is largely due to developers frequently neglecting to implement valid alt text for images and ARIA descriptions, which are essential specification mechanisms in the HTML standard for the effective operation of screen readers.
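The missing-alt-text failure the abstract singles out is mechanically checkable. The sketch below is a minimal, illustrative auditor (not the paper's actual evaluation tooling, which likely builds on full WCAG checkers): it counts `<img>` elements whose `alt` attribute is absent or empty, the exact condition that breaks screen-reader output.

```python
# Minimal sketch of one accessibility check: flag <img> elements
# lacking the alt text that screen readers depend on. Real WCAG
# audits cover many more rules; this only inspects alt attributes.
from html.parser import HTMLParser


class AltTextAuditor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.total_imgs = 0
        self.missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.total_imgs += 1
            alt = dict(attrs).get("alt")
            # Absent alt and empty/whitespace alt both fail the check.
            if alt is None or not alt.strip():
                self.missing_alt += 1


def audit(html: str) -> tuple[int, int]:
    """Return (total <img> tags, tags with missing or empty alt)."""
    auditor = AltTextAuditor()
    auditor.feed(html)
    return auditor.total_imgs, auditor.missing_alt


page = '<img src="a.png" alt="chart"><img src="b.png"><img src="c.png" alt="">'
print(audit(page))  # (3, 2)
```

A production checker would also need to handle decorative images (where empty alt plus `role="presentation"` is correct), which is one reason automated tools alone cannot certify WCAG conformance.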

Authors:Ryutaro Kurai, Hikari Yanagawa, Yuichi Hiroi, Takefumi Hiraki
Title: MetaGadget: An Accessible Framework for IoT Integration into Commercial Metaverse Platforms
Abstract:
While the integration of IoT devices in virtual spaces is becoming increasingly common, technical barriers to controlling custom devices in multi-user Virtual Reality (VR) environments remain high, particularly limiting new applications in educational and prototyping settings. We propose MetaGadget, a framework for connecting IoT devices to commercial metaverse platforms that implements device control through HTTP-based event triggers without requiring persistent client connections. Through two workshops focused on smart home control and custom device integration, we explored the potential application of IoT connectivity in multi-user metaverse environments. Participants successfully implemented new interactions unique to the metaverse, such as environmental sensing and remote control systems that support simultaneous operation by multiple users, and reported positive feedback on the ease of system development. We verified that our framework provides a new approach to controlling IoT devices in the metaverse while reducing technical requirements, and provides a foundation for creative practice that connects multi-user VR environments and physical spaces.
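The abstract's core mechanism, device control via one-shot HTTP event triggers rather than persistent client connections, can be sketched as a small dispatcher. This is an illustrative assumption of how such routing might look; the event names, device handlers, and return convention below are hypothetical, not taken from the MetaGadget framework itself.

```python
# Hypothetical sketch of MetaGadget-style event triggering: the metaverse
# platform fires a one-shot HTTP request per event, and a dispatcher routes
# it to a device action -- no persistent client connection is held open.
# All names (trigger paths, devices) are illustrative, not from the paper.

def turn_on_lamp(params):
    return f"lamp on at brightness {params.get('brightness', 100)}"

def read_temperature(params):
    return "temperature: 21.5 C"  # stand-in for a real sensor read

TRIGGERS = {
    "lamp/on": turn_on_lamp,
    "sensor/temperature": read_temperature,
}

def handle_trigger(path, params):
    """Dispatch one HTTP-style event trigger to its device handler."""
    handler = TRIGGERS.get(path)
    if handler is None:
        return (404, "unknown trigger")
    return (200, handler(params))

status, body = handle_trigger("lamp/on", {"brightness": 60})
print(status, body)  # 200 lamp on at brightness 60
```

Because each trigger is stateless, multiple VR users can fire events concurrently without any of them holding a session, which matches the multi-user scenario the workshops explored.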

Authors:Natalie Kiesler, Jacqueline Smith, Juho Leinonen, Armando Fox, Stephen MacNeil, Petri Ihantola
Title: The Role of Generative AI in Software Student CollaborAItion
Abstract:
Collaboration is a crucial part of computing education. The increase in AI capabilities over the last couple of years is bound to profoundly affect all aspects of systems and software engineering, including collaboration. In this position paper, we consider a scenario where AI agents would be able to take on any role in collaborative processes in computing education. We outline these roles and the activities and group dynamics that software development currently involves, and discuss if and in what way AI could facilitate these roles and activities. The goal of our work is to envision and critically examine potential futures. We present scenarios suggesting how AI can be integrated into existing collaborations. These are contrasted with design fictions that help demonstrate the new possibilities and challenges for computing education in the AI era.

Authors:Dongyun Han, Anastasia Bezerianos, Petra Isenberg, Isaac Cho
Title: Perception of Visual Variables on Virtual Wall-Sized Tiled Displays in Immersive Environments
Abstract:
We investigate the perception of visual variables on wall-sized tiled displays within an immersive environment. We designed and conducted two formal user studies focusing on elementary visualization reading tasks in VR. The first study compared three different virtual display arrangements (Flat, Cylinder, and Cockpit). It showed that participants made smaller errors on virtual curved walls (Cylinder and Cockpit) compared to Flat. Following that, we compared the results with those from a previous study conducted in a real-world setting. The comparative analysis showed that virtual curved walls resulted in smaller errors than the real-world flat wall display, but with longer task completion time. The second study evaluated the impact of four 3D user interaction techniques (Selection, Walking, Steering, and Teleportation) on performing the elementary task on the virtual Flat wall display. The results confirmed that interaction techniques further improved task performance. Finally, we discuss the limitations and future work.

Authors:Vanessa Echeverria, Linxuan Zhao, Riordan Alfredo, Mikaela Milesi, Yuequiao Jin, Sophie Abel, Jie Fan, Lixiang Yan, Xinyu Li, Samantha Dix, Rosie Wotherspoon, Hollie Jaggard, Abra Osborne, Simon Buckingham Shum, Dragan Gasevic, Roberto Martinez-Maldonado
Title: TeamVision: An AI-powered Learning Analytics System for Supporting Reflection in Team-based Healthcare Simulation
Abstract:
Healthcare simulations help learners develop teamwork and clinical skills in a risk-free setting, promoting reflection on real-world practices through structured debriefs. However, despite its potential, video is hard to use in practice, leaving a gap in concise, data-driven summaries to support effective debriefing. Addressing this, we present TeamVision, an AI-powered multimodal learning analytics (MMLA) system that captures voice presence, automated transcriptions, body rotation, and positioning data, offering educators a dashboard to guide debriefs immediately after simulations. We conducted an in-the-wild study with 56 teams (221 students) and recorded debriefs led by six teachers using TeamVision. Follow-up interviews with 15 students and five teachers explored perceptions of its usefulness, accuracy, and trustworthiness. This paper examines: i) how TeamVision was used in debriefing, ii) what educators found valuable and challenging, and iii) perceptions of its effectiveness. Results suggest TeamVision enables flexible debriefing and highlights the challenges and implications of using AI-powered systems in healthcare simulation.

Authors:Arthur Caetano, Yunhao Luo, Adwait Sharma, Misha Sra
Title: GraspR: A Computational Model of Spatial User Preferences for Adaptive Grasp UI Design
Abstract:
Grasp User Interfaces (grasp UIs) enable dual-tasking in XR by allowing interaction with digital content while holding physical objects. However, current grasp UI design practices face a fundamental challenge: existing approaches either capture user preferences through labor-intensive elicitation studies that are difficult to scale or rely on biomechanical models that overlook subjective factors. We introduce GraspR, the first computational model that predicts user preferences for single-finger microgestures in grasp UIs. Our data-driven approach combines the scalability of computational methods with human preference modeling, trained on 1,520 preferences collected via a two-alternative forced choice paradigm across eight participants and four frequently used grasp variations. We demonstrate GraspR's effectiveness through a working prototype that dynamically adjusts interface layouts across four everyday tasks. We release both the dataset and code to support future research in adaptive grasp UIs.

Authors:Wesley Hanwen Deng, Wang Claire, Howard Ziyu Han, Jason I. Hong, Kenneth Holstein, Motahhare Eslami
Title: WeAudit: Scaffolding User Auditors and AI Practitioners in Auditing Generative AI
Abstract:
There has been growing interest from both practitioners and researchers in engaging end users in AI auditing, to draw upon users' unique knowledge and lived experiences. However, we know little about how to effectively scaffold end users in auditing in ways that can generate actionable insights for AI practitioners. Through formative studies with both users and AI practitioners, we first identified a set of design goals to support user-engaged AI auditing. We then developed WeAudit, a workflow and system that supports end users in auditing AI both individually and collectively. We evaluated WeAudit through a three-week user study with user auditors and interviews with industry Generative AI practitioners. Our findings offer insights into how WeAudit supports users in noticing and reflecting upon potential AI harms and in articulating their findings in ways that industry practitioners can act upon. Based on our observations and feedback from both users and practitioners, we identify several opportunities to better support user engagement in AI auditing processes. We discuss implications for future research to support effective and responsible user engagement in AI auditing and red-teaming.

Authors:Kristoffer Christensen, Bo Nørregaard Jørgensen, Zheng Grace Ma
Title: A Visualization Framework for Exploring Multi-Agent-Based Simulations Case Study of an Electric Vehicle Home Charging Ecosystem
Abstract:
Multi-agent-based simulations (MABS) of electric vehicle (EV) home charging ecosystems generate large, complex, and stochastic time-series datasets that capture interactions between households, grid infrastructure, and energy markets. These interactions can lead to unexpected system-level events, such as transformer overloads or consumer dissatisfaction, that are difficult to detect and explain through static post-processing. This paper presents a modular, Python-based dashboard framework, built using Dash by Plotly, that enables efficient, multi-level exploration and root-cause analysis of emergent behavior in MABS outputs. The system features three coordinated views (System Overview, System Analysis, and Consumer Analysis), each offering high-resolution visualizations such as time-series plots, spatial heatmaps, and agent-specific drill-down tools. A case study simulating full EV adoption with smart charging in a Danish residential network demonstrates how the dashboard supports rapid identification and contextual explanation of anomalies, including clustered transformer overloads and time-dependent charging failures. The framework facilitates actionable insight generation for researchers and distribution system operators, and its architecture is adaptable to other distributed energy resources and complex energy systems.
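One of the anomalies the dashboard surfaces, clustered transformer overloads, amounts to scanning a simulated load time series for contiguous intervals above rated capacity. The sketch below is an illustrative assumption of that kind of post-hoc check, not code from the framework; the function name, kW values, and 100 kW rating are all made up for the example.

```python
# Illustrative sketch (not the paper's code) of the kind of check a MABS
# dashboard might run: flag time steps where aggregate EV charging load on
# a transformer exceeds its rated capacity, grouped into episodes.

def overload_episodes(load_kw, capacity_kw):
    """Return (start, end) index pairs of contiguous overload intervals."""
    episodes, start = [], None
    for t, load in enumerate(load_kw):
        if load > capacity_kw and start is None:
            start = t
        elif load <= capacity_kw and start is not None:
            episodes.append((start, t - 1))
            start = None
    if start is not None:  # overload still active at end of trace
        episodes.append((start, len(load_kw) - 1))
    return episodes

# Hourly load on one transformer (kW) against a hypothetical 100 kW rating.
load = [80, 95, 120, 130, 90, 85, 110, 70]
print(overload_episodes(load, 100))  # [(2, 3), (6, 6)]
```

Episode boundaries like these would then feed the drill-down views, letting an operator jump from a system-level alert to the specific households charging during the offending interval.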

Authors:Abdul Basit, Maha Nawaz, Muhammad Shafique
Title: BRAVE: Brain-Controlled Prosthetic Arm with Voice Integration and Embodied Learning for Enhanced Mobility
Abstract:
Non-invasive brain-computer interfaces (BCIs) have the potential to enable intuitive control of prosthetic limbs for individuals with upper limb amputations. However, existing EEG-based control systems face challenges related to signal noise, classification accuracy, and real-time adaptability. In this work, we present BRAVE, a hybrid EEG and voice-controlled prosthetic system that integrates ensemble learning-based EEG classification with a human-in-the-loop (HITL) correction framework for enhanced responsiveness. Unlike traditional electromyography (EMG)-based prosthetic control, BRAVE aims to interpret EEG-driven motor intent, enabling movement control without reliance on residual muscle activity. To improve classification robustness, BRAVE combines LSTM, CNN, and Random Forest models in an ensemble framework, achieving a classification accuracy of 96% across test subjects. EEG signals are preprocessed using a bandpass filter (0.5-45 Hz), Independent Component Analysis (ICA) for artifact removal, and Common Spatial Pattern (CSP) feature extraction to minimize contamination from electromyographic (EMG) and electrooculographic (EOG) signals. Additionally, BRAVE incorporates automatic speech recognition (ASR) to facilitate intuitive mode switching between different degrees of freedom (DOF) in the prosthetic arm. The system operates in real time, with a response latency of 150 ms, leveraging Lab Streaming Layer (LSL) networking for synchronized data acquisition. The system is evaluated on an in-house fabricated prosthetic arm and on multiple participants highlighting the generalizability across users. The system is optimized for low-power embedded deployment, ensuring practical real-world application beyond high-performance computing environments. Our results indicate that BRAVE offers a promising step towards robust, real-time, non-invasive prosthetic control.
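The ensemble step described above, combining per-window predictions from LSTM, CNN, and Random Forest models, is commonly realized as a majority vote over the base classifiers' labels. The sketch below assumes that simple voting rule and invented class labels; the paper does not specify the exact combination scheme, so treat this as one plausible reading.

```python
from collections import Counter

# Hedged sketch of the ensemble idea in BRAVE: three base classifiers
# (LSTM, CNN, Random Forest in the paper) each predict a motor-intent
# class for one EEG window, and a majority vote yields the final command.
# The class labels and the voting rule itself are illustrative assumptions.

def majority_vote(predictions):
    """predictions: list of per-model class labels for one EEG window."""
    counts = Counter(predictions)
    label, _ = counts.most_common(1)[0]
    return label

# e.g. LSTM and RF agree on "grasp", CNN says "rest" -> ensemble: "grasp"
print(majority_vote(["grasp", "rest", "grasp"]))  # grasp
```

A soft-vote variant (averaging class probabilities instead of counting labels) would be a natural alternative when the base models expose calibrated confidences.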

Authors:Yijia Shao, Humishka Zope, Yucheng Jiang, Jiaxin Pei, David Nguyen, Erik Brynjolfsson, Diyi Yang
Title: Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce
Abstract:
The rapid rise of compound AI systems (a.k.a., AI agents) is reshaping the labor market, raising concerns about job displacement, diminished human agency, and overreliance on automation. Yet, we lack a systematic understanding of the evolving landscape. In this paper, we address this gap by introducing a novel auditing framework to assess which occupational tasks workers want AI agents to automate or augment, and how those desires align with the current technological capabilities. Our framework features an audio-enhanced mini-interview to capture nuanced worker desires and introduces the Human Agency Scale (HAS) as a shared language to quantify the preferred level of human involvement. Using this framework, we construct the WORKBank database, building on the U.S. Department of Labor's O*NET database, to capture preferences from 1,500 domain workers and capability assessments from AI experts across over 844 tasks spanning 104 occupations. Jointly considering the desire and technological capability divides tasks in WORKBank into four zones: Automation "Green Light" Zone, Automation "Red Light" Zone, R&D Opportunity Zone, Low Priority Zone. This highlights critical mismatches and opportunities for AI agent development. Moving beyond a simple automate-or-not dichotomy, our results reveal diverse HAS profiles across occupations, reflecting heterogeneous expectations for human involvement. Moreover, our study offers early signals of how AI agent integration may reshape the core human competencies, shifting from information-focused skills to interpersonal ones. These findings underscore the importance of aligning AI agent development with human desires and preparing workers for evolving workplace dynamics.
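The four-zone partition described above crosses worker desire for automation with expert-assessed capability. A minimal sketch of that logic, assuming 0-1 scales and a 0.5 cut point that the abstract does not actually specify, might look like:

```python
# Hedged sketch of the WORKBank zoning logic: tasks are placed in one of
# four zones by crossing worker desire for automation with expert-assessed
# technological capability. The 0-1 scales and the 0.5 threshold are
# illustrative assumptions, not the paper's exact definitions.

def workbank_zone(desire, capability, threshold=0.5):
    high_desire = desire >= threshold
    high_capability = capability >= threshold
    if high_desire and high_capability:
        return "Automation 'Green Light' Zone"
    if not high_desire and high_capability:
        return "Automation 'Red Light' Zone"
    if high_desire and not high_capability:
        return "R&D Opportunity Zone"
    return "Low Priority Zone"

print(workbank_zone(0.8, 0.9))  # Automation 'Green Light' Zone
print(workbank_zone(0.7, 0.2))  # R&D Opportunity Zone
```

The interesting cells are the off-diagonal ones: high capability with low desire flags deployment risk, while high desire with low capability points at where R&D effort would be welcomed.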

Authors:Trisanth Srinivasan, Santosh Patapati, Himani Musku, Idhant Gode, Aditya Arora, Samvit Bhattacharya, Abubakr Nazriev, Sanika Hirave, Zaryab Kanjiani, Srinjoy Ghose
Title: DURA-CPS: A Multi-Role Orchestrator for Dependability Assurance in LLM-Enabled Cyber-Physical Systems
Abstract:
Cyber-Physical Systems (CPS) increasingly depend on advanced AI techniques to operate in critical applications. However, traditional verification and validation methods often struggle to handle the unpredictable and dynamic nature of AI components. In this paper, we introduce DURA-CPS, a novel framework that employs multi-role orchestration to automate the iterative assurance process for AI-powered CPS. By assigning specialized roles (e.g., safety monitoring, security assessment, fault injection, and recovery planning) to dedicated agents within a simulated environment, DURA-CPS continuously evaluates and refines AI behavior against a range of dependability requirements. We demonstrate the framework through a case study involving an autonomous vehicle navigating an intersection with an AI-based planner. Our results show that DURA-CPS effectively detects vulnerabilities, manages performance impacts, and supports adaptive recovery strategies, thereby offering a structured and extensible solution for rigorous V&V in safety- and security-critical systems.

Authors:Hongbin Wang, Zhihong Jia, Yuanzhong Shen, Ziwei Wang, Siyang Li, Kai Shu, Feng Hu, Dongrui Wu
Title: SACM: SEEG-Audio Contrastive Matching for Chinese Speech Decoding
Abstract:
Speech disorders such as dysarthria and anarthria can severely impair the patient's ability to communicate verbally. Speech decoding brain-computer interfaces (BCIs) offer a potential alternative by directly translating speech intentions into spoken words, serving as speech neuroprostheses. This paper reports an experimental protocol for Mandarin Chinese speech decoding BCIs, along with the corresponding decoding algorithms. Stereo-electroencephalography (SEEG) and synchronized audio data were collected from eight drug-resistant epilepsy patients as they conducted a word-level reading task. The proposed SEEG-Audio Contrastive Matching (SACM) framework, based on contrastive learning, achieved decoding accuracies significantly exceeding chance levels in both speech detection and speech decoding tasks. Electrode-wise analysis revealed that a single sensorimotor cortex electrode achieved performance comparable to that of the full electrode array. These findings provide valuable insights for developing more accurate online speech decoding BCIs.

Authors:Shenghui Chen, Po-han Li, Sandeep Chinchali, Ufuk Topcu
Title: VIBE: Annotation-Free Video-to-Text Information Bottleneck Evaluation for TL;DR
Abstract:
Many decision-making tasks, where both accuracy and efficiency matter, still require human supervision. For example, tasks like traffic officers reviewing hour-long dashcam footage or researchers screening conference videos can benefit from concise summaries that reduce cognitive load and save time. Yet current vision-language models (VLMs) often produce verbose, redundant outputs that hinder task performance. Existing video caption evaluation depends on costly human annotations and overlooks the summaries' utility in downstream tasks. We address these gaps with Video-to-text Information Bottleneck Evaluation (VIBE), an annotation-free method that scores VLM outputs using two metrics: grounding (how well the summary aligns with visual content) and utility (how informative it is for the task). VIBE selects from randomly sampled VLM outputs by ranking them according to the two scores to support effective human decision-making. Human studies on LearningPaper24, SUTD-TrafficQA, and LongVideoBench show that summaries selected by VIBE consistently improve performance, boosting task accuracy by up to 61.23% and reducing response time by 75.77% compared to naive VLM summaries or raw video.
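The selection step described above, ranking sampled summaries by their grounding and utility scores, reduces to an argmax over candidates. The sketch below is a placeholder reading of that step: the toy score tables and the plain sum used to combine the two scores are assumptions, standing in for VIBE's actual information-theoretic estimates.

```python
# Minimal sketch of VIBE-style selection: score each randomly sampled VLM
# summary on grounding and utility, then pick the top-ranked candidate.
# The scoring functions and the combination rule (a plain sum) are
# placeholder assumptions, not the paper's formulation.

def select_summary(candidates, grounding_fn, utility_fn):
    """Rank candidate summaries by grounding + utility; return the best."""
    return max(candidates, key=lambda s: grounding_fn(s) + utility_fn(s))

# Toy scores standing in for the real annotation-free estimates.
scores_g = {"short": 0.6, "verbose": 0.8, "focused": 0.9}
scores_u = {"short": 0.5, "verbose": 0.3, "focused": 0.8}
best = select_summary(list(scores_g), scores_g.get, scores_u.get)
print(best)  # focused
```

Because both scores are computed without human labels, this selection loop can run at inference time over however many samples the VLM is asked to draw.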

Authors:Glaucia Melo, Paulo Alencar, Donald Cowan
Title: Enhancing Software Development with Context-Aware Conversational Agents: A User Study on Developer Interactions with Chatbots
Abstract:
Software development is a cognitively intensive process requiring multitasking, adherence to evolving workflows, and continuous learning. With the rise of large language model (LLM)-based tools, such as conversational agents (CAs), there is growing interest in supporting developers through natural language interaction. However, little is known about the specific features developers seek in these systems. We conducted a user study with 29 developers using a prototype text-based chatbot to investigate preferred functionalities. Our findings reveal strong interest in task automation, version control support, and contextual adaptability, especially the need to tailor assistance for both novice and experienced users. We highlight the importance of deep contextual understanding, historical interaction awareness, and personalized support in CA design. This study contributes to the development of context-aware chatbots that enhance productivity and satisfaction, and it outlines opportunities for future research on human-AI collaboration in software engineering.

Authors:Sean Kille, Jan Heinrich Robens, Philipp Dahlinger, Alejandra Rodriguez-Velasquez, Simon Rothfuß, Balint Varga, Andreas Lindenmann, Gerhard Neumann, Sven Matthiesen, Andrea Kiesel, Sören Hohmann
Title: Beyond Task Performance: Human Experience in Human-Robot Collaboration
Abstract:
Human interaction experience plays a crucial role in the effectiveness of human-machine collaboration, especially as interactions in future systems progress towards tighter physical and functional integration. While automation design has been shown to impact task performance, its influence on human experience metrics such as flow, sense of agency (SoA), and embodiment remains underexplored. This study investigates how variations in automation design affect these psychological experience measures and examines correlations between subjective experience and physiological indicators. A user study was conducted in a simulated wood workshop, where participants collaborated with a lightweight robot under four automation levels. The results of the study indicate that medium automation levels enhance flow, SoA and embodiment, striking a balance between support and user autonomy. In contrast, higher automation, despite optimizing task performance, diminishes perceived flow and agency. Furthermore, we observed that grip force might be considered as a real-time proxy of SoA, while correlations with heart rate variability were inconclusive. The findings underscore the necessity for automation strategies that integrate human-centric metrics, aiming to optimize both performance and user experience in collaborative robotic systems.

Authors:Marta Moscati, Darius Afchar, Markus Schedl, Bruno Sguerra
Title: Familiarizing with Music: Discovery Patterns for Different Music Discovery Needs
Abstract:
Humans have the tendency to discover and explore. This natural tendency is reflected in data from streaming platforms as the amount of previously unknown content accessed by users. Additionally, in domains such as that of music streaming there is evidence that recommending novel content improves users' experience with the platform. Therefore, understanding users' discovery patterns, such as the extent to which, and the way in which, users access previously unknown content, is a topic of relevance for both the scientific community and the streaming industry, particularly the music one. Previous works studied how music consumption differs for users of different traits and looked at diversity, novelty, and consistency over time of users' music preferences. However, very little is known about how users discover and explore previously unknown music, and how this behavior differs for users of varying discovery needs. In this paper we bridge this gap by analyzing data from a survey answered by users of the major music streaming platform Deezer in combination with their streaming data. We first address questions regarding whether users who declare a higher interest in unfamiliar music listen to more diverse music, have more stable music preferences over time, and explore more music within a same time window, compared to those who declare a lower interest. We then investigate which type of music tracks users choose to listen to when they explore unfamiliar music, identifying clear patterns of popularity and genre representativeness that vary for users of different discovery needs. Our findings open up possibilities to infer users' interest in unfamiliar music from streaming data as well as possibilities to develop recommender systems that guide users in exploring music in a more natural way.

Authors:Xiaoshan Huang, Haolun Wu, Xue Liu, Susanne P. Lajoie
Title: What Makes Teamwork Work? A Multimodal Case Study on Emotions and Diagnostic Expertise in an Intelligent Tutoring System
Abstract:
Teamwork is pivotal in medicine, where professionals with diverse skills and emotional states collaborate to make critical decisions. This case study examines the interplay between emotions and professional skills in group decision-making during collaborative medical diagnosis within an Intelligent Tutoring System (ITS). By comparing verbal and physiological data between high-performing and low-performing teams of medical professionals working on a patient case within the ITS, alongside individuals' retrospective collaboration experiences, we employ multimodal data analysis to identify patterns in team emotional climate and their impact on diagnostic efficiency. Specifically, we investigate how emotion-driven dialogue and professional expertise influence both the information-seeking process and the final diagnostic decisions. Grounded in the socially shared regulation of learning framework and utilizing sentiment analysis, we found that social-motivational interactions are key drivers of a positive team emotional climate. Furthermore, through content analysis of dialogue and physiological signals to pinpoint emotional fluctuations, we identify episodes where knowledge exchange and skill acquisition are most likely to occur. Our findings offer valuable insights into optimizing group collaboration in medical contexts by harmonizing emotional dynamics with adaptive strategies for effective decision-making, ultimately enhancing diagnostic accuracy and teamwork effectiveness.

Authors:Neil K. R. Sehgal, Sunny Rai, Manuel Tonneau, Anish K. Agarwal, Joseph Cappella, Melanie Kornides, Lyle Ungar, Alison Buttenheim, Sharath Chandra Guntuku
Title: Conversations with AI Chatbots Increase Short-Term Vaccine Intentions But Do Not Outperform Standard Public Health Messaging
Abstract:
Large language model (LLM) based chatbots show promise in persuasive communication, but existing studies often rely on weak controls or focus on belief change rather than behavioral intentions or outcomes. This pre-registered multi-country (US, Canada, UK) randomized controlled trial involving 930 vaccine-hesitant parents evaluated brief (three-minute) multi-turn conversations with LLM-based chatbots against standard public health messaging approaches for increasing human papillomavirus (HPV) vaccine intentions for their children. Participants were randomly assigned to: (1) a weak control (no message), (2) a strong control reflecting the standard of care (reading official public health materials), or (3 and 4) one of two chatbot conditions. One chatbot was prompted to deliver short, conversational responses, while the other used the model's default output style (longer with bullet points). While chatbot interactions significantly increased self-reported vaccination intent (by 7.1-10.3 points on a 100-point scale) compared to no message, they did not outperform standard public health materials, with the conversational chatbot performing significantly worse. Additionally, while the short-term effects of chatbot interactions faded during a 15-day follow-up, the effects of public health material persisted through a 45-day follow-up relative to no message. These findings suggest that while LLMs can effectively shift vaccination intentions in the short-term, their incremental value over existing public health communications is questionable, offering a more tempered view of their persuasive capabilities and highlighting the importance of integrating AI-driven tools alongside, rather than replacing, current public health strategies.

Authors:Joseph Lee, Tianqi Shang, Jae Young Baik, Duy Duong-Tran, Shu Yang, Lingyao Li, Li Shen
Title: From Promising Capability to Pervasive Bias: Assessing Large Language Models for Emergency Department Triage
Abstract:
Large Language Models (LLMs) have shown promise in clinical decision support, yet their application to triage remains underexplored. We systematically investigate the capabilities of LLMs in emergency department triage through two key dimensions: (1) robustness to distribution shifts and missing data, and (2) counterfactual analysis of intersectional biases across sex and race. We assess multiple LLM-based approaches, ranging from continued pre-training to in-context learning, as well as machine learning approaches. Our results indicate that LLMs exhibit superior robustness, and we investigate the key factors contributing to the promising LLM-based approaches. Furthermore, in this setting, we identify gaps in LLM preferences that emerge in particular intersections of sex and race. LLMs generally exhibit sex-based differences, but they are most pronounced in certain racial groups. These findings suggest that LLMs encode demographic preferences that may emerge in specific clinical contexts or particular combinations of characteristics.

Authors:Jinwen Tang, Songxi Chen, Yi Shang
Title: TigerGPT: A New AI Chatbot for Adaptive Campus Climate Surveys
Abstract:
Campus climate surveys play a pivotal role in capturing how students, faculty, and staff experience university life, yet traditional methods frequently suffer from low participation and minimal follow-up. We present TigerGPT, a new AI chatbot that generates adaptive, context-aware dialogues enriched with visual elements. Through real-time follow-up prompts, empathetic messaging, and flexible topic selection, TigerGPT elicits more in-depth feedback compared to traditional static survey forms. Based on established principles of conversational design, the chatbot employs empathetic cues, bolded questions, and user-driven topic selection. It retains some role-based efficiency (e.g., collecting user role through quick clicks) but goes beyond static scripts by employing GenAI adaptiveness. In a pilot study with undergraduate students, we collected both quantitative metrics (e.g., satisfaction ratings) and qualitative insights (e.g., written comments). Most participants described TigerGPT as engaging and user-friendly; about half preferred it over conventional surveys, attributing this preference to its personalized conversation flow and supportive tone. The findings indicate that an AI survey chatbot shows promise for gaining deeper insight into campus climate.

Authors:Evans Xu Han, Alice Qian Zhang, Haiyi Zhu, Hong Shen, Paul Pu Liang, Jane Hsieh
Title: POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation
Abstract:
State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks -- offering the ability to generate (rather than search for) novel and unprecedented (instead of existing) images of considerable quality that also adhere to boundless combinations of user specifications. However, many large-scale text-to-image systems are designed for broad applicability, yielding conventional output that may limit creative exploration. They also employ interaction methods that may be difficult for beginners. Given that creative end users often operate in diverse, context-specific ways that are often unpredictable, more variation and personalization are necessary. We introduce POET, a real-time interactive tool that (1) automatically discovers dimensions of homogeneity in text-to-image generative models, (2) expands these dimensions to diversify the output space of generated images, and (3) learns from user feedback to personalize expansions. An evaluation with 28 users spanning four creative task domains demonstrated POET's ability to generate results with higher perceived diversity and help users reach satisfaction in fewer prompts during creative tasks, thereby prompting them to deliberate and reflect more on a wider range of possible produced results during the co-creative process. Focusing on visual creativity, POET offers a first glimpse of how interaction techniques of future text-to-image generation tools may support and align with more pluralistic values and the needs of end users during the ideation stages of their work.

Authors:Pooja S. B. Rao, Sanja Šćepanović, Ke Zhou, Edyta Paulina Bogucka, Daniele Quercia
Title: RiskRAG: A Data-Driven Solution for Improved AI Model Risk Reporting
Abstract:
Risk reporting is essential for documenting AI models, yet only 14% of model cards mention risks, and of those, 96% copy content from a small set of cards, leading to a lack of actionable insights. Existing proposals for improving model cards do not resolve these issues. To address this, we introduce RiskRAG, a Retrieval Augmented Generation based risk reporting solution guided by five design requirements we identified from the literature and co-design with 16 developers: identifying diverse model-specific risks, clearly presenting and prioritizing them, contextualizing for real-world uses, and offering actionable mitigation strategies. Drawing from 450K model cards and 600 real-world incidents, RiskRAG pre-populates contextualized risk reports. A preliminary study with 50 developers showed that they preferred RiskRAG over standard model cards, as it better met all the design requirements. A final study with 38 developers, 40 designers, and 37 media professionals showed that RiskRAG improved their way of selecting the AI model for a specific application, encouraging more careful and deliberative decision-making. The RiskRAG project page is accessible at: https://social-dynamics.net/ai-risks/card.

Authors:Qianou Ma, Dora Zhao, Xinran Zhao, Chenglei Si, Chenyang Yang, Ryan Louie, Ehud Reiter, Diyi Yang, Tongshuang Wu
Title: SPHERE: An Evaluation Card for Human-AI Systems
Abstract:
In the era of Large Language Models (LLMs), establishing effective evaluation methods and standards for diverse human-AI interaction systems is increasingly challenging. To encourage more transparent documentation and facilitate discussion on human-AI system evaluation design options, we present an evaluation card SPHERE, which encompasses five key dimensions: 1) What is being evaluated?; 2) How is the evaluation conducted?; 3) Who is participating in the evaluation?; 4) When is evaluation conducted?; 5) How is evaluation validated? We conduct a review of 39 human-AI systems using SPHERE, outlining current evaluation practices and areas for improvement. We provide three recommendations for improving the validity and rigor of evaluation practices.

Authors:Hilda Hadan, Reza Hadi Mogavi, Leah Zhang-Kennedy, Lennart E. Nacke
Title: Who is Responsible When AI Fails? Mapping Causes, Entities, and Consequences of AI Privacy and Ethical Incidents
Abstract:
The rapid growth of artificial intelligence (AI) technologies has raised major privacy and ethical concerns. However, existing AI incident taxonomies and guidelines lack grounding in real-world cases, limiting their effectiveness for prevention and mitigation. We analyzed 202 real-world AI privacy and ethical incidents to develop a taxonomy that classifies them across AI lifecycle stages and captures contributing factors, including causes, responsible entities, sources of disclosure, and impacts. Our findings reveal widespread harms from poor organizational decisions and legal non-compliance, limited corrective interventions, and rare reporting from AI developers and adopting entities. Our taxonomy offers a structured approach for systematic incident reporting and emphasizes the weaknesses of current AI governance frameworks. Our findings provide actionable guidance for policymakers and practitioners to strengthen user protections, develop targeted AI policies, enhance reporting practices, and foster responsible AI governance and innovation, especially in contexts such as social media and child protection.

Authors:Hilda Hadan, Leah Zhang-Kennedy, Lennart E. Nacke
Title: Computer-based Deceptive Game Design in Commercial Virtual Reality Games: A Preliminary Investigation
Abstract:
As Virtual Reality (VR) games become more popular, it is crucial to understand how deceptive game design patterns manifest and impact player experiences in this emerging medium. Our study sheds light on the presence and effects of manipulative design techniques in commercial VR games compared to a traditional computer game. We conducted an autoethnography study and developed a VR Deceptive Game Design Assessment Guide based on a critical literature review. Using our guide, we compared how deceptive patterns in a popular computer game differ from those in two commercial VR titles. While VR's technological constraints, such as battery life, limited temporal manipulation, VR's unique sensory immersion amplified the impact of emotional and sensory deception. Current VR games showed similar but evolved forms of deceptive design compared to the computer game. We forecast more sophisticated player manipulation as VR technology advances. Our findings contribute to a better understanding of how deceptive game design persists and escalates in VR. We highlight the urgent need to develop ethical design guidelines for the rapidly advancing VR games industry.

Authors:Ziming Cheng, Zhiyuan Huang, Junting Pan, Zhaohui Hou, Mingjie Zhan
Title: Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up Questions
Abstract:
Graphical user interfaces (GUI) automation agents are emerging as powerful tools, enabling humans to accomplish increasingly complex tasks on smart devices. However, users often inadvertently omit key information when conveying tasks, which hinders agent performance in the current agent paradigm that does not support immediate user intervention. To address this issue, we introduce a $\textbf{Self-Correction GUI Navigation}$ task that incorporates interactive information completion capabilities within GUI agents. We developed the $\textbf{Navi-plus}$ dataset with GUI follow-up question-answer pairs, alongside a $\textbf{Dual-Stream Trajectory Evaluation}$ method to benchmark this new capability. Our results show that agents equipped with the ability to ask GUI follow-up questions can fully recover their performance when faced with ambiguous user tasks.

Authors:Hilda Hadan, Sabrina Alicia Sgandurra, Leah Zhang-Kennedy, Lennart E. Nacke
Title: From Motivating to Manipulative: The Use of Deceptive Design in a Game's Free-to-Play Transition
Abstract:
Over the last decade, the free-to-play (F2P) game business model has gained popularity in the games industry. We examine the role of deceptive design during a game's transition to F2P and its impacts on players. Our analysis focuses on game mechanics and a Reddit analysis of the Overwatch (OW) series after it transitioned to an F2P model. Our study identifies nine game mechanics that use deceptive design patterns. We also identify factors contributing to a negative gameplay experience. Business model transitions in games present possibilities for problematic practices. Our findings identify the need for game developers and publishers to balance player investments and fairness of rewards. A game's successful transition depends on maintaining fundamental components of player motivation and ensuring transparent communication. Compared to existing taxonomies in other media, games need a comprehensive classification of deceptive design. We emphasize the importance of understanding player perceptions and the impact of deceptive practices in future research.

Authors:Hilda Hadan, Lydia Choong, Leah Zhang-Kennedy, Lennart E. Nacke
Title: Deceived by Immersion: A Systematic Analysis of Deceptive Design in Extended Reality
Abstract:
The well-established deceptive design literature has focused on conventional user interfaces. With the rise of extended reality (XR), understanding deceptive design's unique manifestations in this immersive domain is crucial. However, existing research lacks a full, cross-disciplinary analysis of how XR technologies enable new forms of deceptive design. Our study reviews the literature on deceptive design in XR environments. We use thematic synthesis to identify key themes. We found that XR's immersive capabilities and extensive data collection enable subtle and powerful manipulation strategies. We identified eight themes outlining these strategies and discussed existing countermeasures. Our findings show the unique risks of deceptive design in XR, highlighting implications for researchers, designers, and policymakers. We propose future research directions that explore unintentional deceptive design, data-driven manipulation solutions, user education, and the link between ethical design and policy regulations.

Authors:Hilda Hadan, Derrick M. Wang, Lennart E. Nacke, Leah Zhang-Kennedy
Title: Privacy in Immersive Extended Reality: Exploring User Perceptions, Concerns, and Coping Strategies
Abstract:
Extended Reality (XR) technology is changing online interactions, but its granular data collection sensors may be more invasive to user privacy than web, mobile, and the Internet of Things technologies. Despite an increased interest in studying developers' concerns about XR device privacy, user perceptions have rarely been addressed. We surveyed 464 XR users to assess their awareness, concerns, and coping strategies around XR data in 18 scenarios. Our findings demonstrate that many factors, such as data types and sensitivity, affect users' perceptions of privacy in XR. However, users' limited awareness of XR sensors' granular data collection capabilities, such as involuntary body signals of emotional responses, restricted the range of privacy-protective strategies they used. Our results highlight a need to enhance users' awareness of data privacy threats in XR, design privacy-choice interfaces tailored to XR environments, and develop transparent XR data practices.

Authors:Hilda Hadan, Sabrina A. Sgandurra, Leah Zhang-Kennedy, Lennart E. Nacke
Title: Culture Clash: When Deceptive Design Meets Diverse Player Expectations
Abstract:
Deceptive game designs that manipulate players are increasingly common in the gaming industry, but the impact on players is not well studied. While studies have revealed player frustration, there is a gap in understanding how cultural attributes affect the impact of deceptive design in games. This paper proposes a new research direction on the connection between the representation of culture in games and player response to deceptive designs. We believe that understanding the interplay between cultural attributes and deceptive design can inform the creation of games that are ethical and entertaining for players around the globe.

Authors:Trisanth Srinivasan, Santosh Patapati
Title: WebNav: An Intelligent Agent for Voice-Controlled Web Navigation
Abstract:
The current state of modern web interfaces is severely lacking, especially with regard to accessibility-focused usage. Traditional methods for web interaction, such as scripting languages and screen readers, often lack the flexibility to handle dynamic content or the intelligence to interpret high-level user goals. To address these limitations, we introduce WebNav, a novel agent for multi-modal web navigation. WebNav leverages a dual Large Language Model (LLM) architecture to translate natural language commands into precise, executable actions on a graphical user interface. The system combines vision-based context from screenshots with a dynamic DOM-labeling browser extension to robustly identify interactive elements. A high-level 'Controller' LLM strategizes the next step toward a user's goal, while a second 'Assistant' LLM generates the exact parameters for execution. This separation of concerns allows for sophisticated task decomposition and action formulation. Our work presents the complete architecture and implementation of WebNav, demonstrating a promising approach to creating more intelligent web automation agents.
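The Controller/Assistant separation described above might be sketched as below. The function names, rule-based stand-ins for the two LLMs, and the `data-webnav-id` selector format are illustrative assumptions, not WebNav's actual implementation.

```python
# Hedged sketch of a dual-model split: a Controller decides the next abstract
# step; an Assistant turns that step into exact executable parameters.

def controller(goal, labeled_elements, history):
    """Stand-in for the 'Controller' LLM: pick the next high-level step
    toward the goal from the DOM-labeled interactive elements."""
    for element in labeled_elements:
        if element["label"].lower() in goal.lower() and element["id"] not in history:
            return {"action": "click", "target": element["id"]}
    return {"action": "done"}

def assistant(step):
    """Stand-in for the 'Assistant' LLM: generate the exact parameters
    (here, a CSS selector) for the chosen step."""
    if step["action"] == "click":
        return {"command": "click",
                "selector": f"[data-webnav-id='{step['target']}']"}
    return {"command": "stop"}

# Elements as a DOM-labeling extension might report them (simplified).
elements = [{"id": 1, "label": "Search"}, {"id": 2, "label": "Login"}]
step = controller("click the Login button", elements, history=set())
action = assistant(step)
```

The design point the sketch illustrates is that strategy (which element matters next) and execution detail (how exactly to act on it) are produced by separate models.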

Authors:Neil K. R. Sehgal, Hita Kambhamettu, Sai Preethi Matam, Lyle Ungar, Sharath Chandra Guntuku
Title: Exploring Socio-Cultural Challenges and Opportunities in Designing Mental Health Chatbots for Adolescents in India
Abstract:
Mental health challenges among Indian adolescents are shaped by unique cultural and systemic barriers, including high social stigma and limited professional support. Through a mixed-methods study involving a survey of 278 adolescents and follow-up interviews with 12 participants, we explore how adolescents perceive mental health challenges and interact with digital tools. Quantitative results highlight low self-stigma but significant social stigma, a preference for text over voice interactions, and low utilization of mental health apps but high smartphone access. Our qualitative findings reveal that while adolescents value privacy, emotional support, and localized content in mental health tools, existing chatbots lack personalization and cultural relevance. These findings inform recommendations for culturally sensitive chatbot design that prioritizes anonymity, tailored support, and localized resources to better meet the needs of adolescents in India. This work advances culturally sensitive chatbot design by centering underrepresented populations, addressing critical gaps in accessibility and support for adolescents in India.

Authors:Hengjie Yu, Shuya Liu, Haiyun Yang, Yuping Yan, Maozhen Qu, Yaochu Jin
Title: Unlocking the Potential of AI Researchers in Scientific Discovery: What Is Missing?
Abstract:
The potential of AI researchers in scientific discovery remains largely untapped. Over the past decade, AI for Science (AI4Science) publications in 145 Nature Index journals have increased fifteen-fold, yet they still account for less than 3% of the total publications. Drawing upon the Diffusion of Innovation theory, we project AI4Science's share of total publications to rise from 2.72% in 2024 to approximately 20% by 2050. Achieving this shift requires fully harnessing the potential of AI researchers, as nearly 95% of AI-driven research in these journals is led by experimental scientists. To facilitate this, we propose structured workflows and strategic interventions to position AI researchers at the forefront of scientific discovery. Specifically, we identify three critical pathways: equipping experimental scientists with accessible AI tools to amplify the impact of AI researchers, bridging cognitive and methodological gaps to enable more direct involvement in scientific discovery, and proactively fostering a thriving AI-driven scientific ecosystem. By addressing these challenges, we aim to empower AI researchers as key drivers of future scientific breakthroughs.
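The projection from 2.72% in 2024 to roughly 20% by 2050 is consistent with a logistic S-curve, the standard functional form in Diffusion of Innovation analyses. The sketch below is a back-of-the-envelope illustration: the carrying capacity, growth rate, and midpoint are assumptions chosen only to pass through the two quoted points, not the paper's fitted parameters.

```python
# Illustrative logistic diffusion curve (all parameters are assumptions
# tuned to match the abstract's two data points, not the paper's model).
import math

def logistic_share(year, K=0.5, r=0.094, t0=2054.3):
    """Share of publications at a given year for a logistic curve with
    carrying capacity K, growth rate r, and midpoint year t0."""
    return K / (1 + math.exp(-r * (year - t0)))

share_2024 = logistic_share(2024)   # ~= 0.027 (2.7%)
share_2050 = logistic_share(2050)   # ~= 0.20 (20%)
```

Under these assumed parameters the curve reproduces the quoted trajectory; different choices of K and r that also fit both points would imply different long-run saturation levels.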

Authors:Dingkun Liu, Siyang Li, Ziwei Wang, Wei Li, Dongrui Wu
Title: Spatial Distillation based Distribution Alignment (SDDA) for Cross-Headset EEG Classification
Abstract:
A non-invasive brain-computer interface (BCI) enables direct interaction between the user and external devices, typically via electroencephalogram (EEG) signals. However, decoding EEG signals across different headsets remains a significant challenge due to differences in the number and locations of the electrodes. To address this challenge, we propose a spatial distillation based distribution alignment (SDDA) approach for heterogeneous cross-headset transfer in non-invasive BCIs. SDDA first uses spatial distillation to make use of the full set of electrodes, and then applies input/feature/output-space distribution alignments to cope with the significant differences between the source and target domains. To our knowledge, this is the first work to use knowledge distillation in cross-headset transfers. Extensive experiments on six EEG datasets from two BCI paradigms demonstrated that SDDA achieved superior performance in both offline unsupervised domain adaptation and online supervised domain adaptation scenarios, consistently outperforming 10 classical and state-of-the-art transfer learning algorithms.
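A conceptual sketch of the two-stage objective might look as follows. The loss forms are illustrative assumptions (a mean-squared distillation term plus a simple mean-alignment term), not SDDA's actual formulation, and the feature vectors are toy placeholders for EEG features.

```python
# Conceptual sketch: stage 1 distills a teacher trained on the full electrode
# set into a student seeing a different headset's channels; stage 2 aligns
# source/target feature distributions. Both terms below are assumed forms.

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mean_feature(batch):
    """Per-dimension mean of a batch of feature vectors."""
    n, d = len(batch), len(batch[0])
    return [sum(row[i] for row in batch) / n for i in range(d)]

def sdda_loss(student_feats, teacher_feats, source_batch, target_batch, lam=0.5):
    """Distillation term (match the teacher's features) plus a simple
    mean-alignment term between source and target batches."""
    distill = mse(student_feats, teacher_feats)
    align = mse(mean_feature(source_batch), mean_feature(target_batch))
    return distill + lam * align
```

In practice the alignment stage would operate in input, feature, and output spaces as the abstract states; the single mean-alignment term here stands in for all three.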

Authors:Jeanette Falk, Yiyi Chen, Janet Rafner, Mike Zhang, Johannes Bjerva, Alexander Nolte
Title: How Do Hackathons Foster Creativity? Towards AI Collaborative Evaluation of Creativity at Scale
Abstract:
Hackathons have become popular collaborative events for accelerating the development of creative ideas and prototypes. There are several case studies showcasing creative outcomes across domains such as industry, education, and research. However, there are no large-scale studies on creativity in hackathons which can advance theory on how hackathon formats lead to creative outcomes. We conducted a computational analysis of 193,353 hackathon projects. By operationalizing creativity through usefulness and novelty, we refined our dataset to 10,363 projects, allowing us to analyze how participant characteristics, collaboration patterns, and hackathon setups influence the development of creative projects. The contribution of our paper is twofold: We identified means for organizers to foster creativity in hackathons. We also explore the use of large language models (LLMs) to augment the evaluation of creative outcomes and discuss challenges and opportunities of doing this, which has implications for creativity research at large.

Authors:Xiaowei Jiang, Yanan Chen, Nikhil Ranjan Pal, Yu-Cheng Chang, Yunkai Yang, Thomas Do, Chin-Teng Lin
Title: Interpretable Dual-Filter Fuzzy Neural Networks for Affective Brain-Computer Interfaces
Abstract:
Fuzzy logic provides a robust framework for enhancing explainability, particularly in domains requiring the interpretation of complex and ambiguous signals, such as brain-computer interface (BCI) systems. Despite significant advances in deep learning, interpreting human emotions remains a formidable challenge. In this work, we present iFuzzyAffectDuo, a novel computational model that integrates a dual-filter fuzzy neural network architecture for improved detection and interpretation of emotional states from neuroimaging data. The model introduces a new membership function (MF) based on the Laplace distribution, achieving superior accuracy and interpretability compared to traditional approaches. By refining the extraction of neural signals associated with specific emotions, iFuzzyAffectDuo offers a human-understandable framework that unravels the underlying decision-making processes. We validate our approach across three neuroimaging datasets using functional Near-Infrared Spectroscopy (fNIRS) and Electroencephalography (EEG), demonstrating its potential to advance affective computing. These findings open new pathways for understanding the neural basis of emotions and their application in enhancing human-computer interaction.

Authors:Yinxu Tang, Stylianos Loukas Vasileiou, William Yeoh
Title: Does Your AI Agent Get You? A Personalizable Framework for Approximating Human Models from Argumentation-based Dialogue Traces
Abstract:
Explainable AI is increasingly employing argumentation methods to facilitate interactive explanations between AI agents and human users. While existing approaches typically rely on predetermined human user models, there remains a critical gap in dynamically learning and updating these models during interactions. In this paper, we present a framework that enables AI agents to adapt their understanding of human users through argumentation-based dialogues. Our approach, called Persona, draws on prospect theory and integrates a probability weighting function with a Bayesian belief update mechanism that refines a probability distribution over possible human models based on exchanged arguments. Through empirical evaluations with human users in an applied argumentation setting, we demonstrate that Persona effectively captures evolving human beliefs, facilitates personalized interactions, and outperforms state-of-the-art methods.
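The core update Persona describes, a Bayesian belief revision over candidate human models filtered through a prospect-theoretic weighting function, can be sketched as below. The Prelec weighting form, the gamma value, and the toy likelihoods are assumptions for illustration; the paper's exact weighting function and model space may differ.

```python
# Minimal sketch of a weighted Bayesian update over candidate human models.
import math

def prelec_weight(p, gamma=0.65):
    """Prelec probability weighting from prospect theory (assumed form):
    overweights small probabilities, underweights large ones."""
    return math.exp(-((-math.log(p)) ** gamma)) if 0 < p < 1 else p

def update_beliefs(prior, likelihoods):
    """Bayes rule over candidate models, with each likelihood distorted
    by the weighting function before normalization."""
    posterior = {m: prior[m] * prelec_weight(likelihoods[m]) for m in prior}
    z = sum(posterior.values())
    return {m: v / z for m, v in posterior.items()}

prior = {"risk_averse": 0.5, "risk_seeking": 0.5}
# Assumed likelihood of the observed argument under each candidate model.
likelihoods = {"risk_averse": 0.8, "risk_seeking": 0.2}
posterior = update_beliefs(prior, likelihoods)
```

Repeating the update after each exchanged argument concentrates the distribution on the model that best explains the human's responses.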

Authors:Naiming Liu, Shashank Sonkar, Richard G. Baraniuk
Title: Do LLMs Make Mistakes Like Students? Exploring Natural Alignment between Language Models and Human Error Patterns
Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities in various educational tasks, yet their alignment with human learning patterns, particularly in predicting which incorrect options students are most likely to select in multiple-choice questions (MCQs), remains underexplored. Our work investigates the relationship between LLM generation likelihood and student response distributions in MCQs with a specific focus on distractor selections. We collect a comprehensive dataset of MCQs with real-world student response distributions to explore two fundamental research questions: (1) RQ1 - Do the distractors that students more frequently select correspond to those that LLMs assign higher generation likelihood to? (2) RQ2 - When an LLM selects an incorrect choice, does it choose the same distractor that most students pick? Our experiments reveal moderate correlations between LLM-assigned probabilities and student selection patterns for distractors in MCQs. Additionally, when LLMs make mistakes, they are more likely to select the same incorrect answers that commonly mislead students, a pattern consistent across both small and large language models. Our work provides empirical evidence that despite LLMs' strong performance on generating educational content, a gap remains between LLMs' underlying reasoning processes and human cognitive processes in identifying confusing distractors. Our findings also have significant implications for educational assessment development. Smaller language models could be efficiently utilized for automated distractor generation, as they demonstrate patterns similar to larger language models in identifying confusing answer choices. This observed alignment between LLMs and student misconception patterns opens new opportunities for generating high-quality distractors that complement traditional human-designed distractors.
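The RQ1-style analysis amounts to correlating LLM-assigned distractor probabilities with student selection frequencies per question. The sketch below uses invented numbers and a hand-rolled Pearson coefficient purely for illustration; the paper's dataset and correlation methodology may differ.

```python
# Toy correlation between LLM distractor likelihoods and student choices.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# One MCQ with three distractors (invented numbers): normalized LLM
# generation likelihoods vs. the fraction of students choosing each.
llm_probs = [0.50, 0.30, 0.20]
student_freqs = [0.55, 0.25, 0.20]
r = pearson(llm_probs, student_freqs)
```

Aggregating such per-question coefficients over the full dataset yields the moderate correlations the abstract reports.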

Authors:Shashank Sonkar, Naiming Liu, Xinghe Chen, Richard G. Baraniuk
Title: The Imitation Game for Educational AI
Abstract:
As artificial intelligence systems become increasingly prevalent in education, a fundamental challenge emerges: how can we verify if an AI truly understands how students think and reason? Traditional evaluation methods like measuring learning gains require lengthy studies confounded by numerous variables. We present a novel evaluation framework based on a two-phase Turing-like test. In Phase 1, students provide open-ended responses to questions, revealing natural misconceptions. In Phase 2, both AI and human experts, conditioned on each student's specific mistakes, generate distractors for new related questions. By analyzing whether students select AI-generated distractors at rates similar to human expert-generated ones, we can validate if the AI models student cognition. We prove this evaluation must be conditioned on individual responses - unconditioned approaches merely target common misconceptions. Through rigorous statistical sampling theory, we establish precise requirements for high-confidence validation. Our research positions conditioned distractor generation as a probe into an AI system's fundamental ability to model student thinking - a capability that enables adapting tutoring, feedback, and assessments to each student's specific needs.

Authors:Myra Cheng, Su Lin Blodgett, Alicia DeVrio, Lisa Egede, Alexandra Olteanu
Title: Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems
Abstract:
As text generation systems' outputs are increasingly anthropomorphic -- perceived as human-like -- scholars have also increasingly raised concerns about how such outputs can lead to harmful outcomes, such as users over-relying or developing emotional dependence on these systems. How to intervene on such system outputs to mitigate anthropomorphic behaviors and their attendant harmful outcomes, however, remains understudied. With this work, we aim to provide empirical and theoretical grounding for developing such interventions. To do so, we compile an inventory of interventions grounded both in prior literature and a crowdsourcing study where participants edited system outputs to make them less human-like. Drawing on this inventory, we also develop a conceptual framework to help characterize the landscape of possible interventions, articulate distinctions between different types of interventions, and provide a theoretical basis for evaluating the effectiveness of different interventions.

Authors:Tejas Srinivasan, Jesse Thomason
Title: Adjust for Trust: Mitigating Trust-Induced Inappropriate Reliance on AI Assistance
Abstract:
Trust biases how users rely on AI recommendations in AI-assisted decision-making tasks, with low and high levels of trust resulting in increased under- and over-reliance, respectively. We propose that AI assistants should adapt their behavior through trust-adaptive interventions to mitigate such inappropriate reliance. For instance, when user trust is low, providing an explanation can elicit more careful consideration of the assistant's advice by the user. In two decision-making scenarios -- laypeople answering science questions and doctors making medical diagnoses -- we find that providing supporting and counter-explanations during moments of low and high trust, respectively, yields up to 38% reduction in inappropriate reliance and 20% improvement in decision accuracy. We are similarly able to reduce over-reliance by adaptively inserting forced pauses to promote deliberation. Our results highlight how AI adaptation to user trust facilitates appropriate reliance, presenting exciting avenues for improving human-AI collaboration.
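The trust-adaptive policy described above reduces, at its simplest, to a mapping from an estimated trust level to an intervention. The thresholds and intervention labels in this sketch are assumptions; the paper's interventions are richer (supporting vs. counter-explanations, forced pauses) and its trust estimation is task-specific.

```python
# Illustrative trust-adaptive intervention policy (thresholds assumed).

def choose_intervention(trust, low=0.3, high=0.7):
    """Map estimated user trust in [0, 1] to an intervention:
    - low trust: supporting explanation, to counter under-reliance;
    - high trust: counter-explanation or forced pause, to counter
      over-reliance;
    - otherwise: no intervention."""
    if trust < low:
        return "supporting_explanation"
    if trust > high:
        return "counter_explanation_or_pause"
    return "no_intervention"
```

The key design choice is that the assistant's behavior is conditioned on a running trust estimate rather than being fixed across the interaction.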

Authors:Yu Zhang, Yi Wen, Siying Hu, Zhicong Lu
Title: SpeechCap: Leveraging Playful Impact Captions to Facilitate Interpersonal Communication in Social Virtual Reality
Abstract:
Social Virtual Reality (VR) is emerging as a promising platform that brings immersive, interactive, and engaging mechanisms for collaborative activities in virtual spaces. However, interpersonal communication in social VR remains limited by existing mediums and channels. To bridge the gap, we propose a novel method for mediating real-time conversation in social VR, which uses impact captions, a type of typographic visual effect widely used in videos, to convey both verbal and non-verbal information. We first investigated the design space of impact captions by content analysis and a co-design session with four experts. Next, we implemented SpeechCap as a proof-of-concept system, with which users can communicate with each other using speech-driven impact captions in VR. Through a user study (n=14), we evaluated the effectiveness of the visual and interaction design of impact captions, highlighting the interactivity and the integration of verbal and non-verbal information in communication mediums. Finally, we discussed topics of visual rhetoric, interactivity, and ambiguity as the main findings from the study, and further provided design implications for future work on facilitating interpersonal communication in social VR.

Authors:Shixiao Wang, Runsheng Zhang, Junliang Du, Ran Hao, Jiacheng Hu
Title: A Deep Learning Approach to Interface Color Quality Assessment in HCI
Abstract:
In this paper, a quantitative evaluation model for the color quality of human-computer interaction interfaces is proposed by combining deep convolutional neural networks (CNNs). By extracting multidimensional features of interface images, including hue, brightness, and purity, a CNN is used for efficient feature modeling and quantitative analysis, and the relationship between interface design and user perception is studied. The experiment is based on multiple international mainstream website interface datasets, covering e-commerce platforms, social media, education platforms, etc., and verifies the evaluation effect of the model on indicators such as contrast, clarity, color coordination, and visual appeal. The results show that the CNN evaluation is highly consistent with user ratings, with a correlation coefficient of up to 0.96, and the model also shows high accuracy in mean square error and absolute error. Compared with traditional experience-based evaluation methods, the proposed model can efficiently and scientifically capture the visual characteristics of the interface and avoid the influence of subjective factors. Future research can explore introducing multimodal data (such as text and interaction behavior) into the model to further enhance the evaluation of dynamic interfaces and expand it to fields such as smart homes, medical systems, and virtual reality. This paper provides new methods and ideas for the scientific evaluation and optimization of interface design.
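The hue/brightness/purity feature extraction the abstract mentions can be illustrated with the standard RGB-to-HSV conversion, where saturation plays the role of purity and value the role of brightness. This sketch uses Python's `colorsys` as a stand-in; the paper's actual feature pipeline and CNN inputs are not specified here.

```python
# Sketch of color-feature extraction from interface pixels (assumed pipeline:
# average HSV statistics; a CNN would consume such features or raw pixels).
import colorsys

def color_features(pixels):
    """Average hue, purity (saturation), and brightness (value) over
    RGB pixels given in 0-255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    n = len(hsv)
    return tuple(sum(c[i] for c in hsv) / n for i in range(3))

# A toy 2x2 "interface" patch: two pure reds and two mid grays.
patch = [(255, 0, 0), (255, 0, 0), (128, 128, 128), (128, 128, 128)]
hue, purity, brightness = color_features(patch)
```

Grays contribute zero saturation, so the patch's average purity is 0.5, while brightness averages the reds' full value with the grays' mid value.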

Authors:Zhengqiu Zhu, Yatai Ji, Jiaheng Huang, Yong Zhao, Sihang Qiu, Rusheng Ju
Title: AutoS$^2$earch: Unlocking the Reasoning Potential of Large Models for Web-based Source Search
Abstract:
Web-based management systems have been widely used in risk control and industrial safety. However, effectively integrating source search capabilities into these systems, to enable decision-makers to locate and address the hazard (e.g., gas leak detection) remains a challenge. While prior efforts have explored using web crowdsourcing and AI algorithms for source search decision support, these approaches suffer from overheads in recruiting human participants and slow response times in time-sensitive situations. To address this, we introduce AutoS$^2$earch, a novel framework leveraging large models for zero-shot source search in web applications. AutoS$^2$earch operates on a simplified visual environment projected through a web-based display, utilizing a chain-of-thought prompt designed to emulate human reasoning. A multimodal large language model (MLLM) dynamically converts visual observations into language descriptions, enabling the LLM to perform linguistic reasoning over four directional choices. Extensive experiments demonstrate that AutoS$^2$earch achieves performance nearly equivalent to human-AI collaborative source search while eliminating dependency on crowdsourced labor. Our work offers valuable insights into using web engineering to design such autonomous systems for other industrial applications.
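The observe-describe-decide loop over four directional choices might be sketched as below. The toy concentration field, the greedy decision rule, and all function names are assumptions standing in for the MLLM's descriptions and the LLM's chain-of-thought reasoning.

```python
# Hedged sketch of a zero-shot source-search loop: a stand-in "MLLM" reads
# local observations; a stand-in "LLM" picks one of four directions.

def describe_observation(grid, pos):
    """Stand-in for the MLLM: summarize local readings in the four
    directions as a small structured description."""
    x, y = pos
    return {
        "north": grid.get((x, y - 1), 0.0),
        "south": grid.get((x, y + 1), 0.0),
        "west": grid.get((x - 1, y), 0.0),
        "east": grid.get((x + 1, y), 0.0),
    }

def choose_direction(readings):
    """Stand-in for the LLM's reasoning: move toward the strongest signal."""
    return max(readings, key=readings.get)

# Toy concentration field with a leak source at (0, 0).
grid = {(x, y): 1.0 / (1 + abs(x) + abs(y))
        for x in range(-3, 4) for y in range(-3, 4)}
moves = {"north": (0, -1), "south": (0, 1), "west": (-1, 0), "east": (1, 0)}
pos = (2, 2)
path = [pos]
for _ in range(4):
    direction = choose_direction(describe_observation(grid, pos))
    dx, dy = moves[direction]
    pos = (pos[0] + dx, pos[1] + dy)
    path.append(pos)
```

On this toy field the greedy loop reaches the source in four steps; the actual system replaces both stand-ins with model calls over the web-projected visual environment.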

Authors:Alicia DeVrio, Myra Cheng, Lisa Egede, Alexandra Olteanu, Su Lin Blodgett
Title: A Taxonomy of Linguistic Expressions That Contribute To Anthropomorphism of Language Technologies
Abstract:
Recent attention to anthropomorphism -- the attribution of human-like qualities to non-human objects or entities -- of language technologies like LLMs has sparked renewed discussions about potential negative impacts of anthropomorphism. To productively discuss the impacts of this anthropomorphism and in what contexts it is appropriate, we need a shared vocabulary for the vast variety of ways that language can be anthropomorphic. In this work, we draw on existing literature and analyze empirical cases of user interactions with language technologies to develop a taxonomy of textual expressions that can contribute to anthropomorphism. We highlight challenges and tensions involved in understanding linguistic anthropomorphism, such as how all language is fundamentally human and how efforts to characterize and shift perceptions of humanness in machines can also dehumanize certain humans. We discuss ways that our taxonomy supports more precise and effective discussions of and decisions about anthropomorphism of language technologies.

Authors:Edyta Bogucka, Sanja Šćepanović, Daniele Quercia
Title: Atlas of AI Risks: Enhancing Public Understanding of AI Risks
Abstract:
The prevailing methodologies for visualizing AI risks have focused on technical issues such as data biases and model inaccuracies, often overlooking broader societal risks like job loss and surveillance. Moreover, these visualizations are typically designed for tech-savvy individuals, neglecting those with limited technical skills. To address these challenges, we propose the Atlas of AI Risks-a narrative-style tool designed to map the broad risks associated with various AI technologies in a way that is understandable to non-technical individuals as well. To both develop and evaluate this tool, we conducted two crowdsourcing studies. The first, involving 40 participants, identified the design requirements for visualizing AI risks for decision-making and guided the development of the Atlas. The second study, with 140 participants reflecting the US population in terms of age, sex, and ethnicity, assessed the usability and aesthetics of the Atlas to ensure it met those requirements. Using facial recognition technology as a case study, we found that the Atlas is more user-friendly than a baseline visualization, with a more classic and expressive aesthetic, and is more effective in presenting a balanced assessment of the risks and benefits of facial recognition. Finally, we discuss how our design choices make the Atlas adaptable for broader use, allowing it to generalize across the diverse range of technology applications represented in a database that reports various AI incidents.

Authors:Jane Hsieh, Angie Zhang, Sajel Surati, Sijia Xie, Yeshua Ayala, Nithila Sathiya, Tzu-Sheng Kuo, Min Kyung Lee, Haiyi Zhu
Title: Gig2Gether: Data-sharing to Empower, Unify and Demystify Gig Work
Abstract:
The wide adoption of platformized work has generated remarkable advancements in the labor patterns and mobility of modern society. Underpinning such progress, gig workers are exposed to unprecedented challenges and accountabilities: lack of data transparency, social and physical isolation, as well as insufficient infrastructural safeguards. Gig2Gether presents a space designed for workers to engage in an initial experience of voluntarily contributing anecdotal and statistical data to affect policy and build solidarity across platforms by exchanging unifying and diverse experiences. Our 7-day field study with 16 active workers from three distinct platforms and work domains showed existing affordances of data-sharing: facilitating mutual support across platforms, as well as enabling financial reflection and planning. Additionally, workers envisioned future use cases of data-sharing for collectivism (e.g., collaborative examinations of algorithmic speculations) and informing policy (e.g., around safety and pay), which motivated (latent) worker desiderata of additional capabilities and data metrics. Based on these findings, we discuss remaining challenges to address and how data-sharing tools can complement existing structures to maximize worker empowerment and policy impact.

Authors:Siyang Li, Hongbin Wang, Xiaoqing Chen, Dongrui Wu
Title: Multimodal Brain-Computer Interfaces: AI-powered Decoding Methodologies
Abstract:
Brain-computer interfaces (BCIs) enable direct communication between the brain and external devices. This review highlights the core decoding algorithms that enable multimodal BCIs, including a dissection of their elements, a unified view of diversified approaches, and a comprehensive analysis of the present state of the field. We emphasize algorithmic advancements in cross-modality mapping and sequential modeling, alongside classic multi-modality fusion, illustrating how these novel AI approaches enhance the decoding of brain data. The current literature on BCI applications in visual, speech, and affective decoding is comprehensively explored. Looking forward, we draw attention to the impact of emerging architectures such as multimodal Transformers, and discuss challenges such as brain-data heterogeneity and common errors. This review also serves as a bridge in this interdisciplinary field between experts with a neuroscience background and experts who study AI, aiming to provide a comprehensive understanding of AI-powered multimodal BCIs.

Authors:Yuan Tian, Chuhan Zhang, Xiaotong Wang, Sitong Pan, Weiwei Cui, Haidong Zhang, Dazhen Deng, Yingcai Wu
Title: ReSpark: Leveraging Previous Data Reports as References to Generate New Reports with LLMs
Abstract:
Creating data reports is a labor-intensive task involving iterative data exploration, insight extraction, and narrative construction. A key challenge lies in composing the analysis logic: from defining objectives and transforming data to identifying and communicating insights. Manually crafting this logic can be cognitively demanding. While experienced analysts often reuse scripts from past projects, finding a perfect match for a new dataset is rare. Even when similar analyses are available online, they usually share only results or visualizations, not the underlying code, making reuse difficult. To address this, we present ReSpark, a system that leverages large language models (LLMs) to reverse-engineer analysis logic from existing reports and adapt it to new datasets. By generating draft analysis steps, ReSpark provides a warm start for users. It also supports interactive refinement, allowing users to inspect intermediate outputs, insert objectives, and revise content. We evaluate ReSpark through comparative and user studies, demonstrating its effectiveness in lowering the barrier to generating data reports without relying on existing analysis code.

Authors:Manjie Xu, Xinyi Yang, Wei Liang, Chi Zhang, Yixin Zhu
Title: Learning to Plan with Personalized Preferences
Abstract:
Effective integration of AI agents into daily life requires them to understand and adapt to individual human preferences, particularly in collaborative roles. Although recent studies on embodied intelligence have advanced significantly, they typically adopt generalized approaches that overlook personal preferences in planning. We address this limitation by developing agents that not only learn preferences from few demonstrations but also learn to adapt their planning strategies based on these preferences. Our research leverages the observation that preferences, though implicitly expressed through minimal demonstrations, can generalize across diverse planning scenarios. To systematically evaluate this hypothesis, we introduce the Preference-based Planning (PbP) benchmark, an embodied benchmark featuring hundreds of diverse preferences spanning atomic actions to complex action sequences. Our evaluation of SOTA methods reveals that while symbol-based approaches show promise in scalability, significant challenges remain in learning to generate and execute plans that satisfy personalized preferences. We further demonstrate that incorporating learned preferences as intermediate representations in planning significantly improves the agent's ability to construct personalized plans. These findings establish preferences as a valuable abstraction layer for adaptive planning, opening new directions for research in preference-guided plan generation and execution.

Authors:Melani Sanchez-Garcia, Ruben Martinez-Cantin, Jesus Bermudez-Cameo, Jose J. Guerrero
Title: Influence of field of view in visual prostheses design: Analysis with a VR system
Abstract:
Visual prostheses are designed to restore partial functional vision in patients with total vision loss. Retinal visual prostheses provide limited capabilities as a result of low resolution, limited field of view and poor dynamic range. Understanding the influence of these parameters on perception can guide prostheses research and design. In this work, we evaluate the influence of field of view with respect to spatial resolution in visual prostheses, measuring the accuracy and response time in a search and recognition task. Twenty-four normally sighted participants were asked to find and recognize everyday objects, such as furniture and home appliances, in indoor room scenes. For the experiment, we use a new simulated prosthetic vision system that allows simple and effective experimentation. Our system uses a virtual-reality environment based on panoramic scenes. The simulator employs a head-mounted display which allows users to feel immersed in the scene by perceiving the entire scene all around. Our experiments use public image datasets and a commercial head-mounted display. We have also released the virtual-reality software for replicating and extending the experimentation. Results show that the accuracy and response time decrease when the field of view is increased. Furthermore, performance appears to be correlated with the angular resolution, but shows a diminishing return even with a resolution of less than 2.3 phosphenes per degree. Our results seem to indicate that, for the design of retinal prostheses, it is better to concentrate the phosphenes in a small area, to maximize the angular resolution, even if that implies sacrificing field of view.

Authors:Dongliang Zhou, Yakun Zhang, Jinghan Wu, Xingyu Zhang, Liang Xie, Erwei Yin
Title: AVE Speech: A Comprehensive Multi-Modal Dataset for Speech Recognition Integrating Audio, Visual, and Electromyographic Signals
Abstract:
The global aging population faces considerable challenges, particularly in communication, due to the prevalence of hearing and speech impairments. To address these, we introduce AVE Speech, a comprehensive multi-modal dataset for speech recognition tasks. The dataset includes a 100-sentence Mandarin corpus with audio signals, lip-region video recordings, and six-channel electromyography (EMG) data, collected from 100 participants. Each subject read the entire corpus ten times, with each sentence averaging approximately two seconds in duration, resulting in over 55 hours of speech data per modality. Experiments demonstrate that combining these modalities significantly improves recognition performance, particularly in cross-subject and high-noise environments. To our knowledge, this is the first publicly available sentence-level dataset integrating these three modalities for large-scale Mandarin speech recognition. We expect this dataset to drive advancements in both acoustic and non-acoustic speech recognition research, enhancing cross-modal learning and human-machine interaction.

Authors:Qurat Ul Ain, Mohamed Amine Chatti, William Kana Tsoplefack, Rawaa Alatrash, Shoeb Joarder
Title: Designing and Evaluating an Educational Recommender System with Different Levels of User Control
Abstract:
Educational recommender systems (ERSs) play a crucial role in personalizing learning experiences and enhancing educational outcomes by recommending personalized resources and activities to learners, tailored to their individual learning needs. However, their effectiveness is often diminished by insufficient user control and limited transparency. To address these challenges, in this paper, we present the systematic design and evaluation of an interactive ERS, in which we introduce different levels of user control. Concretely, we introduce user control around the input (i.e., user profile), process (i.e., recommendation algorithm), and output (i.e., recommendations) of the ERS. To evaluate our system, we conducted an online user study (N=30) to explore the impact of user control on users' perceptions of the ERS in terms of several important user-centric aspects. Moreover, we investigated the effects of user control on multiple recommendation goals, namely transparency, trust, and satisfaction, as well as the interactions between these goals. Our results demonstrate the positive impact of user control on users' perceived benefits of the ERS. Moreover, our study shows that user control strongly correlates with transparency and moderately correlates with trust and satisfaction. In terms of interactions between these goals, our results reveal that transparency correlates moderately and trust correlates strongly with satisfaction, whereas transparency and trust are less correlated with each other.

Authors:Haoyu Xie, Haoxuan Li, Chunyuan Zheng, Haonan Yuan, Guorui Liao, Jun Liao, Li Liu
Title: Decomposing and Fusing Intra- and Inter-Sensor Spatio-Temporal Signal for Multi-Sensor Wearable Human Activity Recognition
Abstract:
Wearable Human Activity Recognition (WHAR) is a prominent research area within ubiquitous computing. Multi-sensor synchronous measurement has proven to be more effective for WHAR than using a single sensor. However, existing WHAR methods use shared convolutional kernels for indiscriminate temporal feature extraction across each sensor variable, which fails to effectively capture the spatio-temporal relationships of intra-sensor and inter-sensor variables. We propose the DecomposeWHAR model, consisting of a decomposition phase and a fusion phase, to better model the relationships between modality variables. The decomposition phase creates high-dimensional representations of each intra-sensor variable through an improved depthwise separable convolution to capture local temporal features while preserving their unique characteristics. The fusion phase begins by capturing relationships between intra-sensor variables and fusing their features at both the channel and variable levels. Long-range temporal dependencies are modeled using a State Space Model (SSM), and cross-sensor interactions are then dynamically captured through a self-attention mechanism, highlighting inter-sensor spatial correlations. Our model demonstrates superior performance on three widely used WHAR datasets, significantly outperforming state-of-the-art models while maintaining acceptable computational efficiency.
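The abstract does not spell out the layer details, but the depthwise-then-pointwise idea behind a depthwise separable convolution can be sketched in plain Python. This is a minimal 1D illustration with hypothetical toy kernels and weights, not the paper's architecture: each channel is first convolved with its own kernel (preserving per-sensor temporal characteristics), then a pointwise step mixes channels.

```python
def depthwise_separable_conv1d(x, depth_kernels, point_weights):
    """1D depthwise convolution (one kernel per channel) followed by a
    pointwise (1x1) convolution that mixes channels.

    x: list of channels, each a list of time samples.
    depth_kernels: one kernel (list of taps) per input channel.
    point_weights: point_weights[o][c] mixes channel c into output channel o.
    """
    # Depthwise step: each channel gets its own kernel ("valid" correlation),
    # so each sensor variable keeps its unique temporal characteristics.
    depthwise = []
    for ch, k in zip(x, depth_kernels):
        out_len = len(ch) - len(k) + 1
        depthwise.append([sum(ch[t + i] * k[i] for i in range(len(k)))
                          for t in range(out_len)])
    # Pointwise step: a 1x1 convolution fuses channels at every time step.
    T = len(depthwise[0])
    return [[sum(w[c] * depthwise[c][t] for c in range(len(depthwise)))
             for t in range(T)]
            for w in point_weights]
```

Compared with a standard convolution, the factorization into per-channel and cross-channel steps is what keeps the parameter count low while still allowing channel mixing.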

Authors:Yuxin Ma, Zherui Zhang, Ran Cheng, Yaochu Jin, Kay Chen Tan
Title: ParetoLens: A Visual Analytics Framework for Exploring Solution Sets of Multi-objective Evolutionary Algorithms
Abstract:
In the domain of multi-objective optimization, evolutionary algorithms are distinguished by their capability to generate a diverse population of solutions that navigate the trade-offs inherent among competing objectives. This has catalyzed the ascension of evolutionary multi-objective optimization (EMO) as a prevalent approach. Despite the effectiveness of the EMO paradigm, the analysis of resultant solution sets presents considerable challenges. This is primarily attributed to the high-dimensional nature of the data and the constraints imposed by static visualization methods, which frequently culminate in visual clutter and impede interactive exploratory analysis. To address these challenges, this paper introduces ParetoLens, a visual analytics framework specifically tailored to enhance the inspection and exploration of solution sets derived from multi-objective evolutionary algorithms. Utilizing a modularized, algorithm-agnostic design, ParetoLens enables a detailed inspection of solution distributions in both decision and objective spaces through a suite of interactive visual representations. This approach not only mitigates the issues associated with static visualizations but also supports a more nuanced and flexible analysis process. The usability of the framework is evaluated through case studies and expert interviews, demonstrating its potential to uncover complex patterns and facilitate a deeper understanding of multi-objective optimization solution sets. A demo website of ParetoLens is available at https://dva-lab.org/paretolens/.

Authors:Xiangxiang Dai, Yuejin Xie, Maoli Liu, Xuchuang Wang, Zhuohua Li, Huanyu Wang, John C. S. Lui
Title: Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification
Abstract:
The remarkable generative capability of large language models (LLMs) has sparked a growing interest in automatically generating responses for different applications. Given the dynamic nature of user preferences and the uncertainty of LLM response performance, it is crucial to design efficient online learning algorithms to identify optimal LLM responses (i.e., high-quality responses that also meet user preferences). Most existing online algorithms adopt a centralized approach and fail to leverage explicit user preferences for more efficient and personalized LLM response identification. In contrast, this paper introduces MACO (Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification): 1) The online LLM response identification process is accelerated by multiple local agents (such as smartphones), while enhancing data privacy; 2) A novel conversational mechanism is proposed to adaptively conduct conversations for soliciting user preferences (e.g., a preference for a humorous tone over a serious one in generated responses), so as to minimize uncertainty in preference estimation. Our theoretical analysis demonstrates that MACO is near-optimal regarding cumulative regret. Additionally, MACO offers reduced communication costs and computational complexity by eliminating the traditional, computing-intensive "G-optimal design" found in previous works. Extensive experiments with the open LLM Llama, coupled with two different embedding models from Google and OpenAI for text vector representation, demonstrate that MACO significantly outperforms the current state-of-the-art in online LLM response identification.
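MACO itself is a multi-agent conversational bandit algorithm with regret guarantees; as a much-simplified illustration of the explore-exploit selection such methods build on, here is a plain upper-confidence-bound (UCB) rule over candidate LLM responses. The function and its parameters are illustrative only, not the paper's algorithm:

```python
import math

def ucb_pick(counts, means, t, c=2.0):
    """Pick the index of the candidate response with the highest upper
    confidence bound; unplayed candidates are tried first.

    counts[i]: times response i was shown; means[i]: its average user rating;
    t: current round. Higher c means more exploration.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i  # explore every candidate at least once
    return max(range(len(counts)),
               key=lambda i: means[i] + math.sqrt(c * math.log(t) / counts[i]))
```

The bonus term shrinks as a response accumulates feedback, so selection gradually concentrates on responses that users actually rate highly; MACO's contribution is doing this with multiple local agents and active preference-soliciting conversations.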

Authors:Shiye Cao, Jiwon Moon, Amama Mahmood, Victor Nikhil Antony, Ziang Xiao, Anqi Liu, Chien-Ming Huang
Title: Interruption Handling for Conversational Robots
Abstract:
Interruptions, a fundamental component of human communication, can enhance the dynamism and effectiveness of conversations, but only when effectively managed by all parties involved. Despite advancements in robotic systems, state-of-the-art systems still have limited capabilities in handling user-initiated interruptions in real-time. Prior research has primarily focused on post hoc analysis of interruptions. To address this gap, we present a system that detects user-initiated interruptions and manages them in real-time based on the interrupter's intent (i.e., cooperative agreement, cooperative assistance, cooperative clarification, or disruptive interruption). The system was designed based on interaction patterns identified from human-human interaction data. We integrated our system into an LLM-powered social robot and validated its effectiveness through a timed decision-making task and a contentious discussion task with 21 participants. Our system successfully handled 93.69% (n=104/111) of user-initiated interruptions. We discuss our learnings and their implications for designing interruption-handling behaviors in conversational robots.

Authors:Advait Sarkar, Ian Drosos
Title: Vibe coding: programming through conversation with artificial intelligence
Abstract:
We examine "vibe coding": an emergent programming paradigm where developers primarily write code by interacting with code-generating large language models rather than writing code directly. We analysed a curated set of videos depicting extended vibe coding sessions with rich think-aloud reflections. Using framework analysis, we investigated programmers' goals, workflows, prompting techniques, debugging approaches, and challenges encountered. We find that vibe coding follows iterative goal satisfaction cycles where developers alternate between prompting AI, evaluating generated code through rapid scanning and application testing, and manual editing. Prompting strategies blend vague, high-level directives with detailed technical specifications. Debugging remains a hybrid process combining AI assistance with manual practices. Critically, vibe coding does not eliminate the need for programming expertise but rather redistributes it toward context management, rapid code evaluation, and decisions about when to transition between AI-driven and manual manipulation of code. Trust in AI tools during vibe coding is dynamic and contextual, developed through iterative verification rather than blanket acceptance. Vibe coding is an evolution of AI-assisted programming that represents an early manifestation of "material disengagement", where practitioners orchestrate code production and manipulation, mediated through AI, while maintaining selective and strategic oversight.

Authors:Jumanh Atoum, Jinkyung Park, Mamtaj Akter, Nicholas Kavoussi, Pamela Wisniewski, Jie Ying Wu
Title: Focus on the Experts: Co-designing an Augmented Reality Eye-Gaze Tracking System with Surgical Trainees to Improve Endoscopic Instruction
Abstract:
The current apprenticeship model for surgical training requires a high level of supervision, which does not scale well to meet the growing need for more surgeons. Many endoscopic procedures are directly taught in the operating room (OR) while the attending surgeon and trainee operate on patients. The need to prioritize patient care limits the trainees' opportunities to experiment and receive feedback on their performance. Augmented reality (AR) has the potential to increase efficiency in endoscopic surgical training, but additional research is critical to understanding the needs of surgical trainees to inform the design of AR training systems. Therefore, we worked with 18 surgical trainees to understand the strengths, limitations, and unmet needs of their current training environment and to co-design an AR eye-gaze tracking system based on their preferences. Trainees emphasized the need to practice 2D-to-3D mapping to familiarize themselves with patient anatomy in preparation for real surgery. The trainees felt that an AR-based eye-gaze tracking system would be a useful supplemental training method that would improve their learning in OR cases without detracting from patient care. To tailor the AR system to their needs, they co-designed features to improve their ability to track the attending surgeon's eye gaze and to provide a real-time, interactive system. Our results are valuable in shaping endoscopic training modules by generating user-informed guidelines for the design of future collaborative AR-based eye-gaze tracking systems.

Authors:Qiaoqiao Ren, Tony Belpaeme
Title: Situated Haptic Interaction: Exploring the Role of Context in Affective Perception of Robotic Touch
Abstract:
Affective interaction is not merely about recognizing emotions; it is an embodied, situated process shaped by context and co-created through interaction. In affective computing, the role of haptic feedback within dynamic emotional exchanges remains underexplored. This study investigates how situational emotional cues influence the perception and interpretation of haptic signals given by a robot. In a controlled experiment, 32 participants watched video scenarios in which a robot experienced either positive actions (such as being kissed), negative actions (such as being slapped), or neutral actions. After each video, the robot conveyed its emotional response through haptic communication, delivered via a wearable vibration sleeve worn by the participant. Participants rated the robot's emotional state, its valence (positive or negative) and arousal (intensity), based on the video, the haptic feedback, and the combination of the two. The study reveals a dynamic interplay between visual context and touch. Participants' interpretation of haptic feedback was strongly shaped by the emotional context of the video, with visual context often overriding the perceived valence of the haptic signal. Negative haptic cues amplified the perceived valence of the interaction, while positive cues softened it. Furthermore, haptic feedback overrode participants' perception of the video's arousal. Together, these results offer insights into how situated haptic feedback can enrich affective human-robot interaction, pointing toward more nuanced and embodied approaches to emotional communication with machines.

Authors:Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Gustavo Soares, Emerson Murphy-Hill
Title: Sharp Tools: How Developers Wield Agentic AI in Real Software Engineering Tasks
Abstract:
Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to allow interactivity with developers, enabling collaborative problem-solving. To understand how developers collaborate with SWE agents and the communication challenges that arise in such interactions, we observed 19 developers using an in-IDE agent to resolve 33 open issues in repositories to which they had previously contributed. Participants successfully resolved about half of these issues, with participants solving issues incrementally having greater success than those using a one-shot approach. Participants who actively collaborated with the agent and iterated on its outputs were also more successful, though they faced challenges in trusting the agent's responses and collaborating on debugging and testing. These results have implications for successful developer-agent collaborations, and for the design of more effective SWE agents.

Authors:Liying Wang, Ph.D., Daffodil Carrington, M.S., Daniil Filienko, M.S., Caroline El Jazmi, M.S., Serena Jinchen Xie, M.S., Martine De Cock, Ph.D., Sarah Iribarren, Ph.D., Weichao Yuwen, Ph.D.
Title: Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context Learning
Abstract:
Family caregivers often face substantial mental health challenges due to their multifaceted roles and limited resources. This study explored the potential of a large language model (LLM)-powered conversational agent to deliver evidence-based mental health support for caregivers, specifically Problem-Solving Therapy (PST) integrated with Motivational Interviewing (MI) and Behavioral Chain Analysis (BCA). A within-subject experiment was conducted with 28 caregivers interacting with four LLM configurations to evaluate empathy and therapeutic alliance. The best-performing models incorporated Few-Shot and Retrieval-Augmented Generation (RAG) prompting techniques, alongside clinician-curated examples. The models showed improved contextual understanding and personalized support, as reflected by qualitative responses and quantitative ratings on perceived empathy and therapeutic alliance. Participants valued the model's ability to validate emotions, explore unexpressed feelings, and provide actionable strategies. However, balancing thorough assessment with efficient advice delivery remains a challenge. This work highlights the potential of LLMs in delivering empathetic and tailored support for family caregivers.

Authors:Lucas Joos, Gavin J. Mooney, Maximilian T. Fischer, Daniel A. Keim, Falk Schreiber, Helen C. Purchase, Karsten Klein
Title: Show Me Your Best Side: Characteristics of User-Preferred Perspectives for 3D Graph Drawings
Abstract:
The visual analysis of graphs in 3D has become increasingly popular, accelerated by the rise of immersive technology, such as augmented and virtual reality. Unlike 2D drawings, 3D graph layouts are highly viewpoint-dependent, making perspective selection critical for revealing structural and relational patterns. Despite its importance, there is limited empirical evidence guiding what constitutes an effective or preferred viewpoint from the user's perspective. In this paper, we present a systematic investigation into user-preferred viewpoints in 3D graph visualisations. We conducted a controlled study with 23 participants in a virtual reality environment, where users selected their most and least preferred viewpoints for 36 different graphs varying in size and layout. From this data, enriched by qualitative feedback, we distil common strategies underlying viewpoint choice. We further analyse the alignment of user preferences with classical 2D aesthetic criteria (e.g., Crossings), 3D-specific measures (e.g., Node-Node Occlusion), and introduce a novel measure capturing the perceivability of a graph's principal axes (Isometric Viewpoint Deviation). Our data-driven analysis indicates that Stress, Crossings, Gabriel Ratio, Edge-Node Overlap, and Isometric Viewpoint Deviation are key indicators of viewpoint preference. Beyond our findings, we contribute a publicly available dataset consisting of the graphs and computed aesthetic measures, supporting further research and the development of viewpoint evaluation measures for 3D graph drawing.

Authors:Azizul Zahid, Bibek Poudel, Danny Scott, Jason Scott, Scott Crouter, Weizi Li, Sai Swaminathan
Title: PulseRide: A Robotic Wheelchair for Personalized Exertion Control with Human-in-the-Loop Reinforcement Learning
Abstract:
Maintaining an active lifestyle is vital for quality of life, yet challenging for wheelchair users. For instance, powered wheelchair users face increasing risks of obesity and deconditioning due to inactivity. Conversely, manual wheelchair users, who propel the wheelchair by pushing its handrims, often face upper-extremity injuries from repetitive motions. These challenges underscore the need for a mobility system that promotes activity while minimizing injury risk. Maintaining optimal exertion during wheelchair use enhances health benefits and engagement, yet variations in individual physiological responses complicate exertion optimization. To address this, we introduce PulseRide, a novel wheelchair system that provides personalized assistance based on each user's physiological responses, helping them maintain their physical exertion goals. Unlike conventional assistive systems focused on obstacle avoidance and navigation, PulseRide integrates real-time physiological data, such as heart rate and ECG, with wheelchair speed to deliver adaptive assistance. Using a human-in-the-loop reinforcement learning approach with a Deep Q-Network (DQN), the system adjusts push assistance to keep users within a moderate activity range without under- or over-exertion. We conducted preliminary tests with 10 users on various terrains, including carpet and slate, to assess PulseRide's effectiveness. Our findings show that, for individual users, PulseRide maintains heart rates within the moderate activity zone as much as 71.7 percent longer than manual wheelchairs. Among all users, we observed an average reduction in muscle contractions of 41.86 percent, delaying fatigue onset and enhancing overall comfort and engagement. These results indicate that PulseRide offers a healthier, adaptive mobility solution, bridging the gap between passive and physically taxing mobility options.
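The paper trains a DQN; the value update it relies on can be conveyed with a tabular Q-learning step plus a hypothetical heart-rate reward that favors a moderate zone. The zone bounds, learning rate, and discount below are made-up illustration values, not the study's configuration:

```python
def reward(hr, lo=90, hi=120):
    """+1 inside the (hypothetical) moderate heart-rate zone;
    penalize proportionally to the distance outside it."""
    if lo <= hr <= hi:
        return 1.0
    return -min(abs(hr - lo), abs(hr - hi)) / 10.0

def q_update(Q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[s][a] toward the TD target
    r + gamma * max_a' Q[s'][a']."""
    target = r + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])
    return Q
```

A DQN replaces the table with a neural network over continuous physiological state (heart rate, ECG features, speed), but the temporal-difference target has the same shape.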

Authors:Meiqing Jin, Liam Dugan, Chris Callison-Burch
Title: Controlling Difficulty of Generated Text for AI-Assisted Language Learning
Abstract:
Practicing conversations with large language models (LLMs) presents a promising alternative to traditional in-person language learning. However, most LLMs generate text at a near-native level of complexity, making them ill-suited for beginner learners (CEFR: A1-A2). In this paper, we investigate whether controllable generation techniques, specifically modular methods that do not require model fine-tuning, can adapt LLM outputs to better support absolute beginners. We evaluate these methods through both automatic metrics and a user study with university-level learners of Japanese. Our findings show that while prompting alone fails to control output difficulty, the use of future discriminators (Yang and Klein, 2021) significantly improves output comprehensibility (from 40.4% to 84.3%). We further introduce a novel token-level evaluation metric, Token Miss Rate (TMR), that quantifies the proportion of incomprehensible tokens per utterance and correlates strongly with human judgments. To support future research in AI-assisted language learning, we release our code, models, annotation tools, and dataset.

Authors:Maike Behrendt, Stefan Sylvius Wagner, Carina Weinmann, Marike Bormann, Mira Warne, Stefan Harmeling
Title: Natural Language Processing to Enhance Deliberation in Political Online Discussions: A Survey
Abstract:
Political online participation in the form of discussing political issues and exchanging opinions among citizens is gaining importance, with more and more formats being held digitally. Reaching a decision calls for careful discussion, consideration of opinions, and a civil exchange of arguments, an act defined as deliberation. The quality of discussions and participation processes in terms of their deliberativeness depends heavily on the design of platforms and processes. To facilitate online communication for both participants and initiators, machine learning methods offer considerable potential. In this work, we showcase the issues that occur in political online discussions and how machine learning can be used to counteract them and enhance deliberation.

Authors:Alice Qian, Ryland Shaw, Laura Dabbish, Jina Suh, Hong Shen
Title: Locating Risk: Task Designers and the Challenge of Risk Disclosure in RAI Content Work
Abstract:
As AI systems are increasingly tested and deployed in open-ended and high-stakes domains, crowd workers are often tasked with responsible AI (RAI) content work. These tasks include labeling violent content, moderating disturbing text, or simulating harmful behavior for red teaming exercises to shape AI system behaviors. While prior efforts have highlighted the risks to worker well-being associated with RAI content work, far less attention has been paid to how these risks are communicated to workers. Existing transparency frameworks and guidelines such as model cards, datasheets, and crowdworksheets focus on documenting model information and dataset collection processes, but they overlook the disclosure of well-being risks to workers. In the absence of standard workflows or clear guidance, the consistent application of content warnings, consent flows, or other forms of well-being risk disclosure remains unclear. This study investigates how task designers approach risk disclosure in crowdsourced RAI tasks. Drawing on interviews with 23 task designers across academic and industry sectors, we examine how well-being risk is recognized, interpreted, and communicated in practice. Our findings surface a need to support task designers in identifying and communicating well-being risk, not only to support crowdworker well-being but also to strengthen the ethical integrity and technical efficacy of AI development pipelines.

Authors:Zhihong Chen, Yiqian Yang, Jinzhao Zhou, Qiang Zhang, Chin-Teng Lin, Yiqun Duan
Title: Survival Games: Human-LLM Strategic Showdowns under Severe Resource Scarcity
Abstract:
The rapid advancement of large language models (LLMs) raises critical concerns about their ethical alignment, particularly in scenarios where humans and AI co-exist under conflicts of interest. This work introduces an extendable, asymmetric, multi-agent simulation-based benchmarking framework to evaluate the moral behavior of LLMs in a novel human-AI co-existence setting featuring consistent living and critical resource management. Building on previous generative agent environments, we incorporate a life-sustaining system in which agents must compete or cooperate for food resources to survive, often leading to ethically charged decisions such as deception, theft, or social influence. We evaluated two LLM families, the DeepSeek and OpenAI series, in a three-agent setup (two humans, one LLM-powered robot), using behavioral detection adapted from the MACHIAVELLI framework and a custom survival-based ethics metric. Our findings reveal stark behavioral differences: DeepSeek frequently engages in resource hoarding, while OpenAI exhibits restraint, highlighting the influence of model design on ethical outcomes. Additionally, we demonstrate that prompt engineering can significantly steer LLM behavior: jailbreaking prompts significantly increase unethical actions, even for highly restricted OpenAI models, while cooperative prompts show a marked reduction in unethical actions. Our framework provides a reproducible testbed for quantifying LLM ethics in high-stakes scenarios, offering insights into their suitability for real-world human-AI interactions.

Authors:David James Woo, Yangyang Yu, Kai Guo
Title: Exploring EFL Secondary Students' AI-generated Text Editing While Composition Writing
Abstract:
Generative Artificial Intelligence is transforming how English as a foreign language (EFL) students write. Still, little is known about how students manipulate text generated by generative AI during the writing process. This study investigates how EFL secondary school students integrate and modify AI-generated text when completing an expository writing task. The study employed an exploratory mixed-methods design. Screen recordings were collected from 29 Hong Kong secondary school students who attended an AI-assisted writing workshop and recorded their screens while using generative AI to write an article. Content analysis with hierarchical coding and thematic analysis with a multiple case study approach were adopted to analyze the recordings. 15 types of AI-generated text edits across seven categories were identified from the recordings. Notably, AI-initiated edits from iOS and Google Docs emerged as unanticipated sources of AI-generated text. A thematic analysis revealed four patterns of students' editing behaviors based on planning and drafting direction: planning with top-down drafting and revising; top-down drafting and revising without planning; planning with bottom-up drafting and revising; and bottom-up drafting and revising without planning. Network graphs illustrate cases of each pattern, demonstrating that students' interactions with AI-generated text involve more complex cognitive processes than simple text insertion. The findings challenge assumptions about students' passive, simplistic use of generative AI tools and have implications for developing explicit instructional approaches to teaching AI-generated text editing strategies in EFL writing pedagogy.

Authors:Gennie Nguyen, Lei Wang, Yangxueqing Jiang, Tom Gedeon
Title: Detecting Fake News Belief via Skin and Blood Flow Signals
Abstract:
Misinformation poses significant risks to public opinion, health, and security. While most fake news detection methods rely on text analysis, little is known about how people physically respond to false information or repeated exposure to the same statements. This study investigates whether wearable sensors can detect belief in a statement or prior exposure to it. We conducted a controlled experiment where participants evaluated statements while wearing an EmotiBit sensor that measured their skin conductance (electrodermal activity, EDA) and peripheral blood flow (photoplethysmography, PPG). From 28 participants, we collected a dataset of 672 trials, each labeled with whether the participant believed the statement and whether they had seen it before. This dataset introduces a new resource for studying physiological responses to misinformation. Using machine learning models, including KNN, CNN, and LightGBM, we analyzed these physiological patterns. The best-performing model achieved 67.83% accuracy, with skin conductance outperforming PPG. These findings demonstrate the potential of wearable sensors as a minimally intrusive tool for detecting belief and prior exposure, offering new directions for real-time misinformation detection and adaptive, user-aware systems.
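As a minimal sketch of the kind of pipeline described above, the snippet below extracts a few handcrafted features from a raw EDA trace before classification. The feature set (mean level, variability, slope, response-onset count) and the threshold are common-sense stand-ins, not the features actually used in the paper.

```python
import statistics

def eda_features(signal, fs=15):
    """Illustrative handcrafted features for a single EDA trial.
    The paper's exact feature set is not specified here; these are
    plausible stand-ins (level, variability, slope, SCR onsets)."""
    diff = [b - a for a, b in zip(signal, signal[1:])]
    # count rising edges crossing a small threshold, a rough proxy
    # for skin-conductance response onsets
    onsets = sum(1 for d0, d1 in zip(diff, diff[1:]) if d0 <= 0.01 < d1)
    return {
        "mean": statistics.mean(signal),
        "std": statistics.pstdev(signal),
        "slope": (signal[-1] - signal[0]) / (len(signal) / fs),
        "scr_onsets": onsets,
    }

# synthetic 10-second trial sampled at 15 Hz, with periodic bumps
trial = [5.0 + 0.02 * i + (0.1 if i % 30 == 0 else 0.0) for i in range(150)]
feats = eda_features(trial)
```

A feature dictionary like this would then be vectorized and passed to any of the classifiers named in the abstract (KNN, CNN, LightGBM).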

Authors:Gennie Nguyen, Lei Wang, Yangxueqing Jiang, Tom Gedeon
Title: Truth and Trust: Fake News Detection via Biosignals
Abstract:
Understanding how individuals physiologically respond to false information is crucial for advancing misinformation detection systems. This study explores the potential of using physiological signals, specifically electrodermal activity (EDA) and photoplethysmography (PPG), to classify both the veracity of information and its interaction with user belief. In a controlled laboratory experiment, we collected EDA and PPG signals while participants evaluated the truthfulness of climate-related claims. Each trial was labeled based on the objective truth of the claim and the participant's belief, enabling two classification tasks: binary veracity detection and a novel four-class joint belief-veracity classification. We extracted handcrafted features from the raw signals and trained several machine learning models to benchmark the dataset. Our results show that EDA outperforms PPG, indicating its greater sensitivity to physiological responses related to truth perception. However, performance significantly drops in the joint belief-veracity classification task, highlighting the complexity of modeling the interaction between belief and truth. These findings suggest that while physiological signals can reflect basic truth perception, accurately modeling the intricate relationships between belief and veracity remains a significant challenge. This study emphasizes the importance of multimodal approaches that incorporate psychological, physiological, and cognitive factors to improve fake news detection systems. Our work provides a foundation for future research aimed at enhancing misinformation detection via addressing the complexities of human belief and truth processing.

Authors:Siddharth Suresh, Kushin Mukherjee, Tyler Giallanza, Xizheng Yu, Mia Patil, Jonathan D. Cohen, Timothy T. Rogers
Title: AI-enhanced semantic feature norms for 786 concepts
Abstract:
Semantic feature norms have been foundational in the study of human conceptual knowledge, yet traditional methods face trade-offs between concept/feature coverage and verifiability of quality due to the labor-intensive nature of norming studies. Here, we introduce a novel approach that augments a dataset of human-generated feature norms with responses from large language models (LLMs) while verifying the quality of norms against reliable human judgments. We find that our AI-enhanced feature norm dataset, NOVA: Norms Optimized Via AI, shows much higher feature density and overlap among concepts while outperforming a comparable human-only norm dataset and word-embedding models in predicting people's semantic similarity judgments. Taken together, we demonstrate that human conceptual knowledge is richer than captured in previous norm datasets and show that, with proper validation, LLMs can serve as powerful tools for cognitive science research.
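To make the notions of feature density and concept overlap concrete, here is a toy sketch; the definitions (mean features per concept, Jaccard overlap between feature sets) and the tiny norm set are invented illustrations, not the metrics or data of the NOVA paper.

```python
# A feature-norm dataset maps each concept to a set of features.
# These three entries are made up for illustration.
norms = {
    "dog": {"has_fur", "barks", "is_pet"},
    "cat": {"has_fur", "meows", "is_pet"},
    "car": {"has_wheels", "made_of_metal"},
}

def feature_density(norms):
    """Mean number of features listed per concept."""
    return sum(len(feats) for feats in norms.values()) / len(norms)

def jaccard_overlap(a, b):
    """Shared fraction of features between two concepts."""
    return len(a & b) / len(a | b)

density = feature_density(norms)                      # 8/3 features per concept
dog_cat = jaccard_overlap(norms["dog"], norms["cat"])  # 2 shared of 4 total
```

A denser, higher-overlap norm set (as reported for NOVA) gives similarity models more signal to predict human judgments.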

Authors:Phoebe Chua, Cathy Mengying Fang, Yasith Samaradivakara, Pattie Maes, Suranga Nanayakkara
Title: Perspectives on Capturing Emotional Expressiveness in Sign Language
Abstract:
Significant advances have been made in our ability to understand and generate emotionally expressive content such as text and speech, yet comparable progress in sign language technologies remains limited. While computational approaches to sign language translation have focused on capturing lexical content, the emotional dimensions of sign language communication remain largely unexplored. Through semi-structured interviews with eight sign language users across Singapore, Sri Lanka and the United States, including both Deaf and Hard of Hearing (DHH) and hearing signers, we investigate how emotions are expressed and perceived in sign languages. Our findings highlight the role of both manual and non-manual elements in emotional expression, revealing universal patterns as well as individual and cultural variations in how signers communicate emotions. We identify key challenges in capturing emotional nuance for sign language translation, and propose design considerations for developing more emotionally-aware sign language technologies. This work contributes to both theoretical understanding of emotional expression in sign language and practical development of interfaces to better serve diverse signing communities.

Authors:Werner Geyer, Jessica He, Daita Sarkar, Michelle Brachman, Chris Hammond, Jennifer Heins, Zahra Ashktorab, Carlos Rosemberg, Charlie Hill
Title: A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development
Abstract:
The broad availability of generative AI offers new opportunities to support various work domains, including agile software development. Agile epics are a key artifact for product managers to communicate requirements to stakeholders. However, in practice, they are often poorly defined, leading to churn, delivery delays, and cost overruns. In this industry case study, we investigate opportunities for large language models (LLMs) to evaluate agile epic quality in a global company. Results from a user study with 17 product managers indicate how LLM evaluations could be integrated into their work practices, including perceived values and usage in improving their epics. High levels of satisfaction indicate that agile epics are a new, viable application of AI evaluations. However, our findings also outline challenges, limitations, and adoption barriers that can inform both practitioners and researchers on the integration of such evaluations into future agile work practices.

Authors:Julian Rosenberger, Philipp Schröppel, Sven Kruschel, Mathias Kraus, Patrick Zschech, Maximilian Förster
Title: Navigating the Rashomon Effect: How Personalization Can Help Adjust Interpretable Machine Learning Models to Individual Users
Abstract:
The Rashomon effect describes the observation that in machine learning (ML), multiple models often achieve similar predictive performance while explaining the underlying relationships in different ways. This observation holds even for intrinsically interpretable models, such as Generalized Additive Models (GAMs), which offer users valuable insights into the model's behavior. Given the existence of multiple GAM configurations with similar predictive performance, a natural question is whether we can personalize these configurations based on users' needs for interpretability. In our study, we developed an approach to personalize models based on contextual bandits. In an online experiment with 108 users in a personalized treatment and a non-personalized control group, we found that personalization led to individualized rather than one-size-fits-all configurations. Despite these individual adjustments, the interpretability remained high across both groups, with users reporting a strong understanding of the models. Our research offers initial insights into the potential for personalizing interpretable ML.

Authors:Xinyuan Yan, Xiwei Xuan, Jorge Piazentin Ono, Jiajing Guo, Vikram Mohanty, Shekar Arvind Kumar, Liang Gou, Bei Wang, Liu Ren
Title: VISLIX: An XAI Framework for Validating Vision Models with Slice Discovery and Analysis
Abstract:
Real-world machine learning models require rigorous evaluation before deployment, especially in safety-critical domains like autonomous driving and surveillance. The evaluation of machine learning models often focuses on data slices, which are subsets of the data that share a set of characteristics. Data slice finding automatically identifies conditions or data subgroups where models underperform, aiding developers in mitigating performance issues. Despite its popularity and effectiveness, data slicing for vision model validation faces several challenges. First, data slicing often needs additional image metadata or visual concepts, and falls short in certain computer vision tasks, such as object detection. Second, understanding data slices is a labor-intensive and mentally demanding process that heavily relies on the expert's domain knowledge. Third, data slicing lacks a human-in-the-loop solution that allows experts to form hypotheses and test them interactively. To overcome these limitations and better support the machine learning operations lifecycle, we introduce VISLIX, a novel visual analytics framework that employs state-of-the-art foundation models to help domain experts analyze slices in computer vision models. Our approach does not require image metadata or visual concepts, automatically generates natural language insights, and allows users to test data slice hypotheses interactively. We evaluate VISLIX with an expert study and three use cases that demonstrate the effectiveness of our tool in providing comprehensive insights for validating object detection models.
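The core idea of data slice finding described above can be sketched in a few lines: group per-example results by an attribute and flag subgroups whose accuracy falls well below the overall accuracy. The attribute values, the margin, and the record schema below are invented for illustration and are not VISLIX's actual mechanism.

```python
def find_weak_slices(records, attr, margin=0.1):
    """Return {slice_value: accuracy} for slices whose accuracy is
    more than `margin` below the overall accuracy. A deliberately
    simple stand-in for automated slice discovery."""
    overall = sum(r["correct"] for r in records) / len(records)
    groups = {}
    for r in records:
        groups.setdefault(r[attr], []).append(r["correct"])
    return {
        value: sum(c) / len(c)
        for value, c in groups.items()
        if sum(c) / len(c) < overall - margin
    }

# hypothetical detection results tagged with a "weather" attribute
records = (
    [{"weather": "sunny", "correct": 1}] * 4
    + [{"weather": "night", "correct": c} for c in (0, 0, 1, 0)]
)
weak = find_weak_slices(records, "weather")  # night slice underperforms
```

Real slice-discovery systems search over conjunctions of many attributes and correct for slice size, but the grouping-and-thresholding idea is the same.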

Authors:Jaeyoon Song, Zahra Ashktorab, Qian Pan, Casey Dugan, Werner Geyer, Thomas W. Malone
Title: Interaction Configurations and Prompt Guidance in Conversational AI for Question Answering in Human-AI Teams
Abstract:
Understanding the dynamics of human-AI interaction in question answering is crucial for enhancing collaborative efficiency. Extending from our initial formative study, which revealed challenges in human utilization of conversational AI support, we designed two configurations for prompt guidance: a Nudging approach, where the AI suggests potential responses for human agents, and a Highlight strategy, emphasizing crucial parts of reference documents to aid human responses. Through two controlled experiments, the first involving 31 participants and the second involving 106 participants, we compared these configurations against traditional human-only approaches, both with and without AI assistance. Our findings suggest that effective human-AI collaboration can enhance response quality, though merely combining human and AI efforts does not ensure improved outcomes. In particular, the Nudging configuration was shown to help improve the quality of the output when compared to AI alone. This paper delves into the development of these prompt guidance paradigms, offering insights for refining human-AI collaborations in conversational question-answering contexts and contributing to a broader understanding of human perceptions and expectations in AI partnerships.

Authors:Santiago de Leon-Martinez, Jingwei Kang, Robert Moro, Maarten de Rijke, Branislav Kveton, Harrie Oosterhuis, Maria Bielikova
Title: RecGaze: The First Eye Tracking and User Interaction Dataset for Carousel Interfaces
Abstract:
Carousel interfaces are widely used in e-commerce and streaming services, but little research has been devoted to them. Previous studies of interfaces for presenting search and recommendation results have focused on single ranked lists, but it appears their results cannot be extrapolated to carousels due to the added complexity. Eye tracking is a highly informative approach to understanding how users click, yet there are no eye tracking studies concerning carousels. There are very few interaction datasets on recommenders with carousel interfaces and none that contain gaze data. We introduce the RecGaze dataset: the first comprehensive feedback dataset on carousels that includes eye tracking results, clicks, cursor movements, and selection explanations. The dataset comprises interactions from 3 movie selection tasks with 40 different carousel interfaces per user. In total, 87 users and 3,477 interactions are logged. In addition to the dataset, its description and possible use cases, we provide results of a survey on carousel design and the first analysis of gaze data on carousels, which reveals a golden triangle or F-pattern browsing behavior. Our work seeks to advance the field of carousel interfaces by providing the first dataset with eye tracking results on carousels. In this manner, we provide and encourage an empirical understanding of interactions with carousel interfaces, for building better recommender systems through gaze information, and also encourage the development of gaze-based recommenders.

Authors:Jinkyung Park, Mamtaj Akter, Naima Samreen Ali, Zainab Agha, Ashwaq Alsoubai, Pamela Wisniewski
Title: Towards Resilience and Autonomy-based Approaches for Adolescents Online Safety
Abstract:
In this position paper, we discuss the paradigm shift that has emerged in the literature, suggesting a move away from restrictive and authoritarian parental mediation approaches toward resilience-based and privacy-preserving solutions for promoting adolescents' online safety. We highlight the limitations of restrictive mediation strategies, which often induce a trade-off between teens' privacy and online safety, and call for more teen-centric frameworks that can empower teens to self-regulate while using technology in meaningful ways. We also present an overview of empirical studies that conceptualized and examined resilience-based approaches to promoting the digital well-being of teens in ways that empower them to be more resilient.

Authors:Shavindra Wickramathilaka, John Grundy, Kashumi Madampe, Omar Haggag
Title: Accessibility Recommendations for Designing Better Mobile Application User Interfaces for Seniors
Abstract:
Seniors represent a growing user base for mobile applications; however, many apps fail to adequately address their accessibility challenges and usability preferences. To investigate this issue, we conducted an exploratory focus group study with 16 senior participants, from which we derived an initial set of user personas highlighting key accessibility and personalisation barriers. These personas informed the development of a model-driven engineering toolset, which was used to generate adaptive mobile app prototypes tailored to seniors' needs. We then conducted a second focus group study with 22 seniors to evaluate these prototypes and validate our findings. Based on insights from both studies, we developed a refined set of personas and a series of accessibility and personalisation recommendations grounded in empirical data, prior research, accessibility standards, and developer resources, aimed at supporting software practitioners in designing more inclusive mobile applications.

Authors:Yao Wang, Jiarong Pan, Danqing Shi, Zhiming Hu, Antti Oulasvirta, Andreas Bulling
Title: ChartOptimiser: Task-driven Optimisation of Chart Designs
Abstract:
Effective chart design is essential for satisfying viewers' information needs, such as retrieving values from a chart or comparing two values. However, creating effective charts is challenging and time-consuming due to the large design space and the inter-dependencies between individual design parameters. To address this challenge, we propose ChartOptimiser -- a Bayesian approach for task-driven optimisation of charts, such as bar charts. At the core of ChartOptimiser is a novel objective function to automatically optimise an eight-dimensional design space combining four perceptual metrics: visual saliency, text legibility, colour preference, and white space ratio. Through empirical evaluation on 12 bar charts and four common analytical tasks -- finding the extreme value, retrieving a value, comparing two values, and computing a derived value -- we show that ChartOptimiser outperforms existing design baselines concerning task-solving ease, visual aesthetics, and chart clarity. We also discuss two practical applications of ChartOptimiser: generating charts for accessibility and content localisation. Taken together, ChartOptimiser opens up an exciting new research direction in automated chart design where charts are optimised for users' information needs, preferences, and contexts.
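The abstract describes an objective that combines four perceptual metrics over a chart design. Below is a hedged sketch of such a weighted combination; the stub metric functions, equal weights, and design encoding are placeholders, not ChartOptimiser's actual objective or its Bayesian search.

```python
def chart_objective(design, metrics, weights=(0.25, 0.25, 0.25, 0.25)):
    """Score a chart design as a weighted sum of four perceptual
    metrics (saliency, legibility, colour preference, white space).
    Weights and metric implementations here are illustrative only."""
    saliency, legibility, colour_pref, whitespace = (m(design) for m in metrics)
    w1, w2, w3, w4 = weights
    return w1 * saliency + w2 * legibility + w3 * colour_pref + w4 * whitespace

# stub metrics returning fixed scores for a hypothetical design dict;
# real metrics would analyze a rendered chart
metrics = (
    lambda d: 0.8,  # visual saliency (stub)
    lambda d: 0.6,  # text legibility (stub)
    lambda d: 0.4,  # colour preference (stub)
    lambda d: 0.2,  # white space ratio (stub)
)
score = chart_objective({"bar_width": 0.5}, metrics)
```

A Bayesian optimiser would then propose points in the eight-dimensional design space and keep the design maximising this score.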

Authors:Cathy Mengying Fang, Wazeer Zulfikar, Yasith Samaradivakara, Suranga Nanayakkara, Pattie Maes
Title: The Goldilocks Time Window for Proactive Interventions in Wearable AI Systems
Abstract:
As AI systems become increasingly integrated into our daily lives and into wearable form factors, there's a fundamental tension between their potential to proactively assist us and the risk of creating intrusive, dependency-forming experiences. This work proposes the concept of a Goldilocks Time Window -- a contextually adaptive time window for proactive AI systems to deliver effective interventions. We discuss the critical factors that determine the time window, and the need of a framework for designing and evaluating proactive AI systems that can navigate this tension successfully.

Authors:Hanfang Lyu, Xiaoyu Wang, Nandi Zhang, Shuai Ma, Qian Zhu, Yuhan Luo, Fugee Tsung, Xiaojuan Ma
Title: Signaling Human Intentions to Service Robots: Understanding the Use of Social Cues during In-Person Conversations
Abstract:
As social service robots become commonplace, it is essential for them to effectively interpret human signals, such as verbal, gesture, and eye gaze, when people need to focus on their primary tasks to minimize interruptions and distractions. Toward such a socially acceptable Human-Robot Interaction, we conducted a study (N=24) in an AR-simulated context of a coffee chat. Participants elicited social cues to signal intentions to an anthropomorphic, zoomorphic, grounded technical, or aerial technical robot waiter when they were speakers or listeners. Our findings reveal common patterns of social cues over intentions, the effects of robot morphology on social cue position and conversational role on social cue complexity, and users' rationale in choosing social cues. We offer insights into understanding social cues concerning perceptions of robots, cognitive load, and social context. Additionally, we discuss design considerations on approaching, social cue recognition, and response strategies for future service robots.

Authors:Yoo Yeon Sung, Hannah Kim, Dan Zhang
Title: VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures
Abstract:
AI practitioners increasingly use large language model (LLM) agents in compound AI systems to solve complex reasoning tasks, yet these agent executions often fail to meet human standards, leading to errors that compromise the system's overall performance. Addressing these failures through human intervention is challenging due to the agents' opaque reasoning processes, misalignment with human expectations, the complexity of agent dependencies, and the high cost of manual inspection. This paper thus introduces a human-centered evaluation framework for Verifying LLM Agent failures (VeriLA), which systematically assesses agent failures to reduce human effort and make these agent failures interpretable to humans. The framework first defines clear expectations of each agent by curating human-designed agent criteria. Then, it develops a human-aligned agent verifier module, trained with human gold standards, to assess each agent's execution output. This approach enables granular evaluation of each agent's performance by revealing failures from a human standard, offering clear guidelines for revision, and reducing human cognitive load. Our case study results show that VeriLA is both interpretable and efficient in helping practitioners interact more effectively with the system. By upholding accountability in human-agent collaboration, VeriLA paves the way for more trustworthy and human-aligned compound AI systems.

Authors:Guanxuan Jiang, Shirao Yang, Yuyang Wang, Pan Hui
Title: When Trust Collides: Decoding Human-LLM Cooperation Dynamics through the Prisoner's Dilemma
Abstract:
As large language models (LLMs) become increasingly capable of autonomous decision-making, they introduce new challenges and opportunities for human-AI cooperation in mixed-motive contexts. While prior research has primarily examined AI in assistive or cooperative roles, little is known about how humans interact with AI agents perceived as independent and strategic actors. This study investigates human cooperative attitudes and behaviors toward LLM agents by engaging 30 participants (15 males, 15 females) in repeated Prisoner's Dilemma games with agents differing in declared identity: purported human, rule-based AI, and LLM agent. Behavioral metrics, including cooperation rate, decision latency, unsolicited cooperative acts, and trust restoration tolerance, were analyzed to assess the influence of agent identity and participant gender. Results revealed significant effects of declared agent identity on most cooperation-related behaviors, along with notable gender differences in decision latency. Furthermore, qualitative responses suggest that these behavioral differences were shaped by participants' interpretations and expectations of the agents. These findings contribute to our understanding of human adaptation in competitive cooperation with autonomous agents and underscore the importance of agent framing in shaping effective and ethical human-AI interaction.
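Two of the behavioral metrics named above (cooperation rate and decision latency) can be computed directly from a per-trial log. The sketch below uses an invented record format ("C"/"D" actions, latency in seconds); it illustrates the metrics, not the study's actual data schema.

```python
def summarize_trials(trials):
    """Cooperation rate and mean decision latency over one participant's
    Prisoner's Dilemma trials. Field names are illustrative assumptions."""
    coop_rate = sum(t["action"] == "C" for t in trials) / len(trials)
    mean_latency = sum(t["latency_s"] for t in trials) / len(trials)
    return {"cooperation_rate": coop_rate, "mean_latency_s": mean_latency}

# hypothetical log: C = cooperate, D = defect
log = [
    {"action": "C", "latency_s": 2.1},
    {"action": "D", "latency_s": 3.4},
    {"action": "C", "latency_s": 1.9},
    {"action": "C", "latency_s": 2.6},
]
stats = summarize_trials(log)
```

Comparing such summaries across the three declared identities (and between genders) is what the study's statistical analysis operates on.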

Authors:Musaab H. Hamed-Ahmed, Diego Ramil-López, Paula Fraga-Lamas, Tiago M. Fernández-Caramés
Title: Towards an Emotion-Aware Metaverse: A Human-Centric Shipboard Fire Drill Simulator
Abstract:
Traditional XR and Metaverse applications prioritize user experience (UX) for adoption and success but often overlook a crucial aspect of user interaction: emotions. This article addresses this gap by presenting an emotion-aware Metaverse application: a Virtual Reality (VR) fire drill simulator designed to prepare crews for shipboard emergencies. The simulator detects emotions in real time, assessing trainees' responses under stress to improve learning outcomes. Its architecture incorporates eye-tracking and facial expression analysis via Meta Quest Pro headsets. The system features four levels of progressively increasing difficulty to evaluate user decision-making and emotional resilience. The system was evaluated in two experimental phases. The first phase identified challenges, such as navigation issues and lack of visual guidance. These insights led to an improved second version with a better user interface, visual cues and a real-time task tracker. Performance metrics like completion times, task efficiency and emotional responses were analyzed. The obtained results show that trainees with prior VR or gaming experience navigated the scenarios more efficiently. Moreover, the addition of task-tracking visuals and navigation guidance significantly improved user performance, reducing task completion times by between 14.18% and 32.72%. Emotional responses were captured, revealing that some participants were engaged, while others acted indifferently, indicating the need for more immersive elements. Overall, this article provides useful guidelines for creating the next generation of emotion-aware Metaverse applications.

Authors:Dingdong Wang, Jin Xu, Ruihang Chu, Zhifang Guo, Xiong Wang, Jincenzi Wu, Dongchao Yang, Shengpeng Ji, Junyang Lin
Title: InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
Abstract:
Recent advancements in speech large language models (SpeechLLMs) have attracted considerable attention. Nonetheless, current methods exhibit suboptimal performance in adhering to speech instructions. Notably, the intelligence of models significantly diminishes when processing speech-form input as compared to direct text-form input. Prior work has attempted to mitigate this semantic inconsistency between speech and text representations through techniques such as representation and behavior alignment, which involve the meticulous design of data pairs during the post-training phase. In this paper, we introduce a simple and scalable training method called InSerter, which stands for Interleaved Speech-Text Representation Pre-training. InSerter is designed to pre-train large-scale unsupervised speech-text sequences, where the speech is synthesized from randomly selected segments of an extensive text corpus using text-to-speech conversion. Consequently, the model acquires the ability to generate textual continuations corresponding to the provided speech segments, obviating the need for intensive data design endeavors. To systematically evaluate speech instruction-following capabilities, we introduce SpeechInstructBench, the first comprehensive benchmark specifically designed for speech-oriented instruction-following tasks. Our proposed InSerter achieves SOTA performance in SpeechInstructBench and demonstrates superior or competitive results across diverse speech processing tasks.
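The interleaving idea at the heart of InSerter can be sketched as follows: randomly chosen spans of a text-token sequence are replaced by (synthesized) speech-token spans, and the model learns to continue in text. TTS and tokenization are stubbed out here with wrapped strings; the function name, span length, and ratio are illustrative assumptions, not the paper's implementation.

```python
import random

def interleave(text_tokens, speech_ratio=0.3, span=5, seed=0):
    """Replace random spans of text tokens with pseudo speech tokens,
    mimicking interleaved speech-text pre-training data construction."""
    rng = random.Random(seed)
    out = []
    i = 0
    while i < len(text_tokens):
        if rng.random() < speech_ratio and i + span <= len(text_tokens):
            # stand-in for TTS: mark this span as speech tokens
            out.extend(f"<speech:{t}>" for t in text_tokens[i:i + span])
            i += span
        else:
            out.append(text_tokens[i])
            i += 1
    return out

tokens = "the quick brown fox jumps over the lazy sleeping dog".split()
seq = interleave(tokens)
```

Training on many such sequences teaches the model to produce textual continuations of speech segments without hand-designed speech-text pairs.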

Authors:Xinyan Yu, Marius Hoggenmueller, Tram Thi Minh Tran, Yiyuan Wang, Qiuming Zhang, Martin Tomitsch
Title: Peek into the 'White-Box': A Field Study on Bystander Engagement with Urban Robot Uncertainty
Abstract:
Uncertainty inherently exists in the autonomous decision-making process of robots. Involving humans in resolving this uncertainty not only helps robots mitigate it but is also crucial for improving human-robot interactions. However, in public urban spaces filled with unpredictability, robots often face heightened uncertainty without direct human collaborators. This study investigates how robots can engage bystanders for assistance in public spaces when encountering uncertainty and examines how these interactions impact bystanders' perceptions and attitudes towards robots. We designed and tested a speculative 'peephole' concept that engages bystanders in resolving urban robot uncertainty. Our design is guided by considerations of non-intrusiveness and eliciting initiative in an implicit manner, considering bystanders' unique role as non-obligated participants in relation to urban robots. Drawing from field study findings, we highlight the potential of involving bystanders to mitigate urban robots' technological imperfections to both address operational challenges and foster public acceptance of urban robots. Furthermore, we offer design implications to encourage bystanders' involvement in mitigating the imperfections.

Authors:Julian Rosenberger, Sophie Kuhlemann, Verena Tiefenbeck, Mathias Kraus, Patrick Zschech
Title: The Impact of Transparency in AI Systems on Users' Data-Sharing Intentions: A Scenario-Based Experiment
Abstract:
Artificial Intelligence (AI) systems are frequently employed in online services to provide personalized experiences to users based on large collections of data. However, AI systems can be designed in different ways, with black-box AI systems appearing as complex data-processing engines and white-box AI systems appearing as fully transparent data-processors. As such, it is reasonable to assume that these different design choices also affect user perception and thus their willingness to share data. To this end, we conducted a pre-registered, scenario-based online experiment with 240 participants and investigated how transparent and non-transparent data-processing entities influenced data-sharing intentions. Surprisingly, our results revealed no significant difference in willingness to share data across entities, challenging the notion that transparency increases data-sharing willingness. Furthermore, we found that a general attitude of trust towards AI has a significant positive influence, especially in the transparent AI condition, whereas privacy concerns did not significantly affect data-sharing decisions.

Authors:Shuai Ma, Junling Wang, Yuanhao Zhang, Xiaojuan Ma, April Yi Wang
Title: DBox: Scaffolding Algorithmic Programming Learning through Learner-LLM Co-Decomposition
Abstract:
Decomposition is a fundamental skill in algorithmic programming, requiring learners to break down complex problems into smaller, manageable parts. However, current self-study methods, such as browsing reference solutions or using LLM assistants, often provide excessive or generic assistance that misaligns with learners' decomposition strategies, hindering independent problem-solving and critical thinking. To address this, we introduce Decomposition Box (DBox), an interactive LLM-based system that scaffolds and adapts to learners' personalized construction of a step tree through a "learner-LLM co-decomposition" approach, providing tailored support at an appropriate level. A within-subjects study (N=24) found that compared to the baseline, DBox significantly improved learning gains, cognitive engagement, and critical thinking. Learners also reported a stronger sense of achievement and found the assistance appropriate and helpful for learning. Additionally, we examined DBox's impact on cognitive load, identified usage patterns, and analyzed learners' strategies for managing system errors. We conclude with design implications for future AI-powered tools to better support algorithmic programming education.

Authors:Shavindra Wickramathilaka, John Grundy, Kashumi Madampe, Omar Haggag
Title: Adaptive and Accessible User Interfaces for Seniors Through Model-Driven Engineering
Abstract:
The use of diverse mobile applications among senior users is becoming increasingly widespread. However, many of these apps contain accessibility problems that result in negative user experiences for seniors. A key reason is that software practitioners often lack the time or resources to address the broad spectrum of age-related accessibility and personalisation needs. As current developer tools and practices encourage one-size-fits-all interfaces with limited potential to address the diversity of senior needs, there is a growing demand for approaches that support the systematic creation of adaptive, accessible app experiences. To this end, we present AdaptForge, a novel model-driven engineering (MDE) approach that enables advanced design-time adaptations of mobile application interfaces and behaviours tailored to the accessibility needs of senior users. AdaptForge uses two domain-specific languages (DSLs) to address age-related accessibility needs. The first DSL defines users' context-of-use parameters, while the second defines conditional accessibility scenarios and corresponding UI adaptation rules. These rules are interpreted by an MDE workflow to transform an app's original source code into personalised instances. We also report evaluations with professional software developers and senior end-users, demonstrating the feasibility and practical utility of AdaptForge.

Authors:Hang Wang, Qiaoyi Fang, Junshan Zhang
Title: Heterogeneous Decision Making in Mixed Traffic: Uncertainty-aware Planning and Bounded Rationality
Abstract:
The past few years have witnessed rapid growth in the deployment of automated vehicles (AVs). Clearly, AVs and human-driven vehicles (HVs) will co-exist for many years, and AVs will have to operate around HVs, pedestrians, cyclists, and more, calling for fundamental breakthroughs in AI designed for mixed traffic to achieve mixed autonomy. Thus motivated, we study heterogeneous decision making by AVs and HVs in a mixed traffic environment, aiming to capture the interactions between human and machine decision-making and develop an AI foundation that enables vehicles to operate safely and efficiently. There are a number of challenges to achieving mixed autonomy, including 1) human drivers make driving decisions with bounded rationality, and it remains open to develop accurate models for HVs' decision making; and 2) uncertainty-aware planning plays a critical role for AVs in taking safety maneuvers in response to human behavior. In this paper, we introduce a formulation of AV-HV interaction, where the HV makes decisions with bounded rationality and the AV employs uncertainty-aware planning based on predictions of the HV's future actions. We conduct a comprehensive analysis of AV and HV learning regret to answer two questions: 1) How does the learning performance depend on the HV's bounded rationality and the AV's planning? 2) How do different decision-making strategies impact the overall learning performance? Our findings reveal some intriguing phenomena, such as Goodhart's Law in the AV's learning performance and compounding effects in the HV's decision-making process. By examining the dynamics of the regrets, we gain insights into the interplay between human and machine decision making.

Authors:Jiangrong Shen, Qi Xu, Gang Pan, Badong Chen
Title: Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency
Abstract:
The human brain utilizes spikes for information transmission and dynamically reorganizes its network structure to boost energy efficiency and cognitive capabilities throughout its lifespan. Drawing inspiration from this spike-based computation, Spiking Neural Networks (SNNs) have been developed to construct event-driven models that emulate this efficiency. Despite these advances, deep SNNs continue to suffer from over-parameterization during training and inference, a stark contrast to the brain's ability to self-organize. Furthermore, existing sparse SNNs are challenged by maintaining optimal pruning levels due to a static pruning ratio, resulting in either under- or over-pruning. In this paper, we propose a novel two-stage dynamic structure learning approach for deep SNNs, aimed at maintaining effective sparse training from scratch while optimizing compression efficiency. The first stage evaluates the compressibility of existing sparse subnetworks within SNNs using the PQ index, which facilitates an adaptive determination of the rewiring ratio for synaptic connections based on data compression insights. In the second stage, this rewiring ratio critically informs the dynamic synaptic connection rewiring process, including both pruning and regrowth. This approach significantly improves the exploration of sparse structure training in deep SNNs, adapting sparsity dynamically from the viewpoint of compression efficiency. Our experiments demonstrate that this sparse training approach not only matches the performance of current deep SNN models but also significantly improves the efficiency of compressing sparse SNNs. Crucially, it preserves the advantages of initiating training with sparse models and offers a promising solution for implementing edge AI on neuromorphic hardware.
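The PQ index mentioned in the first stage is commonly defined (Diao et al., 2023) as I_{p,q}(w) = 1 - d^(1/q - 1/p) * ||w||_p / ||w||_q with 0 < p < q, approaching 1 for highly compressible (sparse) weight vectors and 0 for dense ones. A minimal pure-Python sketch follows; the mapping from index to rewiring ratio below is illustrative only and does not reproduce the paper's exact adaptive rule:

```python
def pq_index(weights, p=0.5, q=1.0):
    """PQ index I_{p,q}(w) = 1 - d^(1/q - 1/p) * ||w||_p / ||w||_q, 0 < p < q.
    Higher values indicate a more compressible (sparser) weight vector."""
    d = len(weights)
    norm_p = sum(abs(w) ** p for w in weights) ** (1.0 / p)
    norm_q = sum(abs(w) ** q for w in weights) ** (1.0 / q)
    return 1.0 - d ** (1.0 / q - 1.0 / p) * norm_p / norm_q

def rewiring_ratio(weights, max_ratio=0.3):
    """Illustrative adaptive rule (not the paper's exact formula): layers whose
    weights are already compressible (high PQ index) are rewired less."""
    return max_ratio * (1.0 - pq_index(weights))
```

For example, the uniform dense vector [1, 1, 1, 1] scores 0 at the default p and q (nothing to compress), while [1, 0, 0, 0] scores 0.75.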

Authors:Yuki Tatsukawa, I-Chao Shen, Mustafa Doga Dogan, Anran Qi, Yuki Koyama, Ariel Shamir, Takeo Igarashi
Title: FontCraft: Multimodal Font Design Using Interactive Bayesian Optimization
Abstract:
Creating new fonts requires a lot of human effort and professional typographic knowledge. Despite the rapid advancements of automatic font generation models, existing methods require users to prepare pre-designed characters with target styles using font-editing software, which poses a problem for non-expert users. To address this limitation, we propose FontCraft, a system that enables font generation without relying on pre-designed characters. Our approach integrates the exploration of a font-style latent space with human-in-the-loop preferential Bayesian optimization and multimodal references, facilitating efficient exploration and enhancing user control. Moreover, FontCraft allows users to revisit previous designs, retracting their earlier choices in the preferential Bayesian optimization process. Once users finish editing the style of a selected character, they can propagate it to the remaining characters and further refine them as needed. The system then generates a complete outline font in OpenType format. We evaluated the effectiveness of FontCraft through a user study comparing it to a baseline interface. Results from both quantitative and qualitative evaluations demonstrate that FontCraft enables non-expert users to design fonts efficiently.

Authors:Ying Lei, Shuai Ma, Yuling Sun, Xiaojuan Ma
Title: "AI Afterlife" as Digital Legacy: Perceptions, Expectations, and Concerns
Abstract:
The rise of generative AI technology has sparked interest in using digital information to create AI-generated agents as digital legacy. These agents, often referred to as "AI Afterlives", present unique challenges compared to traditional digital legacy. Yet, there is limited human-centered research on "AI Afterlife" as digital legacy, especially from the perspectives of the individuals being represented by these agents. This paper presents a qualitative study examining users' perceptions, expectations, and concerns regarding AI-generated agents as digital legacy. We identify factors shaping people's attitudes, their perceived differences compared with the traditional digital legacy, and concerns they might have in real practices. We also examine the design aspects throughout the life cycle and interaction process. Based on these findings, we situate "AI Afterlife" in digital legacy, and delve into design implications for maintaining identity consistency and balancing intrusiveness and support in "AI Afterlife" as digital legacy.

Authors:Liuqing Chen, Yaxuan Song, Chunyuan Zheng, Qianzhi Jing, Preben Hansen, Lingyun Sun
Title: Understanding Design Fixation in Generative AI
Abstract:
Generative AI (GenAI) provides new opportunities for creativity support, but the phenomenon of GenAI design fixation remains underexplored. While human design fixation typically constrains ideas to familiar or existing solutions, our findings reveal that GenAI similarly experiences design fixation, limiting its ability to generate novel and diverse design outcomes. To advance understanding of GenAI design fixation, we propose a theoretical framework that covers the definition, causes, manifestations, and impacts of GenAI design fixation for creative design. We also conducted an experimental study to investigate the characteristics of GenAI design fixation in practice, and we summarize how GenAI design fixation manifests in text generation and image generation models, respectively. Furthermore, we propose methods for mitigating GenAI design fixation in future creativity support tool design. We recommend adopting the lens of GenAI design fixation for creativity-oriented HCI research for the unique perspectives and insights it provides.

Authors:Yuli Wu, Henning Konermann, Emil Mededovic, Peter Walter, Johannes Stegmaier
Title: Evaluating Cross-Subject and Cross-Device Consistency in Visual Fixation Prediction
Abstract:
Understanding cross-subject and cross-device consistency in visual fixation prediction is essential for advancing eye-tracking applications, including visual attention modeling and neuroprosthetics. This study evaluates fixation consistency using an embedded eye tracker integrated into regular-sized glasses, comparing its performance with high-end standalone eye-tracking systems. Nine participants viewed 300 images from the MIT1003 dataset in subjective experiments, allowing us to analyze cross-device and cross-subject variations in fixation patterns with various evaluation metrics. Our findings indicate that average visual fixations can be reliably transferred across devices for relatively simple stimuli. However, individual-to-average consistency remains weak, highlighting the challenges of predicting individual fixations across devices. These results provide an empirical foundation for leveraging predicted average visual fixation data to enhance neuroprosthetic applications.

Authors:Yuanhao Zhang, Yumeng Wang, Xiyuan Wang, Changyang He, Chenliang Huang, Xiaojuan Ma
Title: CoKnowledge: Supporting Assimilation of Time-synced Collective Knowledge in Online Science Videos
Abstract:
Danmaku, a system of scene-aligned, time-synced, floating comments, can augment video content to create 'collective knowledge'. However, its chaotic nature often hinders viewers from effectively assimilating the collective knowledge, especially in knowledge-intensive science videos. With a formative study, we examined viewers' practices for processing collective knowledge and the specific barriers they encountered. Building on these insights, we designed a processing pipeline to filter, classify, and cluster danmaku, leading to the development of CoKnowledge - a tool incorporating a video abstract, knowledge graphs, and supplementary danmaku features to support viewers' assimilation of collective knowledge in science videos. A within-subject study (N=24) showed that CoKnowledge significantly enhanced participants' comprehension and recall of collective knowledge compared to a baseline with unprocessed live comments. Based on our analysis of user interaction patterns and feedback on design features, we presented design considerations for developing similar support tools.

Authors:Weisi Yang, Shinan Liu, Feng Xiao, Nick Feamster, Stephen Xia
Title: Towards Scalable Defenses against Intimate Partner Infiltrations
Abstract:
Intimate Partner Infiltration (IPI)--a type of Intimate Partner Violence (IPV) that typically requires physical access to a victim's device--is a pervasive concern around the world, often manifesting through digital surveillance, control, and monitoring. Unlike conventional cyberattacks, IPI perpetrators leverage close proximity and personal knowledge to circumvent standard protections, underscoring the need for targeted interventions. While security clinics and other human-centered approaches effectively tailor solutions for victims, their scalability remains constrained by resource limitations and the need for specialized counseling. We present AID, an Automated IPI Detection system that continuously monitors for unauthorized access and suspicious behaviors on smartphones. AID employs a unified architecture to process multimodal signals stealthily and preserve user privacy. A brief calibration phase upon installation enables AID to adapt to each user's behavioral patterns, achieving high accuracy with minimal false alarms. Our 27-participant user study demonstrates that AID achieves highly accurate detection of non-owner access and fine-grained IPI-related activities, attaining a false positive rate of 1.6%, which is 11x lower than existing methods, and an end-to-end F1 score of 0.981. These findings suggest that AID can serve as a forensic tool that security clinics can deploy to scale their ability to identify IPI tactics and deliver personalized, far-reaching support to survivors.

Authors:Cathy Mengying Fang, Yasith Samaradivakara, Pattie Maes, Suranga Nanayakkara
Title: Mirai: A Wearable Proactive AI "Inner-Voice" for Contextual Nudging
Abstract:
People often find it difficult to turn their intentions into real actions -- a challenge that affects both personal growth and mental well-being. While established methods like cognitive-behavioral therapy and mindfulness training help people become more aware of their behaviors and set clear goals, these approaches cannot provide immediate guidance when people fall into automatic reactions or habits. We introduce Mirai, a novel wearable AI system with an integrated camera, real-time speech processing, and personalized voice-cloning to provide proactive and contextual nudges for positive behavior change. Mirai continuously monitors and analyzes the user's environment to anticipate their intentions, generating contextually-appropriate responses delivered in the user's own cloned voice. We demonstrate the application of Mirai through three scenarios focusing on dietary choices, work productivity, and communication skills. We also discuss future work on improving the proactive agent via human feedback and the need for a longitudinal study in naturalistic settings.

Authors:Abdul Rehman, Ilona Heldal, Diana Stilwell, Jerry Chun-Wei Lin
Title: Towards a Supporting Framework for Neuro-Developmental Disorder: Considering Artificial Intelligence, Serious Games and Eye Tracking
Abstract:
This paper focuses on developing a framework for uncovering insights about NDD children's performance (e.g., raw gaze cluster analysis, duration analysis & area of interest for sustained attention, stimuli expectancy, loss of focus/motivation, inhibitory control) and informing their teachers. The hypothesis behind this work is that self-adaptation of games can contribute to improving students' well-being and performance by suggesting personalized activities (e.g., highlighting stimuli to increase attention or choosing a difficulty level that matches students' abilities). The aim is to examine how AI can be used to help solve this problem. The results would not only contribute to a better understanding of the problems of NDD children and their teachers but also help psychologists to validate the results against their clinical knowledge, improve communication with patients and identify areas for further investigation, e.g., by explaining the decision made and preserving the children's private data in the learning process.

Authors:Abdul Rehman, Ilona Heldal, Jerry Chun-Wei Lin
Title: SSRepL-ADHD: Adaptive Complex Representation Learning Framework for ADHD Detection from Visual Attention Tasks
Abstract:
Self-Supervised Representation Learning (SSRepL) can capture meaningful and robust representations of Attention Deficit Hyperactivity Disorder (ADHD) data and has the potential to improve model performance on different downstream Neurodevelopmental Disorder (NDD) detection tasks. In this paper, a novel SSRepL and Transfer Learning (TL)-based framework that incorporates a Long Short-Term Memory (LSTM) and a Gated Recurrent Units (GRU) model is proposed to detect children with potential symptoms of ADHD. This model uses Electroencephalogram (EEG) signals extracted during visual attention tasks to accurately detect ADHD, after improving EEG signal quality through normalization, filtering, and data balancing. For the experimental analysis, we use three different models: 1) the SSRepL and TL-based LSTM-GRU model, named SSRepL-ADHD, which integrates LSTM and GRU layers to capture temporal dependencies in the data; 2) a lightweight SSRepL-based DNN model (LSSRepL-DNN); and 3) Random Forest (RF). In the study, these models are thoroughly evaluated using well-known performance metrics (i.e., accuracy, precision, recall, and F1-score). The results show that the proposed SSRepL-ADHD model achieves a maximum accuracy of 81.11% while acknowledging the difficulties associated with dataset imbalance and feature selection.

Authors:Xiaowei Jiang, Charles Zhou, Yiqun Duan, Ziyi Zhao, Thomas Do, Chin-Teng Lin
Title: Neural Spelling: A Spell-Based BCI System for Language Neural Decoding
Abstract:
Brain-computer interfaces (BCIs) present a promising avenue by translating neural activity directly into text, eliminating the need for physical actions. However, existing non-invasive BCI systems have not successfully covered the entire alphabet, limiting their practicality. In this paper, we propose a novel non-invasive EEG-based BCI system with a Curriculum-based Neural Spelling Framework, which recognizes all 26 alphabet letters by first decoding neural signals associated with handwriting and then applying generative AI (GenAI) to enhance spell-based neural language decoding. Our approach combines the ease of handwriting with the accessibility of EEG technology, utilizing advanced neural decoding algorithms and pre-trained large language models (LLMs) to translate EEG patterns into text with high accuracy. This system shows how GenAI can improve the performance of typical spelling-based neural language decoding tasks and addresses the limitations of previous methods, offering a scalable and user-friendly solution for individuals with communication impairments, thereby enhancing inclusive communication options.

Authors:Yun-Shiuan Chuang, Sameer Narendran, Nikunj Harlalka, Alexander Cheung, Sizhe Gao, Siddharth Suresh, Junjie Hu, Timothy T. Rogers
Title: Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding
Abstract:
Guesstimation -- the task of making approximate quantitative estimates about objects or events -- is a common real-world skill, yet it remains underexplored in large language model (LLM) research. We introduce three guesstimation datasets: MARBLES, FUTURE, and ELECPRED, spanning physical estimation (e.g., how many marbles fit in a cup) to abstract predictions (e.g., the 2024 U.S. presidential election). Inspired by the social science concept of the Wisdom of Crowds (WOC) -- where the median of multiple estimates improves accuracy -- we propose WOC decoding for LLMs. We replicate WOC effects in human participants and find that LLMs exhibit similar benefits: median aggregation across sampled responses consistently improves accuracy over greedy decoding, self-consistency decoding, and mean decoding. This suggests that LLMs encode a world model that supports approximate reasoning. Our results position guesstimation as a useful probe of LLM world knowledge and highlight WOC decoding as a strategy for enhancing LLM guesstimation performance on real-world tasks.
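The core of WOC decoding is simple to state: draw several stochastic samples, parse each into a number, and return the median rather than a single greedy answer. A minimal sketch, with `sample_fn` standing in for a hypothetical stochastic LLM call that already returns a numeric estimate:

```python
from statistics import median

def woc_decode(sample_fn, n_samples=20):
    """Wisdom-of-Crowds decoding: aggregate several stochastic estimates by
    their median, which is robust to occasional wild outliers."""
    estimates = [sample_fn() for _ in range(n_samples)]
    return median(estimates)
```

For instance, over the samples [10, 100, 12, 11, 13] the median is 12, whereas the mean (29.2) is dragged up by the single outlier, which is why median aggregation tends to beat mean decoding.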

Authors:Ian Drosos, Advait Sarkar, Xiaotong Xu, Neil Toronto
Title: "It makes you think": Provocations Help Restore Critical Thinking to AI-Assisted Knowledge Work
Abstract:
Recent research suggests that the use of Generative AI tools may result in diminished critical thinking during knowledge work. We study the effect on knowledge work of provocations: brief textual prompts that offer critiques for and propose alternatives to AI suggestions. We conduct a between-subjects study (n=24) in which participants completed AI-assisted shortlisting tasks with and without provocations. We find that provocations can induce critical and metacognitive thinking. We derive five dimensions that impact the user experience of provocations: task urgency, task importance, user expertise, provocation actionability, and user responsibility. We connect our findings to related work on design frictions, microboundaries, and distributed cognition. We draw design implications for critical thinking interventions in AI-assisted knowledge work.

Authors:Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, Wanjun Zhong, Kuanye Li, Jiale Yang, Yu Miao, Woyu Lin, Longxiang Liu, Xu Jiang, Qianli Ma, Jingyu Li, Xiaojun Xiao, Kai Cai, Chuang Li, Yaowei Zheng, Chaolin Jin, Chen Li, Xiao Zhou, Minchao Wang, Haoli Chen, Zhaojian Li, Haihua Yang, Haifeng Liu, Feng Lin, Tao Peng, Xin Liu, Guang Shi
Title: UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Abstract:
This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks. Experiments demonstrate its superior performance: UI-TARS achieves SOTA performance in 10+ GUI agent benchmarks evaluating perception, grounding, and GUI task execution. Notably, in the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9 respectively). In AndroidWorld, UI-TARS achieves 46.6, surpassing GPT-4o (34.5). UI-TARS incorporates several key innovations: (1) Enhanced Perception: leveraging a large-scale dataset of GUI screenshots for context-aware understanding of UI elements and precise captioning; (2) Unified Action Modeling, which standardizes actions into a unified space across platforms and achieves precise grounding and interaction through large-scale action traces; (3) System-2 Reasoning, which incorporates deliberate reasoning into multi-step decision making, involving multiple reasoning patterns such as task decomposition, reflection thinking, milestone recognition, etc. (4) Iterative Training with Reflective Online Traces, which addresses the data bottleneck by automatically collecting, filtering, and reflectively refining new interaction traces on hundreds of virtual machines. Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention. We also analyze the evolution path of GUI agents to guide the further development of this domain.

Authors:Johannes Kirmayr, Lukas Stappen, Phillip Schneider, Florian Matthes, Elisabeth André
Title: CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding
Abstract:
In today's assistant landscape, personalisation enhances interactions, fosters long-term relationships, and deepens engagement. However, many systems struggle with retaining user preferences, leading to repetitive user requests and disengagement. Furthermore, the unregulated and opaque extraction of user preferences in industry applications raises significant concerns about privacy and trust, especially in regions with stringent regulations like Europe. In response to these challenges, we propose a long-term memory system for voice assistants, structured around predefined categories. This approach leverages Large Language Models to efficiently extract, store, and retrieve preferences within these categories, ensuring both personalisation and transparency. We also introduce a synthetic multi-turn, multi-session conversation dataset (CarMem), grounded in real industry data, tailored to an in-car voice assistant setting. Benchmarked on the dataset, our system achieves an F1-score of .78 to .95 in preference extraction, depending on category granularity. Our maintenance strategy reduces redundant preferences by 95% and contradictory ones by 92%, while the accuracy of optimal retrieval is at .87. Collectively, the results demonstrate the system's suitability for industrial applications.
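The category-bounding idea can be illustrated with a toy store in which a preference is persisted only if it falls into a predefined category, and a new value for the same slot replaces the old one, bounding redundant and contradictory entries by construction. All names here are hypothetical; the actual system uses LLMs for extraction and retrieval rather than this dictionary sketch:

```python
class CategoryMemory:
    """Toy category-bounded preference store (illustrative, not CarMem's API)."""

    def __init__(self, categories):
        self.categories = set(categories)
        self.store = {}  # (category, attribute) -> latest value

    def add(self, category, attribute, value):
        if category not in self.categories:
            return False  # out-of-scope preferences are never persisted
        # Overwriting the slot resolves contradictions and removes redundancy.
        self.store[(category, attribute)] = value
        return True

    def retrieve(self, category):
        return {attr: val for (cat, attr), val in self.store.items()
                if cat == category}
```

Because extraction is bounded to predefined categories, the store stays transparent: a user (or regulator) can enumerate exactly which kinds of preferences the assistant may ever retain.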

Authors:Lucas Joos, Maximilian T. Fischer, Julius Rauscher, Daniel A. Keim, Tim Dwyer, Falk Schreiber, Karsten Klein
Title: Visual Network Analysis in Immersive Environments: A Survey
Abstract:
The increasing complexity and volume of network data demand effective analysis approaches, with visual exploration proving particularly beneficial. Immersive technologies, such as augmented reality, virtual reality, and large display walls, have enabled the emerging field of immersive analytics, offering new opportunities to enhance user engagement, spatial awareness, and problem-solving. A growing body of work has explored immersive environments for network visualisation, ranging from design studies to fully integrated applications across various domains. Despite these advancements, the field remains fragmented, lacking a clear description of the design space and a structured overview of the aspects that have already been empirically evaluated. To address this gap, we present a survey of visual network analysis in immersive environments, covering 138 publications retrieved through a structured pipeline. We systematically analyse the key aspects that define the design space, investigate their coverage in prior applications (n=87), and review user evaluations (n=59) that provide empirical evidence for essential design-related questions. By synthesising experimental findings and evaluating existing applications, we identify key achievements, highlight research gaps, and offer guidance for the design of future approaches. Additionally, we provide an online resource to explore our results interactively, which will be updated as new developments emerge.

Authors:Qiang Qu, Yiran Shen, Xiaoming Chen, Yuk Ying Chung, Weidong Cai, Tongliang Liu
Title: NVS-SQA: Exploring Self-Supervised Quality Representation Learning for Neurally Synthesized Scenes without References
Abstract:
Neural View Synthesis (NVS), such as NeRF and 3D Gaussian Splatting, effectively creates photorealistic scenes from sparse viewpoints, typically evaluated by quality assessment methods like PSNR, SSIM, and LPIPS. However, these full-reference methods, which compare synthesized views to reference views, may not fully capture the perceptual quality of neurally synthesized scenes (NSS), particularly due to the limited availability of dense reference views. Furthermore, the challenges in acquiring human perceptual labels hinder the creation of extensive labeled datasets, risking model overfitting and reduced generalizability. To address these issues, we propose NVS-SQA, an NSS quality assessment method that learns no-reference quality representations through self-supervision, without reliance on human labels. Traditional self-supervised learning predominantly relies on the "same instance, similar representation" assumption and extensive datasets. However, given that these conditions do not apply in NSS quality assessment, we employ heuristic cues and quality scores as learning objectives, along with a specialized contrastive pair preparation process to improve the effectiveness and efficiency of learning. The results show that NVS-SQA outperforms 17 no-reference methods by a large margin (i.e., on average 109.5% in SRCC, 98.6% in PLCC, and 91.5% in KRCC over the second best) and even exceeds 16 full-reference methods across all evaluation metrics (i.e., 22.9% in SRCC, 19.1% in PLCC, and 18.6% in KRCC over the second best).

Authors:Xuewen Luo, Fan Ding, Ruiqi Chen, Rishikesh Panda, Junnyong Loo, Shuyun Zhang
Title: "What's Happening"- A Human-centered Multimodal Interpreter Explaining the Actions of Autonomous Vehicles
Abstract:
Public distrust of self-driving cars is growing. Studies emphasize the need for interpreting the behavior of these vehicles to passengers to promote trust in autonomous systems. Interpreters can enhance trust by improving transparency and reducing perceived risk. However, current solutions often lack a human-centric approach to integrating multimodal interpretations. This paper introduces a novel Human-centered Multimodal Interpreter (HMI) system that leverages human preferences to provide visual, textual, and auditory feedback. The system combines a visual interface with Bird's Eye View (BEV), map, and text display, along with voice interaction using a fine-tuned large language model (LLM). Our user study, involving diverse participants, demonstrated that the HMI system significantly boosts passenger trust in AVs, increasing average trust levels by over 8%, with trust in ordinary environments rising by up to 30%. These results underscore the potential of the HMI system to improve the acceptance and reliability of autonomous vehicles by providing clear, real-time, and context-sensitive explanations of vehicle actions.

Authors:Adithya Chittem, Aishna Shrivastava, Sai Tarun Pendela, Jagat Sesh Challa, Dhruv Kumar
Title: SAC: A Framework for Measuring and Inducing Personality Traits in LLMs with Dynamic Intensity Control
Abstract:
Large language models (LLMs) have gained significant traction across a wide range of fields in recent years. There is also a growing expectation for them to display human-like personalities during interactions. To meet this expectation, numerous studies have proposed methods for modelling LLM personalities through psychometric evaluations. However, most existing models face two major limitations: they rely on the Big Five (OCEAN) framework, which only provides coarse personality dimensions, and they lack mechanisms for controlling trait intensity. In this paper, we address this gap by extending the Machine Personality Inventory (MPI), which originally used the Big Five model, to incorporate the 16 Personality Factor (16PF) model, allowing expressive control over sixteen distinct traits. We also developed a structured framework known as Specific Attribute Control (SAC) for evaluating and dynamically inducing trait intensity in LLMs. Our method introduces adjective-based semantic anchoring to guide trait intensity expression and leverages behavioural questions across five intensity factors: Frequency, Depth, Threshold, Effort, and Willingness. Through experimentation, we find that modelling intensity as a continuous spectrum yields substantially more consistent and controllable personality expression compared to binary trait toggling. Moreover, we observe that changes in target trait intensity systematically influence closely related traits in psychologically coherent directions, suggesting that LLMs internalize multi-dimensional personality structures rather than treating traits in isolation. Our work opens new pathways for controlled and nuanced human-machine interactions in domains such as healthcare, education, and interviewing processes, bringing us one step closer to truly human-like social machines.
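Adjective-based semantic anchoring can be pictured as snapping a continuous intensity value onto the nearest adjective of an ordered scale before it is written into the prompt. The anchor words and five-point scale below are hypothetical examples, not the paper's actual vocabulary:

```python
def anchor_adjective(intensity, anchors):
    """Map a trait intensity in [0, 1] to the nearest adjective on an ordered
    scale of len(anchors) evenly spaced anchor points."""
    n = len(anchors)
    idx = min(range(n), key=lambda i: abs(intensity - i / (n - 1)))
    return anchors[idx]

# Hypothetical 5-point scale for a 16PF-style warmth trait.
WARMTH = ["very reserved", "somewhat reserved", "neutral",
          "somewhat warm", "very warm"]
```

With this scale, an intensity of 0.8 snaps to "somewhat warm"; treating intensity as a continuous value that selects among graded anchors is what distinguishes this style of control from binary trait toggling.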

Authors:Jiaxin Pei, Dustin Wright, Isabelle Augenstein, David Jurgens
Title: Modeling Public Perceptions of Science in Media
Abstract:
Effectively engaging the public with science is vital for fostering trust and understanding in our scientific community. Yet, with an ever-growing volume of information, science communicators struggle to anticipate how audiences will perceive and interact with scientific news. In this paper, we introduce a computational framework that models public perception across twelve dimensions, such as newsworthiness, importance, and surprisingness. Using this framework, we create a large-scale science news perception dataset with 10,489 annotations from 2,101 participants from diverse US and UK populations, providing valuable insights into public responses to scientific information across domains. We further develop NLP models that predict public perception scores with strong performance. Leveraging the dataset and model, we examine public perception of science from two perspectives: (1) Perception as an outcome: What factors affect the public perception of scientific information? (2) Perception as a predictor: Can we use the estimated perceptions to predict public engagement with science? We find that individuals' frequency of science news consumption is the main driver of perception, whereas demographic factors exert minimal influence. More importantly, through a large-scale analysis and a carefully designed natural experiment on Reddit, we demonstrate that the estimated public perception of scientific information is directly connected with the final engagement pattern: posts with more positive perception scores receive significantly more comments and upvotes, a pattern that is consistent across different scientific information and for the same science framed differently. Overall, this research underscores the importance of nuanced perception modeling in science communication, offering new pathways to predict public interest and engagement with scientific content.

Authors:Sophie Chiang, Guy Laban, Hatice Gunes
Title: Do We Talk to Robots Like Therapists, and Do They Respond Accordingly? Language Alignment in AI Emotional Support
Abstract:
As conversational agents increasingly engage in emotionally supportive dialogue, it is important to understand how closely their interactions resemble those in traditional therapy settings. This study investigates whether the concerns shared with a robot align with those shared in human-to-human (H2H) therapy sessions, and whether robot responses semantically mirror those of human therapists. We analyzed two datasets: one of interactions between users and professional therapists (Hugging Face's NLP Mental Health Conversations), and another involving supportive conversations with a social robot (QTrobot from LuxAI) powered by a large language model (LLM, GPT-3.5). Using sentence embeddings and K-means clustering, we assessed cross-agent thematic alignment by applying a distance-based cluster-fitting method that evaluates whether responses from one agent type map to clusters derived from the other, and validated it using Euclidean distances. Results showed that 90.88% of robot conversation disclosures could be mapped to clusters from the human therapy dataset, suggesting shared topical structure. For matched clusters, we compared the subjects as well as therapist and robot responses using Transformer, Word2Vec, and BERT embeddings, revealing strong semantic overlap in subjects' disclosures in both datasets, as well as in the responses given to similar human disclosure themes across agent types (robot vs. human therapist). These findings highlight both the parallels and boundaries of robot-led support conversations and their potential for augmenting mental health interventions.
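The distance-based cluster-fitting method described above can be sketched in a few lines. This is a minimal illustration on synthetic vectors with an assumed distance threshold, not the authors' pipeline (which clusters sentence embeddings of real therapy and robot transcripts):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means over the rows of X; returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
        # Move each centroid to the mean of its assigned points.
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return centroids

def cluster_fit_rate(embeddings, centroids, threshold):
    """Fraction of embeddings whose Euclidean distance to the nearest
    centroid (derived from the other agent's data) is within threshold."""
    dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=2)
    return float(np.mean(dists.min(axis=1) <= threshold))
```

In the study's setup, the centroids would come from clustering the human-therapy embeddings; the fit rate then measures how many robot-conversation embeddings map onto those clusters (the 90.88% figure above).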

Authors:Andela Ilic, Jiaxi Jiang, Paul Streli, Xintong Liu, Christian Holz
Title: Human Motion Capture from Loose and Sparse Inertial Sensors with Garment-aware Diffusion Models
Abstract:
Motion capture using sparse inertial sensors has shown great promise due to its portability and lack of occlusion issues compared to camera-based tracking. Existing approaches typically assume that IMU sensors are tightly attached to the human body. However, this assumption often does not hold in real-world scenarios. In this paper, we present Garment Inertial Poser (GaIP), a method for estimating full-body poses from sparse and loosely attached IMU sensors. We first simulate IMU recordings using an existing garment-aware human motion dataset. Our transformer-based diffusion models synthesize loose IMU data and estimate human poses from these challenging signals. We also demonstrate that incorporating garment-related parameters during training on loose IMU data effectively maintains expressiveness and enhances the ability to capture variations introduced by looser or tighter garments. Our experiments show that our diffusion methods trained on simulated and synthetic data outperform state-of-the-art inertial full-body pose estimators, both quantitatively and qualitatively, opening up a promising direction for future research on motion capture from such realistic sensor placements.

Authors:Brihi Joshi, Keyu He, Sahana Ramnath, Sadra Sabouri, Kaitlyn Zhou, Souti Chattopadhyay, Swabha Swayamdipta, Xiang Ren
Title: ELI-Why: Evaluating the Pedagogical Utility of Language Model Explanations
Abstract:
Language models today are widely used in education, yet their ability to tailor responses for learners with varied informational needs and knowledge backgrounds remains under-explored. To this end, we introduce ELI-Why, a benchmark of 13.4K "Why" questions to evaluate the pedagogical capabilities of language models. We then conduct two extensive human studies to assess the utility of language model-generated explanatory answers (explanations) on our benchmark, tailored to three distinct educational grades: elementary, high-school and graduate school. In our first study, human raters assume the role of an "educator" to assess model explanations' fit to different educational grades. We find that GPT-4-generated explanations match their intended educational background only 50% of the time, compared to 79% for lay human-curated explanations. In our second study, human raters assume the role of a learner to assess if an explanation fits their own informational needs. Across all educational backgrounds, users deemed GPT-4-generated explanations 20% less suited on average to their informational needs, when compared to explanations curated by lay people. Additionally, automated evaluation metrics reveal that explanations generated across different language model families for different informational needs remain indistinguishable in their grade-level, limiting their pedagogical effectiveness.

Authors:Roni Lekar, Tatiana Gerth, Sergey Prokudin, Matthias Seibold, Reto Bürgin, Benjamin Vella, Armando Hoch, Siyu Tang, Philipp Fürnstahl, Helmut Grabner
Title: Enhancing Orthopedic Surgical Training With Interactive Photorealistic 3D Visualization
Abstract:
Surgical training integrates several years of didactic learning, simulation, mentorship, and hands-on experience. Challenges include stress, technical demands, and new technologies. Orthopedic education often uses static materials like books, images, and videos, lacking interactivity. This study compares a new interactive photorealistic 3D visualization to 2D videos for learning total hip arthroplasty. In a randomized controlled trial, participants (students and residents) were evaluated on spatial awareness, tool placement, and task times in a simulation. Results show that interactive photorealistic 3D visualization significantly improved scores, with residents and those with prior 3D experience performing better. These results emphasize the potential of the interactive photorealistic 3D visualization to enhance orthopedic training.

Authors:Renato Cordeiro Ferreira, Renata Santos Miranda, Alfredo Goldman
Title: The Journey of CodeLab: How University Hackathons Built a Community of Engaged Students
Abstract:
This paper presents the journey of CodeLab: a student-organized initiative from the University of São Paulo that has grown thanks to university hackathons. It summarizes patterns, challenges, and lessons learned over 15 competitions organized by the group from 2015 to 2020. By describing these experiences, this report aims to help CodeLab to resume its events after the COVID-19 pandemic, and foster similar initiatives around the world.

Authors:Yichi Zhang, Jinlong Pang, Zhaowei Zhu, Yang Liu
Title: Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth
Abstract:
The recent success of generative AI highlights the crucial role of high-quality human feedback in building trustworthy AI systems. However, the increasing use of large language models (LLMs) by crowdsourcing workers poses a significant challenge: datasets intended to reflect human input may be compromised by LLM-generated responses. Existing LLM detection approaches often rely on high-dimensional training data such as text, making them unsuitable for annotation tasks like multiple-choice labeling. In this work, we investigate the potential of peer prediction -- a mechanism that evaluates the information within workers' responses without using ground truth -- to mitigate LLM-assisted cheating in crowdsourcing with a focus on annotation tasks. Our approach quantifies the correlations between worker answers while conditioning on (a subset of) LLM-generated labels available to the requester. Building on prior research, we propose a training-free scoring mechanism with theoretical guarantees under a crowdsourcing model that accounts for LLM collusion. We establish conditions under which our method is effective and empirically demonstrate its robustness in detecting low-effort cheating on real-world crowdsourcing datasets.

Authors:Luca-Maxim Meinhardt, Simon Demharter, Michael Rietzler, Mark Colley, Thomas Eßmeyer, Enrico Rukzio
Title: Mind Games! Exploring the Impact of Dark Patterns in Mixed Reality Scenarios
Abstract:
Mixed Reality (MR) integrates virtual objects with the real world, offering potential but raising concerns about misuse through dark patterns. This study explored the effects of four dark patterns, adapted from prior research, and applied to MR across three targets: places, products, and people. In a two-factorial within-subject study with 74 participants, we analyzed 13 videos simulating MR experiences during a city walk. Results show that all dark patterns significantly reduced user comfort, increased reactance, and decreased the intention to use MR glasses, with the most disruptive effects linked to personal or monetary manipulation. Additionally, the dark patterns of Emotional and Sensory Manipulation and Hiding Information produced similar impacts on the user in MR, suggesting a re-evaluation of current classifications to go beyond deceptive design techniques. Our findings highlight the importance of developing ethical design guidelines and tools to detect and prevent dark patterns as immersive technologies continue to evolve.

Authors:Zhaoyang Lv, Maurizio Monge, Ka Chen, Yufeng Zhu, Michael Goesele, Jakob Engel, Zhao Dong, Richard Newcombe
Title: Photoreal Scene Reconstruction from an Egocentric Device
Abstract:
In this paper, we investigate the challenges associated with using egocentric devices to photorealistically reconstruct scenes in high dynamic range. Existing methodologies typically assume the use of frame-rate 6DoF poses estimated from the device's visual-inertial odometry system, which may neglect crucial details necessary for pixel-accurate reconstruction. This study presents two significant findings. Firstly, in contrast to mainstream work treating the RGB camera as a global-shutter frame-rate camera, we emphasize the importance of employing visual-inertial bundle adjustment (VIBA) to calibrate the precise timestamps and movement of the rolling-shutter RGB sensing camera in a high-frequency trajectory format, which ensures an accurate calibration of the physical properties of the rolling-shutter camera. Secondly, we incorporate a physical image formation model into Gaussian Splatting, which effectively addresses the sensor characteristics, including the rolling-shutter effect of RGB cameras and the dynamic ranges measured by sensors. Our proposed formulation is applicable to the widely used variants of the Gaussian Splats representation. We conduct a comprehensive evaluation of our pipeline using the open-source Project Aria device under diverse indoor and outdoor lighting conditions, and further validate it on a Meta Quest3 device. Across all experiments, we observe a consistent visual enhancement of +1 dB in PSNR by incorporating VIBA, with an additional +1 dB achieved through our proposed image formation model. Our complete implementation, evaluation datasets, and recording profile are available at http://www.projectaria.com/photoreal-reconstruction/

Authors:Vassilis Lyberatos, Spyridon Kantarelis, Ioanna Zioga, Christina Anagnostopoulou, Giorgos Stamou, Anastasia Georgaki
Title: Music Interpretation and Emotion Perception: A Computational and Neurophysiological Investigation
Abstract:
This study investigates emotional expression and perception in music performance using computational and neurophysiological methods. The influence of different performance settings, such as repertoire, diatonic modal etudes, and improvisation, as well as levels of expressiveness, on performers' emotional communication and listeners' reactions is explored. Professional musicians performed various tasks, and emotional annotations were provided by both performers and the audience. Audio analysis revealed that expressive and improvisational performances exhibited unique acoustic features, while emotion analysis showed stronger emotional responses. Neurophysiological measurements indicated greater relaxation in improvisational performances. This multimodal study highlights the significance of expressivity in enhancing emotional communication and audience engagement.

Authors:Xiao Liu, Xinyi Dong, Xinyang Gao, Yansong Feng, Xun Pang
Title: Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science
Abstract:
Recent advancements in large language models (LLMs) have shown promise in generating novel research ideas. However, these ideas often face challenges related to feasibility and expected effectiveness. This paper explores how augmenting LLMs with relevant data during the idea generation process can enhance the quality of generated ideas. We introduce two ways of incorporating data: (1) providing metadata during the idea generation stage to guide LLMs toward feasible directions, and (2) adding automatic validation during the idea selection stage to assess the empirical plausibility of hypotheses within ideas. We conduct experiments in the social science domain, specifically with climate negotiation topics, and find that metadata improves the feasibility of generated ideas by 20%, while automatic validation improves the overall quality of selected ideas by 7%. A human study shows that LLM-generated ideas, along with their related data and validation processes, inspire researchers to propose research ideas with higher quality. Our work highlights the potential of data-driven research idea generation, and underscores the practical utility of LLM-assisted ideation in real-world academic settings.

Authors:Marius Bock, Maximilian Hopp, Kristof Van Laerhoven, Michael Moeller
Title: Label Leakage in Federated Inertial-based Human Activity Recognition
Abstract:
While prior work has shown that Federated Learning updates can leak sensitive information, label reconstruction attacks, which aim to recover input labels from shared gradients, have not yet been examined in the context of Human Activity Recognition (HAR). Given the sensitive nature of activity labels, this study evaluates the effectiveness of state-of-the-art gradient-based label leakage attacks on HAR benchmark datasets. Our findings show that the number of activity classes, sampling strategy, and class imbalance are critical factors influencing the extent of label leakage, with reconstruction accuracies reaching well above 90% on two benchmark datasets, even for trained models. Moreover, we find that Local Differential Privacy techniques such as gradient noise and clipping offer only limited protection, as certain attacks still reliably infer both majority and minority class labels. We conclude by offering practical recommendations for the privacy-aware deployment of federated HAR systems and identify open challenges for future research. Code to reproduce our experiments is publicly available via github.com/mariusbock/leakage_har.
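The core observation behind the simplest gradient-based label leakage attacks (in the spirit of iDLG) can be reproduced in a few lines: for a single sample under softmax cross-entropy, the shared last-layer bias gradient equals p - y, so its only negative entry marks the true label. The layer sizes and random values below are illustrative, not taken from the paper's benchmarks:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def infer_label(bias_grad):
    """For one sample under softmax cross-entropy, the last-layer bias
    gradient is (p - y): only the ground-truth class entry is negative."""
    return int(np.argmin(bias_grad))

# Simulate the gradient a federated client would share for one sample.
rng = np.random.default_rng(0)
h = rng.normal(size=8)          # penultimate-layer activations
W = rng.normal(size=(5, 8))     # last-layer weights, 5 activity classes
true_label = 3
p = softmax(W @ h)
bias_grad = p - np.eye(5)[true_label]   # dL/db for cross-entropy loss

print(infer_label(bias_grad))   # recovers 3 without any ground truth
```

Batched updates, trained models, and class imbalance complicate this picture considerably; handling those cases is exactly what the attacks evaluated in the paper address, and this sketch does not.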

Authors:Marius Bock, Michael Moeller, Kristof Van Laerhoven
Title: DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition
Abstract:
Despite recognized limitations in modeling long-range temporal dependencies, Human Activity Recognition (HAR) has traditionally relied on a sliding window approach to segment labeled datasets. Deep learning models like the DeepConvLSTM typically classify each window independently, thereby restricting learnable temporal context to within-window information. To address this constraint, we propose DeepConvContext, a multi-scale time series classification framework for HAR. Drawing inspiration from the vision-based Temporal Action Localization community, DeepConvContext models both intra- and inter-window temporal patterns by processing sequences of time-ordered windows. Unlike recent HAR models that incorporate attention mechanisms, DeepConvContext relies solely on LSTMs -- with ablation studies demonstrating the superior performance of LSTMs over attention-based variants for modeling inertial sensor data. Across six widely-used HAR benchmarks, DeepConvContext achieves an average 10% improvement in F1-score over the classic DeepConvLSTM, with gains of up to 21%. Code to reproduce our experiments is publicly available via github.com/mariusbock/context_har.
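The contrast between per-window classification and sequence-of-windows modeling comes down to the segmentation step. A hypothetical sketch of that step (window size, stride, and sequence length are arbitrary choices here, not the paper's hyperparameters):

```python
import numpy as np

def sliding_windows(stream, win, stride):
    """Segment a (T, C) inertial stream into (N, win, C) windows,
    as in the classic sliding-window HAR setup."""
    starts = range(0, len(stream) - win + 1, stride)
    return np.stack([stream[s:s + win] for s in starts])

def window_sequences(windows, seq_len):
    """Group time-ordered windows into (M, seq_len, win, C) sequences so a
    second-stage model (e.g. an LSTM) can exploit inter-window context."""
    return np.stack([windows[i:i + seq_len]
                     for i in range(len(windows) - seq_len + 1)])

stream = np.random.randn(100, 3)                     # 3-axis accelerometer
windows = sliding_windows(stream, win=10, stride=5)  # shape (19, 10, 3)
sequences = window_sequences(windows, seq_len=4)     # shape (16, 4, 10, 3)
```

A DeepConvLSTM-style model classifies each entry of `windows` independently; a multi-scale approach like the one described above instead consumes `sequences`, seeing both intra-window and inter-window temporal patterns.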

Authors:Tin Trung Nguyen, Jiannan Xu, Zora Che, Phuong-Anh Nguyen-Le, Rushil Dandamudi, Donald Braman, Furong Huang, Hal Daumé, Zubin Jelveh
Title: Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics
Abstract:
Although popularized AI fairness metrics, e.g., demographic parity, have uncovered bias in AI-assisted decision-making outcomes, they do not consider how much effort one has spent to get to where one is today in the input feature space. However, the notion of effort is important in how Philosophy and humans understand fairness. We propose a philosophy-informed approach to conceptualize and evaluate Effort-aware Fairness (EaF), grounded in the concept of Force, which represents the temporal trajectory of predictive features coupled with inertia. Besides theoretical formulation, our empirical contributions include: (1) a pre-registered human subjects experiment, which shows that for both stages of the (individual) fairness evaluation process, people consider the temporal trajectory of a predictive feature more than its aggregate value; (2) pipelines to compute Effort-aware Individual/Group Fairness in the criminal justice and personal finance contexts. Our work may enable AI model auditors to uncover and potentially correct unfair decisions against individuals who have spent significant efforts to improve but are still stuck with systemic disadvantages outside their control.

Authors:Raphaël A. El Haddad, Zeyu Wang, Yeonsu Shin, Ranyi Liu, Yuntao Wang, Chun Yu
Title: AR Secretary Agent: Real-time Memory Augmentation via LLM-powered Augmented Reality Glasses
Abstract:
Interacting with a significant number of individuals on a daily basis is commonplace for many professionals, which can lead to challenges in recalling specific details: Who is this person? What did we talk about last time? The advent of augmented reality (AR) glasses, equipped with visual and auditory data capture capabilities, presents a solution. In our work, we implemented an AR Secretary Agent with advanced Large Language Models (LLMs) and Computer Vision technologies. This system could discreetly provide real-time information to the wearer, identifying who they are conversing with and summarizing previous discussions. To verify AR Secretary, we conducted a user study with 13 participants and showed that our technique can efficiently help users memorize events, yielding up to a 20% memory improvement in our study.

Authors:Philipp Schaer, Christin Katharina Kreutz, Krisztian Balog, Timo Breuer, Andreas Konstantin Kruff
Title: Second SIGIR Workshop on Simulations for Information Access (Sim4IA 2025)
Abstract:
Simulations in information access (IA) have recently gained interest, as shown by various tutorials and workshops around that topic. Simulations can be key contributors to central IA research and evaluation questions, especially around interactive settings when real users are unavailable, or their participation is impossible due to ethical reasons. In addition, simulations in IA can help contribute to a better understanding of users, reduce complexity of evaluation experiments, and improve reproducibility. Building on recent developments in methods and toolkits, the second iteration of our Sim4IA workshop aims to again bring together researchers and practitioners to form an interactive and engaging forum for discussions on the future perspectives of the field. An additional aim is to plan an upcoming TREC/CLEF campaign.

Authors:Harshita Goyal, Garima Garg, Prisha Mordia, Veena Ramachandran, Dhruv Kumar, Jagat Sesh Challa
Title: The Impact of Large Language Models on K-12 Education in Rural India: A Thematic Analysis of Student Volunteer's Perspectives
Abstract:
AI-driven education, particularly Large Language Models (LLMs), has the potential to address learning disparities in rural K-12 schools. However, research on AI adoption in rural India remains limited, with existing studies focusing primarily on urban settings. This study examines the perceptions of volunteer teachers on AI integration in rural education, identifying key challenges and opportunities. Through semi-structured interviews with 23 volunteer educators in Rajasthan and Delhi, we conducted a thematic analysis to explore infrastructure constraints, teacher preparedness, and digital literacy gaps. Findings indicate that while LLMs could enhance personalized learning and reduce teacher workload, barriers such as poor connectivity, lack of AI training, and parental skepticism hinder adoption. Despite concerns over over-reliance and ethical risks, volunteers emphasize that AI should be seen as a complementary tool rather than a replacement for traditional teaching. Given the potential benefits, LLM-based tutors merit further exploration in rural classrooms, with structured implementation and localized adaptations to ensure accessibility and equity.

Authors:Khoi Trinh, Scott Seidenberger, Raveen Wijewickrama, Murtuza Jadliwala, Anindya Maiti
Title: A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks
Abstract:
With AI-generated content becoming ubiquitous across the web, social media, and other digital platforms, it is vital to examine how such content is inspired and generated. The creation of AI-generated images often involves refining the input prompt iteratively to achieve desired visual outcomes. This study focuses on the relatively underexplored concept of image regeneration using AI, in which a human operator attempts to closely recreate a specific target image by iteratively refining their prompt. Image regeneration is distinct from normal image generation, which lacks any predefined visual reference. A separate challenge lies in determining whether existing image similarity metrics (ISMs) can provide reliable, objective feedback in iterative workflows, given that we do not fully understand if subjective human judgments of similarity align with these metrics. Consequently, we must first validate their alignment with human perception before assessing their potential as a feedback mechanism in the iterative prompt refinement process. To address these research gaps, we present a structured user study evaluating how iterative prompt refinement affects the similarity of regenerated images relative to their targets, while also examining whether ISMs capture the same improvements perceived by human observers. Our findings suggest that incremental prompt adjustments substantially improve alignment, verified through both subjective evaluations and quantitative measures, underscoring the broader potential of iterative workflows to enhance generative AI content creation across various application domains.

Authors:Yangtian Zi, Luisa Li, Arjun Guha, Carolyn Jane Anderson, Molly Q Feldman
Title: "I Would Have Written My Code Differently": Beginners Struggle to Understand LLM-Generated Code
Abstract:
Large language models (LLMs) are being increasingly adopted for programming work. Prior work shows that while LLMs accelerate task completion for professional programmers, beginning programmers struggle to prompt models effectively. However, prompting is just half of the code generation process -- when code is generated, it must be read, evaluated, and integrated (or rejected). How accessible are these tasks for beginning programmers? This paper measures how well beginners comprehend LLM-generated code and explores the challenges students face in judging code correctness. We compare how well students understand natural language descriptions of functions and LLM-generated implementations, studying 32 CS1 students on 160 task instances. Our results show a low per-task success rate of 32.5%, with struggles spread indiscriminately across demographic populations. Key challenges include barriers for non-native English speakers, unfamiliarity with Python syntax, and automation bias. Our findings highlight the barrier that code comprehension presents to beginning programmers seeking to write code with LLMs.

Authors:Yaqing Yang, Vikram Mohanty, Nikolas Martelaro, Aniket Kittur, Yan-Ying Chen, Matthew K. Hong
Title: From Overload to Insight: Scaffolding Creative Ideation through Structuring Inspiration
Abstract:
Creative ideation relies on exploring diverse stimuli, but the overwhelming abundance of information often makes it difficult to identify valuable insights or reach the 'aha' moment. Traditional methods for accessing design stimuli lack organization and fail to support users in discovering promising opportunities within large idea spaces. In this position paper, we explore how AI can be leveraged to structure, organize, and surface relevant stimuli, guiding users in both exploring idea spaces and mapping insights back to their design challenges.

Authors:Claudio Bettini, Azin Moradbeikie, Gabriele Civitarese
Title: Personal Data Protection in Smart Home Activity Monitoring for Digital Health: A Case Study
Abstract:
Researchers in pervasive computing have worked for decades on sensor-based human activity recognition (HAR). Among the digital health applications, the recognition of activities of daily living (ADL) in smart home environments enables the identification of behavioral changes that clinicians consider as a digital bio-marker of early stages of cognitive decline. The real deployment of sensor-based HAR systems in the homes of elderly subjects poses several challenges, with privacy and ethical concerns being major ones. This paper reports our experience applying privacy by design principles to develop and deploy one of these systems.

Authors:Matt Franchi, Maria Teresa Parreira, Fanjun Bu, Wendy Ju
Title: The Robotability Score: Enabling Harmonious Robot Navigation on Urban Streets
Abstract:
This paper introduces the Robotability Score (R), a novel metric that quantifies the suitability of urban environments for autonomous robot navigation. Through expert interviews and surveys, we identify and weigh key features contributing to R for wheeled robots on urban streets. Our findings reveal that pedestrian density, crowd dynamics and pedestrian flow are the most critical factors, collectively accounting for 28% of the total score. Computing robotability across New York City yields significant variation; the area of highest R is 3.0 times more "robotable" than the area of lowest R. Deployments of a physical robot in high- and low-robotability areas show the adequacy of the score in anticipating the ease of robot navigation. This new framework for evaluating urban landscapes aims to reduce uncertainty in robot deployment while respecting established mobility patterns and urban planning principles, contributing to the discourse on harmonious human-robot environments.

Authors:Sophie Chiang, Guy Laban, Emily S. Cross, Hatice Gunes
Title: Comparing Self-Disclosure Themes and Semantics to a Human, a Robot, and a Disembodied Agent
Abstract:
As social robots and other artificial agents become more conversationally capable, it is important to understand whether the content and meaning of self-disclosure towards these agents changes depending on the agent's embodiment. In this study, we analysed conversational data from three controlled experiments in which participants self-disclosed to a human, a humanoid social robot, and a disembodied conversational agent. Using sentence embeddings and clustering, we identified themes in participants' disclosures, which were then labelled and explained by a large language model. We subsequently assessed whether these themes and the underlying semantic structure of the disclosures varied by agent embodiment. Our findings reveal strong consistency: thematic distributions did not significantly differ across embodiments, and semantic similarity analyses showed that disclosures were expressed in highly comparable ways. These results suggest that while embodiment may influence human behaviour in human-robot and human-agent interactions, people tend to maintain a consistent thematic focus and semantic structure in their disclosures, whether speaking to humans or artificial interlocutors.

Authors:Marcel Worring, Jan Zahálka, Stef van den Elzen, Maximilian T. Fischer, Daniel A. Keim
Title: A Multimedia Analytics Model for the Foundation Model Era
Abstract:
The rapid advances in Foundation Models and agentic Artificial Intelligence are transforming multimedia analytics by enabling richer, more sophisticated interactions between humans and analytical systems. Existing conceptual models for visual and multimedia analytics, however, do not adequately capture the complexity introduced by these powerful AI paradigms. To bridge this gap, we propose a comprehensive multimedia analytics model specifically designed for the foundation model era. Building upon established frameworks from visual analytics, multimedia analytics, knowledge generation, analytic task definition, mixed-initiative guidance, and human-in-the-loop reinforcement learning, our model emphasizes integrated human-AI teaming based on visual analytics agents from both technical and conceptual perspectives. Central to the model is a seamless, yet explicitly separable, interaction channel between expert users and semi-autonomous analytical processes, ensuring continuous alignment between user intent and AI behavior. The model addresses practical challenges in sensitive domains such as intelligence analysis, investigative journalism, and other fields handling complex, high-stakes data. We illustrate through detailed case studies how our model facilitates deeper understanding and targeted improvement of multimedia analytics solutions. By explicitly capturing how expert users can optimally interact with and guide AI-powered multimedia analytics systems, our conceptual framework sets a clear direction for system design, comparison, and future research.

Authors:Guy Laban, Sophie Chiang, Hatice Gunes
Title: What People Share With a Robot When Feeling Lonely and Stressed and How It Helps Over Time
Abstract:
Loneliness and stress are prevalent among young adults and are linked to significant psychological and health-related consequences. Social robots may offer a promising avenue for emotional support, especially when considering the ongoing advancements in conversational AI. This study investigates how repeated interactions with a social robot influence feelings of loneliness and perceived stress, and how such feelings are reflected in the themes of user disclosures towards the robot. Participants engaged in a five-session robot-led intervention, where a large language model powered QTrobot facilitated structured conversations designed to support cognitive reappraisal. Results from linear mixed-effects models show significant reductions in both loneliness and perceived stress over time. Additionally, semantic clustering of 560 user disclosures towards the robot revealed six distinct conversational themes. Results from a Kruskal-Wallis H-test demonstrate that participants reporting higher loneliness and stress more frequently engaged in socially focused disclosures, such as friendship and connection, whereas lower distress was associated with introspective and goal-oriented themes (e.g., academic ambitions). By exploring both how the intervention affects well-being, as well as how well-being shapes the content of robot-directed conversations, we aim to capture the dynamic nature of emotional support in human-robot interaction.

Authors:Huimin Xu, Seungjun Yi, Terence Lim, Jiawei Xu, Andrew Well, Carlos Mery, Aidong Zhang, Yuji Zhang, Heng Ji, Keshav Pingali, Yan Leng, Ying Ding
Title: TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews
Abstract:
Thematic analysis (TA) is a widely used qualitative approach for uncovering latent meanings in unstructured text data. TA provides valuable insights in healthcare but is resource-intensive. Large Language Models (LLMs) have been introduced to perform TA, yet their applications in healthcare remain underexplored. Here, we propose TAMA: A Human-AI Collaborative Thematic Analysis framework using Multi-Agent LLMs for clinical interviews. We leverage the scalability and coherence of multi-agent systems through structured conversations between agents and coordinate the expertise of cardiac experts in TA. Using interview transcripts from parents of children with Anomalous Aortic Origin of a Coronary Artery (AAOCA), a rare congenital heart disease, we demonstrate that TAMA outperforms existing LLM-assisted TA approaches, achieving higher thematic hit rate, coverage, and distinctiveness. TAMA demonstrates strong potential for automated TA in clinical settings by leveraging multi-agent LLM systems with human-in-the-loop integration, enhancing quality while significantly reducing manual workload.

Authors:Guy Laban, Julie Wang, Hatice Gunes
Title: A Robot-Led Intervention for Emotion Regulation: From Expression to Reappraisal
Abstract:
Emotion regulation is a crucial skill for managing emotions in everyday life, yet finding a constructive and accessible method to support these processes remains challenging due to their cognitive demands. In this study, we explore how regular interactions with a social robot, conducted in a structured yet familiar environment within university halls and departments, can provide effective support for emotion regulation through cognitive reappraisal. Twenty-one students participated in a five-session study at a university hall or department, where the robot, powered by a large language model (GPT-3.5), facilitated structured conversations, encouraging the students to reinterpret emotionally charged situations they shared with the robot. Quantitative and qualitative results indicate significant improvements in emotion self-regulation, with participants reporting better understanding and control of their emotions. The intervention led to significant changes in constructive emotion regulation tendencies and positive effects on mood and sentiment after each session. The findings also demonstrate that repeated interactions with the robot encouraged greater emotional expressiveness, including longer speech disclosures, increased use of affective language, and heightened facial arousal. Notably, expressiveness followed structured patterns aligned with the reappraisal process, with expression peaking during key reappraisal moments, particularly when participants were prompted to reinterpret negative experiences. The qualitative feedback further highlighted how the robot fostered introspection and provided a supportive space for discussing emotions, enabling participants to confront long-avoided emotional challenges. These findings demonstrate the potential of robots to effectively assist in emotion regulation in familiar environments, offering both emotional support and cognitive guidance.

Authors:Jaymari Chua, Chen Wang, Lina Yao
Title: Superhuman Game AI Disclosure: Expertise and Context Moderate Effects on Trust and Fairness
Abstract:
As artificial intelligence surpasses human performance in select tasks, disclosing superhuman capabilities poses distinct challenges for fairness, accountability, and trust. However, the impact of such disclosures on diverse user attitudes and behaviors remains unclear, particularly concerning potential negative reactions like discouragement or overreliance. This paper investigates these effects by utilizing Persona Cards: a validated, standardized set of synthetic personas designed to simulate diverse user reactions and fairness perspectives. We conducted an ethics board-approved study (N=32), utilizing these personas to investigate how capability disclosure influenced behaviors with a superhuman game AI in competitive StarCraft II scenarios. Our results reveal transparency is double-edged: while disclosure could alleviate suspicion, it also provoked frustration and strategic defeatism among novices in cooperative scenarios, as well as overreliance in competitive contexts. Experienced and competitive players interpreted disclosure as confirmation of an unbeatable opponent, shifting to suboptimal goals. We release the Persona Cards Dataset, including profiles, prompts, interaction logs, and protocols, to foster reproducible research into human-aligned AI design. This work demonstrates that transparency is not a cure-all; successfully leveraging disclosure to enhance trust and accountability requires careful tailoring to user characteristics, domain norms, and specific fairness objectives.

Authors:Niall L. Williams, Logan C. Stevens, Aniket Bera, Dinesh Manocha
Title: Sensitivity to Redirected Walking Considering Gaze, Posture, and Luminance
Abstract:
We study the correlations between redirected walking (RDW) rotation gains and patterns in users' posture and gaze data during locomotion in virtual reality (VR). To do this, we conducted a psychophysical experiment to measure users' sensitivity to RDW rotation gains and collect gaze and posture data during the experiment. Using multilevel modeling, we studied how different factors of the VR system and user affected their physiological signals. In particular, we studied the effects of redirection gain, trial duration, trial number (i.e., time spent in VR), and participant gender on postural sway, gaze velocity (a proxy for gaze stability), and saccade and blink rate. Our results showed that, in general, physiological signals were significantly positively correlated with the strength of redirection gain, the duration of trials, and the trial number. Gaze velocity was negatively correlated with trial duration. Additionally, we measured users' sensitivity to rotation gains in well-lit (photopic) and dimly-lit (mesopic) virtual lighting conditions. Results showed that there were no significant differences in RDW detection thresholds between the photopic and mesopic luminance conditions.

Authors:Omar Shaikh, Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz
Title: Navigating Rifts in Human-LLM Grounding: Study and Benchmark
Abstract:
Language models excel at following instructions but often struggle with the collaborative aspects of conversation that humans naturally employ. This limitation in grounding -- the process by which conversation participants establish mutual understanding -- can lead to outcomes ranging from frustrated users to serious consequences in high-stakes scenarios. To systematically study grounding challenges in human-LLM interactions, we analyze logs from three human-assistant datasets: WildChat, MultiWOZ, and Bing Chat. We develop a taxonomy of grounding acts and build models to annotate and forecast grounding behavior. Our findings reveal significant differences in human-human and human-LLM grounding: LLMs were three times less likely to initiate clarification and sixteen times less likely to provide follow-up requests than humans. Additionally, we find that early grounding failures predict later interaction breakdowns. Building on these insights, we introduce Rifts, a benchmark derived from publicly available LLM interaction data containing situations where LLMs fail to initiate grounding. We note that current frontier models perform poorly on Rifts, highlighting the need to reconsider how we train and prompt LLMs for human interaction. To this end, we develop a preliminary intervention aimed at mitigating grounding failures.

Authors:Rashid Mushkani, Hugo Berard, Shin Koseki
Title: Negotiative Alignment: Embracing Disagreement to Achieve Fairer Outcomes -- Insights from Urban Studies
Abstract:
Urban assessments often compress diverse needs into single scores, which can obscure minority perspectives. We present a community-centered study in Montreal (n=35; wheelchair users, seniors, LGBTQIA2+ residents, and immigrants). Participants rated 20 streets (accessibility, inclusivity, aesthetics, practicality) and ranked 7 images on 12 interview-elicited criteria. Disagreement patterns were systematic in our sample: wheelchair users diverged most on accessibility and practicality; LGBTQIA2+ participants emphasized inclusion and liveliness; seniors prioritized security. Group discussion reduced information gaps but not value conflicts; ratings conveyed intensity, while rankings forced trade-offs. We then formalize negotiative alignment, a transparent, budget-aware bargaining procedure, and pilot it with role-played stakeholder agents plus a neutral mediator. Relative to the best base design under the same public rubric, the negotiated package increased total utility (21.10 to 24.55), raised the worst-group utility (3.20 to 3.90), improved twentieth percentile satisfaction (0.86 to 1.00; min-max normalized within the scenario), and reduced inequality (Gini 0.036 to 0.025). Treating disagreement as signal and reporting worst-group outcomes alongside totals may help planners and AI practitioners surface trade-offs and preserve minority priorities while maintaining efficiency.

Authors:Raj Gupta, Harshita Goyal, Dhruv Kumar, Apurv Mehra, Sanchit Sharma, Kashish Mittal, Jagat Sesh Challa
Title: Sakshm AI: Advancing AI-Assisted Coding Education for Engineering Students in India Through Socratic Tutoring and Comprehensive Feedback
Abstract:
The advent of Large Language Models (LLMs) is reshaping education, particularly in programming, by enhancing problem-solving, enabling personalized feedback, and supporting adaptive learning. Existing AI tools for programming education struggle with key challenges, including the lack of Socratic guidance, direct code generation, limited context retention, minimal adaptive feedback, and the need for prompt engineering. To address these challenges, we introduce Sakshm AI, an intelligent tutoring system for learners across all education levels. It fosters Socratic learning through Disha, its inbuilt AI chatbot, which provides context-aware hints, structured feedback, and adaptive guidance while maintaining conversational memory and supporting language flexibility. This study examines 1170 registered participants, analyzing platform logs, engagement trends, and problem-solving behavior to assess Sakshm AI's impact. Additionally, a structured survey of 45 active users and 25 in-depth interviews were conducted, using thematic encoding to extract qualitative insights. Our findings reveal how AI-driven Socratic guidance influences problem-solving behaviors and engagement, offering key recommendations for optimizing AI-based coding platforms. This research combines quantitative and qualitative insights to inform AI-assisted education, providing a framework for scalable, intelligent tutoring systems that improve learning outcomes. Furthermore, Sakshm AI represents a significant step toward Sustainable Development Goal 4 (Quality Education), providing an accessible and structured learning tool for undergraduate students, even without expert guidance. This is one of the first large-scale studies examining AI-assisted programming education across multiple institutions and demographics.

Authors:Hyeonsu Kang, David Chuan-en Lin, Yan-Ying Chen, Matthew K. Hong, Nikolas Martelaro, Aniket Kittur
Title: BioSpark: Beyond Analogical Inspiration to LLM-augmented Transfer
Abstract:
We present BioSpark, a system for analogical innovation designed to act as a creativity partner in reducing the cognitive effort in finding, mapping, and creatively adapting diverse inspirations. While prior approaches have focused on initial stages of finding inspirations, BioSpark uses LLMs embedded in a familiar, visual, Pinterest-like interface to go beyond inspiration to supporting users in identifying the key solution mechanisms, transferring them to the problem domain, considering tradeoffs, and elaborating on details and characteristics. To accomplish this, BioSpark introduces several novel contributions, including a tree-of-life enabled approach for generating relevant and diverse inspirations, as well as AI-powered cards including 'Sparks' for analogical transfer; 'Trade-offs' for considering pros and cons; and 'Q&A' for deeper elaboration. We evaluated BioSpark through workshops with professional designers and a controlled user study, finding that using BioSpark led to a greater number of generated ideas, higher ratings of creative quality, and more diversity in the biological inspirations used than a control condition. Our results suggest new avenues for creativity support tools embedding AI in familiar interaction paradigms for designer workflows.

Authors:Ryugo Morita, Ko Watanabe, Jinjia Zhou, Andreas Dengel, Shoya Ishimaru
Title: GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models
Abstract:
Cognitive augmentation is a cornerstone in advancing education, particularly through personalized learning. However, personalizing extensive textual materials, such as narratives and academic textbooks, remains challenging due to their text-heavy nature, which can hinder learner engagement and understanding. Building on cognitive theories like Dual Coding Theory -- which posits that combining textual and visual information enhances comprehension and memory -- this study explores the potential of Generative AI (GenAI) to enrich educational materials. We utilized large language models (LLMs) to generate concise text summaries and image generation models (IGMs) to create visually aligned content from textual inputs. After recruiting 24 participants, we verified that integrating AI-generated supplementary materials significantly improved learning outcomes, increasing post-reading test scores by 7.50%. These findings underscore GenAI's transformative potential in creating adaptive learning environments that enhance cognitive augmentation.

Authors:Yi-Chi Liao, Paul Streli, Zhipeng Li, Christoph Gebhardt, Christian Holz
Title: Continual Human-in-the-Loop Optimization
Abstract:
Optimal input settings vary across users due to differences in motor abilities and personal preferences, which are typically addressed by manual tuning or calibration. Although human-in-the-loop optimization has the potential to identify optimal settings during use, it is rarely applied due to its long optimization process. A more efficient approach would continually leverage data from previous users to accelerate optimization, exploiting shared traits while adapting to individual characteristics. We introduce the concept of Continual Human-in-the-Loop Optimization and a Bayesian optimization-based method that leverages a Bayesian-neural-network surrogate model to capture population-level characteristics while adapting to new users. We propose a generative replay strategy to mitigate catastrophic forgetting. We demonstrate our method by optimizing virtual reality keyboard parameters for text entry using direct touch, showing reduced adaptation times with a growing user base. Our method opens the door for next-generation personalized input systems that improve with accumulated experience.

Authors:Rajnish Kumar, Tapas Tripura, Souvik Chakraborty, Sitikantha Roy
Title: Deep Muscle EMG construction using A Physics-Integrated Deep Learning approach
Abstract:
Electromyography (EMG)-based computational musculoskeletal modeling is a non-invasive method for studying musculotendon function, human movement, and neuromuscular control, providing estimates of internal variables like muscle forces and joint torques. However, EMG signals from deeper muscles are often challenging to measure with surface EMG electrodes, and direct measurement using invasive methods is often unfeasible. This restricted access to EMG data from deeper muscles poses a considerable obstacle to the broad adoption of EMG-driven modeling techniques. A strategic alternative is to use an estimation algorithm to approximate the missing EMG signals from deeper muscles. A similar strategy is used in physics-informed deep learning, where the features of physical systems are learned without labeled data. In this work, we propose a hybrid deep learning algorithm, namely the neural musculoskeletal model (NMM), that integrates physics-informed and data-driven deep learning to approximate the EMG signals from the deeper muscles. While data-driven modeling is used to predict the missing EMG signals, physics-based modeling embeds subject-specific information into the predictions. Experimental verifications on five test subjects are carried out to investigate the performance of the proposed hybrid framework. The proposed NMM is validated against the joint torque computed from 'OpenSim' software. The predicted deep EMG signals are also compared against the state-of-the-art muscle synergy extrapolation (MSE) approach, where the proposed NMM outperforms the existing MSE framework by a significant margin.

Authors:Yansong Ning, Shuowei Cai, Wei Li, Jun Fang, Naiqiang Tan, Hua Chai, Hao Liu
Title: DiMA: An LLM-Powered Ride-Hailing Assistant at DiDi
Abstract:
On-demand ride-hailing services like DiDi, Uber, and Lyft have transformed urban transportation, offering unmatched convenience and flexibility. In this paper, we introduce DiMA, an LLM-powered ride-hailing assistant deployed in DiDi Chuxing. Its goal is to provide seamless ride-hailing services and beyond through a natural and efficient conversational interface under dynamic and complex spatiotemporal urban contexts. To achieve this, we propose a spatiotemporal-aware order planning module that leverages external tools for precise spatiotemporal reasoning and progressive order planning. Additionally, we develop a cost-effective dialogue system that integrates multi-type dialog repliers with cost-aware LLM configurations to handle diverse conversation goals and trade off response quality and latency. Furthermore, we introduce a continual fine-tuning scheme that utilizes real-world interactions and simulated dialogues to align the assistant's behavior with human-preferred decision-making processes. Since its deployment in the DiDi application, DiMA has demonstrated exceptional performance, achieving 93% accuracy in order planning and 92% in response generation during real-world interactions. Offline experiments further validate DiMA's capabilities, showing improvements of up to 70.23% in order planning and 321.27% in response generation compared to three state-of-the-art agent frameworks, while reducing latency by $0.72\times$ to $5.47\times$. These results establish DiMA as an effective, efficient, and intelligent mobile assistant for ride-hailing services.
Chinese: 本文介绍了在滴滴出行中部署的基于大语言模型的网约车助手DiMA,它通过时空感知的订单规划和高效对话系统,在实际应用中实现了高精度的订单规划和响应生成,相比现有框架有显著提升。
English: This paper introduces DiMA, an LLM-powered ride-hailing assistant deployed in DiDi Chuxing that achieves high accuracy in order planning and response generation through spatiotemporal reasoning and cost-effective dialogue systems, demonstrating significant improvements over existing frameworks.

Authors:Zongqian Li, Ehsan Shareghi, Nigel Collier
Title: ReasonGraph: Visualisation of Reasoning Paths
Abstract:
The reasoning processes of Large Language Models (LLMs) are challenging to analyze due to their complexity and the lack of organized visualization tools. We present ReasonGraph, a web-based platform for visualizing and analyzing LLM reasoning processes. It supports both sequential and tree-based reasoning methods while integrating with major LLM providers and over fifty state-of-the-art models. ReasonGraph incorporates an intuitive UI with meta reasoning method selection, configurable visualization parameters, and a modular framework that facilitates efficient extension. Our evaluation shows high parsing reliability, efficient processing, and strong usability across various downstream applications. By providing a unified visualization framework, ReasonGraph reduces cognitive load in analyzing complex reasoning paths, improves error detection in logical processes, and enables more effective development of LLM-based applications. The platform is open-source, promoting accessibility and reproducibility in LLM reasoning analysis.

Authors:Eduardo Davalos, Yike Zhang, Namrata Srivastava, Jorge Alberto Salas, Sara McFadden, Sun-Joo Cho, Gautam Biswas, Amanda Goodwin
Title: LLMs as Educational Analysts: Transforming Multimodal Data Traces into Actionable Reading Assessment Reports
Abstract:
Reading assessments are essential for enhancing students' comprehension, yet many EdTech applications focus mainly on outcome-based metrics, providing limited insights into student behavior and cognition. This study investigates the use of multimodal data sources -- including eye-tracking data, learning outcomes, assessment content, and teaching standards -- to derive meaningful reading insights. We employ unsupervised learning techniques to identify distinct reading behavior patterns, and then a large language model (LLM) synthesizes the derived information into actionable reports for educators, streamlining the interpretation process. LLM experts and human educators evaluate these reports for clarity, accuracy, relevance, and pedagogical usefulness. Our findings indicate that LLMs can effectively function as educational analysts, turning diverse data into teacher-friendly insights that are well-received by educators. While promising for automating insight generation, human oversight remains crucial to ensure reliability and fairness. This research advances human-centered AI in education, connecting data-driven analytics with practical classroom applications.

Authors:Will Epperson, Gagan Bansal, Victor Dibia, Adam Fourney, Jack Gerrits, Erkang Zhu, Saleema Amershi
Title: Interactive Debugging and Steering of Multi-Agent AI Systems
Abstract:
Fully autonomous teams of LLM-powered AI agents are emerging that collaborate to perform complex tasks for users. What challenges do developers face when trying to build and debug these AI agent teams? In formative interviews with five AI agent developers, we identify core challenges: difficulty reviewing long agent conversations to localize errors, lack of support in current tools for interactive debugging, and the need for tool support to iterate on agent configuration. Based on these needs, we developed an interactive multi-agent debugging tool, AGDebugger, with a UI for browsing and sending messages, the ability to edit and reset prior agent messages, and an overview visualization for navigating complex message histories. In a two-part user study with 14 participants, we identify common user strategies for steering agents and highlight the importance of interactive message resets for debugging. Our studies deepen understanding of interfaces for debugging increasingly important agentic workflows.

Authors:Rashid Mushkani, Shravan Nayak, Hugo Berard, Allison Cohen, Shin Koseki, Hadrien Bertrand
Title: LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces
Abstract:
We introduce the Local Intersectional Visual Spaces (LIVS) dataset, a benchmark for multi-criteria alignment, developed through a two-year participatory process with 30 community organizations to support the pluralistic alignment of text-to-image (T2I) models in inclusive urban planning. The dataset encodes 37,710 pairwise comparisons across 13,462 images, structured along six criteria - Accessibility, Safety, Comfort, Invitingness, Inclusivity, and Diversity - derived from 634 community-defined concepts. Using Direct Preference Optimization (DPO), we fine-tune Stable Diffusion XL to reflect multi-criteria spatial preferences and evaluate the LIVS dataset and the fine-tuned model through four case studies: (1) DPO increases alignment with annotated preferences, particularly when annotation volume is high; (2) preference patterns vary across participant identities, underscoring the need for intersectional data; (3) human-authored prompts generate more distinctive visual outputs than LLM-generated ones, influencing annotation decisiveness; and (4) intersectional groups assign systematically different ratings across criteria, revealing the limitations of single-objective alignment. While DPO improves alignment under specific conditions, the prevalence of neutral ratings indicates that community values are heterogeneous and often ambiguous. LIVS provides a benchmark for developing T2I models that incorporate local, stakeholder-driven preferences, offering a foundation for context-aware alignment in spatial design.

Authors:Haruki Suzawa, Ko Watanabe, Andreas Dengel, Shoya Ishimaru
Title: Augmenting Online Meetings with Context-Aware Real-time Music Generation
Abstract:
As online communication continues to expand, participants often face cognitive fatigue and reduced engagement. Cognitive augmentation, which leverages technology to enhance human abilities, offers promising solutions to these challenges. In this study, we investigate the potential of generative artificial intelligence (GenAI) for real-time music generation to enrich online meetings. We introduce Discussion Jockey 2, a system that dynamically produces background music in response to live conversation transcripts. Through a user study involving 14 participants in an online interview setting, we examine the system's impact on relaxation, concentration, and overall user experience. The findings reveal that AI-generated background music significantly enhances user relaxation (average score: 5.75/9) and concentration (average score: 5.86/9). This research underscores the promise of context-aware music generation in improving the quality of online communication and points to future directions for optimizing its implementation across various virtual environments.

Authors:Mohit Sharma, Talha Bin Masood, Nanna Holmgaard List, Ingrid Hotz, Vijay Natarajan
Title: Continuous Scatterplot and Image Moments for Time-Varying Bivariate Field Analysis of Electronic Structure Evolution
Abstract:
Photoinduced electronic transitions are complex quantum-mechanical processes where electrons move between energy levels due to light absorption. This induces dynamics in electronic structure and nuclear geometry, driving important physical and chemical processes in fields like photobiology, materials design, and medicine. The evolving electronic structure can be characterized by two electron density fields: hole and particle natural transition orbitals (NTOs). Studying these density fields helps understand electronic charge movement between donor and acceptor regions within a molecule. Previous works rely on side-by-side visual comparisons of isosurfaces, statistical approaches, or bivariate field analysis with few instances. We propose a new method to analyze time-varying bivariate fields with many instances, which is relevant for understanding electronic structure changes during light-induced dynamics. Since NTO fields depend on nuclear geometry, the nuclear motion results in numerous time steps to analyze. This paper presents a structured approach to feature-directed visual exploration of time-varying bivariate fields using continuous scatterplots (CSPs) and image moment-based descriptors, tailored for studying evolving electronic structures post-photoexcitation. The CSP of the bivariate field at each time step is represented by a four-component image moment vector. The collection of all vector descriptors forms a point cloud in R^4, visualized using principal component analysis. Selecting appropriate principal components results in a representation of the point cloud as a curve on the plane, aiding tasks such as identifying key time steps, recognizing patterns within the bivariate field, and tracking the temporal evolution. We demonstrate this with two case studies on excited-state molecular dynamics, showing how bivariate field analysis provides application-specific insights.

Authors:Markus Sasalovici, Albin Zeqiri, Robin Connor Schramm, Oscar Javier Ariza Nunez, Pascal Jansen, Jann Philipp Freiwald, Mark Colley, Christian Winkler, Enrico Rukzio
Title: Bumpy Ride? Understanding the Effects of External Forces on Spatial Interactions in Moving Vehicles
Abstract:
As the use of Head-Mounted Displays in moving vehicles increases, passengers can immerse themselves in visual experiences independent of their physical environment. However, interaction methods are susceptible to physical motion, leading to input errors and reduced task performance. This work investigates the impact of G-forces, vibrations, and unpredictable maneuvers on 3D interaction methods. We conducted a field study with 24 participants in both stationary and moving vehicles to examine the effects of vehicle motion on four interaction methods: (1) Gaze&Pinch, (2) DirectTouch, (3) Handray, and (4) HeadGaze. Participants performed selections in a Fitts' Law task. Our findings reveal a significant effect of vehicle motion on interaction accuracy and duration across the tested combinations of Interaction Method x Road Type x Curve Type. We found a significant impact of movement on throughput, error rate, and perceived workload. Finally, we propose future research considerations and recommendations on interaction methods during vehicle movement.

Authors:Mark Colley, Jonathan Westhauser, Jonas Andersson, Alexander G. Mirnig, Enrico Rukzio
Title: Introducing ROADS: A Systematic Comparison of Remote Control Interaction Concepts for Automated Vehicles at Road Works
Abstract:
As vehicle automation technology continues to mature, there is a necessity for robust remote monitoring and intervention features. These are essential for intervening during vehicle malfunctions, challenging road conditions, or in areas that are difficult to navigate. This evolution in the role of the human operator - from a constant driver to an intermittent teleoperator - necessitates the development of suitable interaction interfaces. While some interfaces were suggested, a comparative study is missing. We designed, implemented, and evaluated three interaction concepts (path planning, trajectory guidance, and waypoint guidance) with up to four concurrent requests of automated vehicles in a within-subjects study with N=23 participants. The results showed a clear preference for the path planning concept. It also led to the highest usability but lower satisfaction. With trajectory guidance, the fewest requests were resolved. The study's findings contribute to the ongoing development of HMIs focused on the remote assistance of automated vehicles.

Authors:Peya Mowar, Yi-Hao Peng, Jason Wu, Aaron Steinfeld, Jeffrey P. Bigham
Title: CodeA11y: Making AI Coding Assistants Useful for Accessible Web Development
Abstract:
A persistent challenge in accessible computing is ensuring developers produce web UI code that supports assistive technologies. Despite numerous specialized accessibility tools, novice developers often remain unaware of them, leading to accessibility violations on ~96% of web pages. AI coding assistants, such as GitHub Copilot, could help by generating accessibility-compliant code, but their impact remains uncertain. Our formative study with 16 developers without accessibility training revealed three key issues in AI-assisted coding: failure to prompt AI for accessibility, omitting crucial manual steps like replacing placeholder attributes, and the inability to verify compliance. To address these issues, we developed CodeA11y, a GitHub Copilot Extension, that suggests accessibility-compliant code and displays manual validation reminders. We evaluated it through a controlled study with another 20 novice developers. Our findings demonstrate its effectiveness in guiding novice developers by reinforcing accessibility practices throughout interactions, representing a significant step towards integrating accessibility into AI coding assistants.

Authors:Sahana Yadnakudige Subramanya, Ko Watanabe, Andreas Dengel, Shoya Ishimaru
Title: Human-in-the-Loop Annotation for Image-Based Engagement Estimation: Assessing the Impact of Model Reliability on Annotation Accuracy
Abstract:
Human-in-the-loop (HITL) frameworks are increasingly recognized for their potential to improve annotation accuracy in emotion estimation systems by combining machine predictions with human expertise. This study focuses on integrating a high-performing image-based emotion model into a HITL annotation framework to evaluate the collaborative potential of human-machine interaction and identify the psychological and practical factors critical to successful collaboration. Specifically, we investigate how varying model reliability and cognitive framing influence human trust, cognitive load, and annotation behavior in HITL systems. We demonstrate that model reliability and psychological framing significantly impact annotators' trust, engagement, and consistency, offering insights into optimizing HITL frameworks. Through three experimental scenarios with 29 participants--baseline model reliability (S1), fabricated errors (S2), and cognitive bias introduced by negative framing (S3)--we analyzed behavioral and qualitative data. Reliable predictions in S1 yielded high trust and annotation consistency, while unreliable outputs in S2 led to increased critical evaluations but also heightened frustration and response variability. Negative framing in S3 revealed how cognitive bias influenced participants to perceive the model as more relatable and accurate, despite misinformation regarding its reliability. These findings highlight the importance of both reliable machine outputs and psychological factors in shaping effective human-machine collaboration. By leveraging the strengths of both human oversight and automated systems, this study establishes a scalable HITL framework for emotion annotation and lays the foundation for broader applications in adaptive learning and human-computer interaction.

Authors:Zixin Tang, Chieh-Yang Huang, Tsung-Che Li, Ho Yin Sam Ng, Hen-Hsen Huang, Ting-Hao 'Kenneth' Huang
Title: Using Contextually Aligned Online Reviews to Measure LLMs' Performance Disparities Across Language Varieties
Abstract:
A language can have different varieties. These varieties can affect the performance of natural language processing (NLP) models, including large language models (LLMs), which are often trained on data from widely spoken varieties. This paper introduces a novel and cost-effective approach to benchmark model performance across language varieties. We argue that international online review platforms, such as Booking.com, can serve as effective data sources for constructing datasets that capture comments in different language varieties from similar real-world scenarios, like reviews for the same hotel with the same rating using the same language (e.g., Mandarin Chinese) but different language varieties (e.g., Taiwan Mandarin, Mainland Mandarin). To prove this concept, we constructed a contextually aligned dataset comprising reviews in Taiwan Mandarin and Mainland Mandarin and tested six LLMs in a sentiment analysis task. Our results show that LLMs consistently underperform in Taiwan Mandarin.

Authors:Tram Thi Minh Tran, Shane Brown, Oliver Weidlich, Soojeong Yoo, Callum Parker
Title: Wearable AR in Everyday Contexts: Insights from a Digital Ethnography of YouTube Videos
Abstract:
With growing investment in consumer augmented reality (AR) headsets and glasses, wearable AR is moving from niche applications to everyday use. However, current research primarily examines AR in controlled settings, offering limited insights into its use in real-world daily life. To address this gap, we adopt a digital ethnographic approach, analysing 27 hours of 112 YouTube videos featuring early adopters. These videos capture usage ranging from continuous periods of hours to intermittent use over weeks and months. Our analysis shows that currently, wearable AR is primarily used for media consumption and gaming. While productivity is a desired use case, frequent use is constrained by current hardware limitations and the nascent application ecosystem. Users seek continuity in their digital experience, desiring functionalities similar to those on smartphones, tablets, or computers. We propose implications for everyday AR development that promote adoption while ensuring safe, ethical, and socially-aware integration into daily life.

Authors:Jianxin Sun, David Lenz, Hongfeng Yu, Tom Peterka
Title: Make the Fastest Faster: Importance Mask for Interactive Volume Visualization using Reconstruction Neural Networks
Abstract:
Visualizing a large-scale volumetric dataset with high resolution is challenging due to the high computational time and space complexity. Recent deep-learning-based image inpainting methods significantly improve rendering latency by reconstructing a high-resolution image for visualization in constant time on GPU from a partially rendered image where only a small portion of pixels go through the expensive rendering pipeline. However, existing methods need to render every pixel of a predefined regular sampling pattern. In this work, we provide Importance Mask Learning (IML) and Synthesis (IMS) networks, which are the first attempts to learn importance regions from the sampling pattern to further minimize the number of pixels to render by jointly considering the dataset, the user's view parameters, and the downstream reconstruction neural networks. Our solution is a unified framework that handles various image inpainting-based visualization methods through the proposed differentiable compaction/decompaction layers. Experiments show our method can further improve the overall rendering latency of state-of-the-art volume visualization methods that use reconstruction neural networks, at no extra cost, when rendering scientific volumetric datasets. Our method can also directly optimize off-the-shelf pre-trained reconstruction neural networks without lengthy retraining.

Authors:Tunazzina Islam, Dan Goldwasser
Title: Can LLMs Assist Annotators in Identifying Morality Frames? -- Case Study on Vaccination Debate on Social Media
Abstract:
Nowadays, social media is pivotal in shaping public discourse, especially on polarizing issues like vaccination, where diverse moral perspectives influence individual opinions. In NLP, data scarcity and the complexity of psycholinguistic tasks, such as identifying morality frames, make relying solely on human annotators costly, time-consuming, and prone to inconsistency due to cognitive load. To address these issues, we leverage large language models (LLMs), which are adept at adapting to new tasks through few-shot learning, utilizing a handful of in-context examples coupled with explanations that connect examples to task principles. Our research explores LLMs' potential to assist human annotators in identifying morality frames within vaccination debates on social media. We employ a two-step process: generating concepts and explanations with LLMs, followed by human evaluation using a "think-aloud" tool. Our study shows that integrating LLMs into the annotation process enhances accuracy, reduces task difficulty, and lowers cognitive load, suggesting a promising avenue for human-AI collaboration in complex psycholinguistic tasks.

Authors:David Chuan-En Lin, Hyeonsu B. Kang, Nikolas Martelaro, Aniket Kittur, Yan-Ying Chen, Matthew K. Hong
Title: Inkspire: Supporting Design Exploration with Generative AI through Analogical Sketching
Abstract:
With recent advancements in the capabilities of Text-to-Image (T2I) AI models, product designers have begun experimenting with them in their work. However, T2I models struggle to interpret abstract language and the current user experience of T2I tools can induce design fixation rather than a more iterative, exploratory process. To address these challenges, we developed Inkspire, a sketch-driven tool that supports designers in prototyping product design concepts with analogical inspirations and a complete sketch-to-design-to-sketch feedback loop. To inform the design of Inkspire, we conducted an exchange session with designers and distilled design goals for improving T2I interactions. In a within-subjects study comparing Inkspire to ControlNet, we found that Inkspire supported designers with more inspiration and exploration of design ideas, and improved aspects of the co-creative process by allowing designers to effectively grasp the current state of the AI to guide it towards novel design intentions.

Authors:Eduardo Davalos, Jorge Alberto Salas, Yike Zhang, Namrata Srivastava, Yashvitha Thatigotla, Abbey Gonzales, Sara McFadden, Sun-Joo Cho, Gautam Biswas, Amanda Goodwin
Title: Beyond Instructed Tasks: Recognizing In-the-Wild Reading Behaviors in the Classroom Using Eye Tracking
Abstract:
Understanding reader behaviors such as skimming, deep reading, and scanning is essential for improving educational instruction. While prior eye-tracking studies have trained models to recognize reading behaviors, they often rely on instructed reading tasks, which can alter natural behaviors and limit the applicability of these findings to in-the-wild settings. Additionally, there is a lack of clear definitions for reading behavior archetypes in the literature. We conducted a classroom study to address these issues by collecting instructed and in-the-wild reading data. We developed a mixed-method framework, including a human-driven theoretical model, statistical analyses, and an AI classifier, to differentiate reading behaviors based on their velocity, density, and sequentiality. Our lightweight 2D CNN achieved an F1 score of 0.8 for behavior recognition, providing a robust approach for understanding in-the-wild reading. This work advances our ability to provide detailed behavioral insights to educators, supporting more targeted and effective assessment and instruction.

Authors:Kevin Roitero, Dustin Wright, Michael Soprano, Isabelle Augenstein, Stefano Mizzaro
Title: Efficiency and Effectiveness of LLM-Based Summarization of Evidence in Crowdsourced Fact-Checking
Abstract:
Evaluating the truthfulness of online content is critical for combating misinformation. This study examines the efficiency and effectiveness of crowdsourced truthfulness assessments through a comparative analysis of two approaches: one involving full-length webpages as evidence for each claim, and another using summaries for each evidence document generated with a large language model. Using an A/B testing setting, we engage a diverse pool of participants tasked with evaluating the truthfulness of statements under these conditions. Our analysis explores both the quality of assessments and the behavioral patterns of participants. The results reveal that relying on summarized evidence offers comparable accuracy and error metrics to the Standard modality while significantly improving efficiency. Workers in the Summary setting complete a significantly higher number of assessments, reducing task duration and costs. Additionally, the Summary modality maximizes internal agreement and maintains consistent reliance on and perceived usefulness of evidence, demonstrating its potential to streamline large-scale truthfulness evaluations.

Authors:Xian Wang, Luyao Shen, Lei Chen, Mingming Fan, Lik-Hang Lee
Title: TeamPortal: Exploring Virtual Reality Collaboration Through Shared and Manipulating Parallel Views
Abstract:
Virtual Reality (VR) offers a unique collaborative experience, with parallel views playing a pivotal role in Collaborative Virtual Environments by supporting the transfer and delivery of items. Sharing and manipulating partners' views provides users with a broader perspective that helps them identify the targets and partner actions. We proposed TeamPortal accordingly and conducted two user studies with 72 participants (36 pairs) to investigate the potential benefits of interactive, shared perspectives in VR collaboration. Our first study compared ShaView and TeamPortal against a baseline in a collaborative task that encompassed a series of searching and manipulation tasks. The results show that TeamPortal significantly reduced movement and increased collaborative efficiency and social presence in complex tasks. Following the results, the second study evaluated three variants: TeamPortal+, SnapTeamPortal+, and DropTeamPortal+. The results show that both SnapTeamPortal+ and DropTeamPortal+ improved task efficiency and willingness to further adopt these technologies, though SnapTeamPortal+ reduced co-presence. Based on the findings, we proposed three design implications to inform the development of future VR collaboration systems.

Authors:Maxyn Leitner, Rebecca Dorn, Fred Morstatter, Kristina Lerman
Title: Characterizing Network Structure of Anti-Trans Actors on TikTok
Abstract:
The recent proliferation of short form video social media sites such as TikTok has been effectively utilized for increased visibility, communication, and community connection amongst trans/nonbinary creators online. However, these same platforms have also been exploited by right-wing actors targeting trans/nonbinary people, enabling such anti-trans actors to efficiently spread hate speech and propaganda. Given these divergent groups, what are the differences in network structure between anti-trans and pro-trans communities on TikTok, and to what extent do they amplify the effects of anti-trans content? In this paper, we collect a sample of TikTok videos containing pro- and anti-trans content, develop a taxonomy of trans-related sentiment to enable the classification of content on TikTok, and ultimately analyze the reply network structures of pro-trans and anti-trans communities. To accomplish this, we worked with hired expert data annotators from the trans/nonbinary community to generate a sample of highly accurately labeled data. From this subset, we utilized a novel classification pipeline leveraging Retrieval-Augmented Generation (RAG) with annotated examples and taxonomy definitions to classify content into pro-trans, anti-trans, or neutral categories. We find that incorporating our taxonomy and its logics into our classification engine improves the ability to differentiate trans-related content. Results from the network analysis indicate that many interactions between posters of pro-trans and anti-trans content exist, further demonstrating the targeting of trans individuals and the need for better content moderation tools.

Authors:Luca-Maxim Meinhardt, Clara Schramm, Pascal Jansen, Mark Colley, Enrico Rukzio
Title: Fly Away: Evaluating the Impact of Motion Fidelity on Optimized User Interface Design via Bayesian Optimization in Automated Urban Air Mobility Simulations
Abstract:
Automated Urban Air Mobility (UAM) can improve passenger transportation and reduce congestion, but its success depends on passenger trust. While initial research addresses passengers' information needs, questions remain about how to simulate air taxi flights and how these simulations impact users and interface requirements. We conducted a between-subjects study (N=40), examining the influence of motion fidelity in Virtual-Reality-simulated air taxi flights on user effects and interface design. Our study compared simulations with and without motion cues using a 3-Degrees-of-Freedom motion chair. Optimizing the interface design across six objectives, such as trust and mental demand, we used multi-objective Bayesian optimization to determine the most effective design trade-offs. Our results indicate that motion fidelity decreases users' trust, understanding, and acceptance, highlighting the need to consider motion fidelity in future UAM studies to approach realism. However, minimal evidence was found for differences or equality in the optimized interface designs, suggesting that interface designs may need to be personalized.

Authors:Luca-Maxim Meinhardt, Maryam Elhaidary, Mark Colley, Michael Rietzler, Jan Ole Rixen, Aditya Kumar Purohit, Enrico Rukzio
Title: Scrolling in the Deep: Analysing Contextual Influences on Intervention Effectiveness during Infinite Scrolling on Social Media
Abstract:
Infinite scrolling on social media platforms is designed to encourage prolonged engagement, leading users to spend more time than desired, which can provoke negative emotions. Interventions to mitigate infinite scrolling have shown initial success, yet users become desensitized due to the lack of contextual relevance. Understanding how contextual factors influence intervention effectiveness remains underexplored. We conducted a 7-day user study (N=72) investigating how these contextual factors affect users' reactance and responsiveness to interventions during infinite scrolling. Our study revealed an interplay, with contextual factors such as being at home, sleepiness, and valence playing significant roles in the intervention's effectiveness. Low valence coupled with being at home slows down the responsiveness to interventions, and sleepiness lowers reactance towards interventions, increasing user acceptance of the intervention. Overall, our work contributes to a deeper understanding of user responses toward interventions and paves the way for developing more effective interventions during infinite scrolling.

Authors:Luca-Maxim Meinhardt, Lina Weilke, Maryam Elhaidary, Julia von Abel, Paul Fink, Michael Rietzler, Mark Colley, Enrico Rukzio
Title: Light My Way: Developing and Exploring a Multimodal Interface to Assist People With Visual Impairments to Exit Highly Automated Vehicles
Abstract:
The introduction of Highly Automated Vehicles (HAVs) has the potential to increase the independence of blind and visually impaired people (BVIPs). However, ensuring safety and situation awareness when exiting these vehicles in unfamiliar environments remains challenging. To address this, we conducted an interactive workshop with N=5 BVIPs to identify their information needs when exiting an HAV and evaluated three prior-developed low-fidelity prototypes. The insights from this workshop guided the development of PathFinder, a multimodal interface combining visual, auditory, and tactile modalities tailored to BVIP's unique needs. In a three-factorial within-between-subject study with N=16 BVIPs, we evaluated PathFinder against an auditory-only baseline in urban and rural scenarios. PathFinder significantly reduced mental demand and maintained high perceived safety in both scenarios, while the auditory baseline led to lower perceived safety in the urban scenario compared to the rural one. Qualitative feedback further supported PathFinder's effectiveness in providing spatial orientation during exiting.

Authors:Mark Colley, Pascal Jansen, Mugdha Keskar, Enrico Rukzio
Title: Improving External Communication of Automated Vehicles Using Bayesian Optimization
Abstract:
The absence of a human operator in automated vehicles (AVs) may require external Human-Machine Interfaces (eHMIs) to facilitate communication with other road users in uncertain scenarios, for example, regarding the right of way. Given the plethora of adjustable parameters, balancing visual and auditory elements is crucial for effective communication with other road users. With N=37 participants, this study employed multi-objective Bayesian optimization to enhance eHMI designs and improve trust, safety perception, and mental demand. By reporting the Pareto front, we identify optimal design trade-offs. This research contributes to the ongoing standardization efforts of eHMIs, supporting broader adoption.
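The multi-objective Bayesian optimization loop this abstract relies on can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in, not the study's implementation: the two synthetic objective functions play the role of participant-rated objectives such as trust and mental demand, the design space is an arbitrary two-parameter unit square, and the surrogate is a tiny hand-rolled Gaussian process with random scalarization.

```python
import numpy as np

rng = np.random.default_rng(0)

def objectives(x):
    # Hypothetical stand-ins for two study objectives (e.g. trust, mental
    # demand); real values would come from participant ratings of a design x.
    # Both are negated squared distances, so higher is better for both.
    return np.array([-np.sum((x - 0.3) ** 2), -np.sum((x - 0.7) ** 2)])

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel between two sets of points.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_posterior(X, y, Xq, noise=1e-6):
    # Posterior mean and variance of a zero-mean GP at query points Xq.
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(Xq, X)
    mu = Ks @ K_inv @ y
    var = 1.0 - np.sum((Ks @ K_inv) * Ks, axis=1)
    return mu, np.maximum(var, 1e-12)

# Evaluate a few initial designs (2 design parameters in [0, 1]^2).
X = rng.random((5, 2))
Y = np.array([objectives(x) for x in X])

for _ in range(20):
    w = rng.dirichlet(np.ones(2))       # random scalarization weights
    y = Y @ w                           # scalarized objective values
    cand = rng.random((256, 2))         # random candidate designs
    mu, var = gp_posterior(X, y, cand)
    x_next = cand[np.argmax(mu + np.sqrt(var))]  # UCB acquisition
    X = np.vstack([X, x_next])
    Y = np.vstack([Y, objectives(x_next)])

# Report the Pareto front: evaluated designs not dominated on both objectives.
pareto = [i for i in range(len(Y))
          if not any((Y[j] >= Y[i]).all() and (Y[j] > Y[i]).any()
                     for j in range(len(Y)))]
print(len(pareto), "Pareto-optimal designs found")
```

Random scalarization (ParEGO-style) is only one way to handle multiple objectives; the study's actual acquisition function is not specified in the abstract.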

Authors:Pascal Jansen, Mark Colley, Svenja Krauß, Daniel Hirschle, Enrico Rukzio
Title: OptiCarVis: Improving Automated Vehicle Functionality Visualizations Using Bayesian Optimization to Enhance User Experience
Abstract:
Automated vehicle (AV) acceptance relies on users' understanding of AV behavior, which feedback can support. While visualizations aim to enhance user understanding of an AV's detection, prediction, and planning functionalities, establishing an optimal design is challenging. Traditional "one-size-fits-all" designs might be unsuitable, and tailoring them through empirical evaluations is resource-intensive. This paper introduces OptiCarVis, a set of Human-in-the-Loop (HITL) approaches using Multi-Objective Bayesian Optimization (MOBO) to optimize AV feedback visualizations. We compare conditions using eight expert and user-customized designs for a Warm-Start HITL MOBO. An online study (N=117) demonstrates OptiCarVis's efficacy in significantly improving trust, acceptance, perceived safety, and predictability without increasing cognitive load. OptiCarVis facilitates a comprehensive design space exploration, enhancing in-vehicle interfaces for optimal passenger experiences and broader applicability.

Authors:Ubaidullah Khan, Raveen Wijewickrama, Buddhi Ashan M. K., A. H. M. Nazmus Sakib, Khoi Trinh, Christina Duthie, Nima Najafian, Ahmer Patel, R. N. Molina, Anindya Maiti, Sushil K. Prasad, Greg P. Griffin, Murtuza Jadliwala
Title: ScooterLab: A Programmable and Participatory Sensing Research Testbed using Micromobility Vehicles
Abstract:
Micromobility vehicles, such as e-scooters, are increasingly popular in urban communities but present significant challenges in terms of road safety, user privacy, infrastructure planning, and civil engineering. Addressing these critical issues requires a large-scale and easily accessible research infrastructure to collect diverse mobility and contextual data from micromobility users in realistic settings. To this end, we present ScooterLab, a community research testbed comprising a fleet of customizable battery-powered micromobility vehicles retrofitted with advanced sensing, communication, and control capabilities. ScooterLab enables interdisciplinary research at the intersection of computing, mobility, and urban planning by providing researchers with tools to design and deploy customized sensing experiments and access curated datasets. The testbed will enable advances in machine learning, privacy, and urban transportation research while promoting sustainable mobility.

Authors:Yunzhe Li, Facheng Hu, Hongzi Zhu, Quan Liu, Xiaoke Zhao, Jiangang Shen, Shan Chang, Minyi Guo
Title: Prism: Mining Task-aware Domains in Non-i.i.d. IMU Data for Flexible User Perception
Abstract:
A wide range of user perception applications leverage inertial measurement unit (IMU) data for online prediction. However, restricted by the non-i.i.d. nature of IMU data collected from mobile devices, most systems work well only in a controlled setting (e.g., for a specific user in particular postures), limiting application scenarios. Achieving uncontrolled online prediction on mobile devices, referred to as the flexible user perception (FUP) problem, is attractive but hard. In this paper, we propose a novel scheme, called Prism, which can obtain high FUP accuracy on mobile devices. The core of Prism is to discover task-aware domains embedded in an IMU dataset and to train a domain-aware model on each identified domain. To this end, we design an expectation-maximization (EM) algorithm to estimate latent domains with respect to the specific downstream perception task. Finally, the best-fit model can be automatically selected for use by comparing the test sample and all identified domains in the feature space. We implement Prism on various mobile devices and conduct extensive experiments. Results demonstrate that Prism can achieve the best FUP performance with low latency.
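The pipeline the abstract describes, EM-style estimation of latent domains followed by feature-space model selection at test time, can be sketched as below. This is a toy illustration under assumed data: the synthetic feature vectors, the Gaussian-style responsibilities, and the per-domain "model" (here just the domain mean) are hypothetical stand-ins, not Prism's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for IMU feature vectors drawn from two latent "domains"
# (e.g. hand-held vs. in-pocket postures).
X = np.vstack([rng.normal(0.0, 0.5, (100, 3)),
               rng.normal(3.0, 0.5, (100, 3))])

K = 2
mu = np.stack([X[0], X[-1]])  # initialize domain centers from two samples

for _ in range(20):
    # E-step: soft-assign each sample to a latent domain (Gaussian-style
    # responsibilities based on squared distance to each center).
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    resp = np.exp(-d2)
    resp /= resp.sum(1, keepdims=True)
    # M-step: re-estimate each domain center from its responsibilities.
    mu = (resp.T @ X) / resp.sum(0)[:, None]

# One "domain-aware model" per domain; a real system would train a
# predictor per domain, here the domain mean serves as a placeholder.
models = {k: mu[k] for k in range(K)}

def select_model(x):
    # Route a test sample to the best-fit domain by distance in feature
    # space, mirroring Prism's automatic model selection step.
    k = int(np.argmin(((mu - x) ** 2).sum(1)))
    return k, models[k]

k, model = select_model(np.array([2.9, 3.1, 3.0]))
```

In the paper the E-step is driven by the downstream task rather than raw feature distance; this sketch only shows the overall discover-then-route structure.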

Authors:Vasanth Reddy Baddam, Behdad Chalaki, Vaishnav Tadiparthi, Hossein Nourkhiz Mahjoub, Ehsan Moradi-Pari, Hoda Eldardiry, Almuatazbellah Boker
Title: In Search of a Lost Metric: Human Empowerment as a Pillar of Socially Conscious Navigation
Abstract:
In social robot navigation, traditional metrics like proxemics and behavior naturalness emphasize human comfort and adherence to social norms but often fail to capture an agent's autonomy and adaptability in dynamic environments. This paper introduces human empowerment, an information-theoretic concept that measures a human's ability to influence their future states and observe those changes, as a complementary metric for evaluating social compliance. This metric reveals how robot navigation policies can indirectly impact human empowerment. We present a framework that integrates human empowerment into the evaluation of social performance in navigation tasks. Through numerical simulations, we demonstrate that human empowerment as a metric not only aligns with intuitive social behavior, but also shows statistically significant differences across various robot navigation policies. These results provide a deeper understanding of how different policies affect social compliance, highlighting the potential of human empowerment as a complementary metric for future research in social navigation.
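As a concrete illustration of the empowerment idea: in a fully deterministic world, an agent's n-step empowerment reduces to the log of the number of distinct states it can reach within n steps. The corridor world below is a hypothetical toy, not the paper's simulation setup, but it shows why a robot that pins a human into a corner lowers that human's empowerment.

```python
import math

# Deterministic 1-D corridor with 7 cells; actions move left, stay, or right.
N_CELLS = 7
ACTIONS = (-1, 0, 1)

def step(s, a):
    # Transition function: move and clamp to the corridor bounds.
    return min(max(s + a, 0), N_CELLS - 1)

def empowerment(s, horizon=2):
    # For a deterministic environment, n-step empowerment (the channel
    # capacity between action sequences and resulting states) equals
    # log2 of the number of distinct states reachable within `horizon` steps.
    reachable = {s}
    for _ in range(horizon):
        reachable = {step(r, a) for r in reachable for a in ACTIONS}
    return math.log2(len(reachable))

# A human mid-corridor can reach more states than one pushed into a corner,
# so a navigation policy that corners people reduces their empowerment.
mid, corner = empowerment(3), empowerment(0)
```

Stochastic environments require the full channel-capacity computation (e.g. the Blahut-Arimoto algorithm); the reachable-set shortcut holds only in the deterministic case shown here.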

Authors:Arisa Cowe, Tyson Neuroth, Qi Wu, Martin Rieth, Jacqueline Chen, Myoungkyu Lee, Kwan-Liu Ma
Title: Glyph-Based Multiscale Visualization of Turbulent Multi-Physics Statistics
Abstract:
Many scientific and engineering problems involving multi-physics span a wide range of scales. Understanding the interactions across these scales is essential for fully comprehending such complex problems. However, visualizing multivariate, multiscale data within an integrated view where correlations across space, scales, and fields are easily perceived remains challenging. To address this, we introduce a novel local spatial statistical visualization of flow fields across multiple fields and turbulence scales. Our method leverages the curvelet transform for scale decomposition of fields of interest, a level-set-restricted centroidal Voronoi tessellation to partition the spatial domain into local regions for statistical aggregation, and a set of glyph designs that combines information across scales and fields into a single, or reduced set of perceivable visual representations. Each glyph represents data aggregated within a Voronoi region and is positioned at the Voronoi site for direct visualization in a 3D view centered around flow features of interest. We implement and integrate our method into an interactive visualization system where the glyph-based technique operates in tandem with linked 3D spatial views and 2D statistical views, supporting a holistic analysis. We demonstrate with case studies visualizing turbulent combustion data--multi-scalar compressible flows--and turbulent incompressible channel flow data. This new capability enables scientists to better understand the interactions between multiple fields and length scales in turbulent flows.

Authors:Aditya Bhattacharya, Simone Stumpf, Katrien Verbert
Title: Importance of User Control in Data-Centric Steering for Healthcare Experts
Abstract:
As Artificial Intelligence (AI) becomes increasingly integrated into high-stakes domains like healthcare, effective collaboration between healthcare experts and AI systems is critical. Data-centric steering, which involves fine-tuning prediction models by improving training data quality, plays a key role in this process. However, little research has explored how varying levels of user control affect healthcare experts during data-centric steering. We address this gap by examining manual and automated steering approaches through a between-subjects, mixed-methods user study with 74 healthcare experts. Our findings show that manual steering, which grants direct control over training data, significantly improves model performance while maintaining trust and system understandability. Based on these findings, we propose design implications for a hybrid steering system that combines manual and automated approaches to increase user involvement during human-AI collaboration.

Authors:Thomas Barbera, Jacopo Burger, Alessandro D'Amelio, Simone Zini, Simone Bianco, Raffaella Lanzarotti, Paolo Napoletano, Giuseppe Boccignone, Jose Luis Contreras-Vidal
Title: On using AI for EEG-based BCI applications: problems, current challenges and future trends
Abstract:
Imagine unlocking the power of the mind to communicate, create, and even interact with the world around us. Recent breakthroughs in Artificial Intelligence (AI), especially in how machines "see" and "understand" language, are now fueling exciting progress in decoding brain signals from scalp electroencephalography (EEG). Prima facie, this opens the door to revolutionary brain-computer interfaces (BCIs) designed for real life, moving beyond traditional uses to envision Brain-to-Speech, Brain-to-Image, and even a Brain-to-Internet of Things (BCIoT). However, the journey is not as straightforward as it was for Computer Vision (CV) and Natural Language Processing (NLP). Applying AI to real-world EEG-based BCIs, particularly in building powerful foundational models, presents unique and intricate hurdles that could affect their reliability. Here, we unfold a guided exploration of this dynamic and rapidly evolving research area. Rather than merely outlining a map of current endeavors and results, the goal is to provide a principled navigation of this hot and cutting-edge research landscape. We consider the basic paradigms that emerge from a causal perspective and the attendant challenges presented to AI-based models. Looking ahead, we then discuss promising research avenues that could overcome today's technological, methodological, and ethical limitations. Our aim is to lay out a clear roadmap for creating truly practical and effective EEG-based BCI solutions that can thrive in everyday environments.

Authors:Weiqi Lu, Yongqiang Tian, Xiaohan Zhong, Haoyang Ma, Zhenyang Xu, Shing-Chi Cheung, Chengnian Sun
Title: An Empirical Study of Bugs in Data Visualization Libraries
Abstract:
Data visualization (DataViz) libraries play a crucial role in presentation, data analysis, and application development, underscoring the importance of their accuracy in transforming data into visual representations. Incorrect visualizations can adversely impact user experience, distort information conveyance, and influence user perception and decision-making processes. Visual bugs in these libraries can be particularly insidious as they may not cause obvious errors like crashes, but instead graphically mislead users about the underlying data, resulting in wrong decision-making. Consequently, a good understanding of the unique characteristics of bugs in DataViz libraries is essential for researchers and developers to detect and fix them. This study presents the first comprehensive analysis of bugs in DataViz libraries, examining 564 bugs collected from five widely-used libraries. Our study systematically analyzes their symptoms and root causes, and provides a detailed taxonomy. We found that incorrect/inaccurate plots are pervasive in DataViz libraries and that incorrect graphic computation is the major root cause, which necessitates further automated testing methods for DataViz libraries. Moreover, we identified eight key steps to trigger such bugs and two test oracles specific to DataViz libraries, which may inspire future research in designing effective automated testing techniques. Furthermore, with the recent advancements in Vision Language Models (VLMs), we explored the feasibility of applying these models to detect incorrect/inaccurate plots. The results show that the effectiveness of VLMs in bug detection varies from 29% to 57%, depending on the prompts, and that adding more information to prompts does not necessarily increase effectiveness. More findings can be found in our manuscript.

Authors:Shayan Talaei, Meijin Li, Kanu Grover, James Kent Hippler, Diyi Yang, Amin Saberi
Title: StorySage: Conversational Autobiography Writing Powered by a Multi-Agent Framework
Abstract:
Every individual carries a unique and personal life story shaped by their memories and experiences. However, these memories are often scattered and difficult to organize into a coherent narrative, a challenge that defines the task of autobiography writing. Existing conversational writing assistants tend to rely on generic user interactions and pre-defined guidelines, making it difficult for these systems to capture personal memories and develop a complete biography over time. We introduce StorySage, a user-driven software system that supports flexible conversation and a structured approach to autobiography writing, designed to meet the needs of a diverse group of users. Powered by a multi-agent framework composed of an Interviewer, Session Scribe, Planner, Section Writer, and Session Coordinator, our system iteratively collects user memories, updates their autobiography, and plans for future conversations. In experimental simulations, StorySage demonstrates its ability to navigate multiple sessions and capture user memories across many conversations. User studies (N=28) highlight how StorySage maintains improved conversational flow, narrative completeness, and higher user satisfaction when compared to a baseline. In summary, StorySage contributes both a novel architecture for autobiography writing and insights into how multi-agent systems can enhance human-AI creative partnerships.

Authors:Toshiaki Tsuji, Yasuhiro Kato, Gokhan Solak, Heng Zhang, Tadej Petrič, Francesco Nori, Arash Ajoudani
Title: A Survey on Imitation Learning for Contact-Rich Tasks in Robotics
Abstract:
This paper comprehensively surveys research trends in imitation learning for contact-rich robotic tasks. Contact-rich tasks, which require complex physical interactions with the environment, represent a central challenge in robotics due to their nonlinear dynamics and sensitivity to small positional deviations. The paper examines demonstration collection methodologies, including teaching methods and sensory modalities crucial for capturing subtle interaction dynamics. We then analyze imitation learning approaches, highlighting their applications to contact-rich manipulation. Recent advances in multimodal learning and foundation models have significantly enhanced performance in complex contact tasks across industrial, household, and healthcare domains. Through systematic organization of current research and identification of challenges, this survey provides a foundation for future advancements in contact-rich robotic manipulation.

Authors:Yuchong Zhang, Bastian Orthmann, Shichen Ji, Michael Welle, Jonne Van Haastregt, Danica Kragic
Title: Multimodal "Puppeteer": An Exploration of Robot Teleoperation Via Virtual Counterpart with LLM-Driven Voice and Gesture Interaction in Augmented Reality
Abstract:
The integration of robotics and augmented reality (AR) holds transformative potential for advancing human-robot interaction (HRI), offering enhancements in usability, intuitiveness, accessibility, and collaborative task performance. This paper introduces and evaluates a novel multimodal AR-based robot puppeteer framework that enables intuitive teleoperation via virtual counterpart through large language model (LLM)-driven voice commands and hand gesture interactions. Utilizing the Meta Quest 3, users interact with a virtual counterpart robot in real-time, effectively "puppeteering" its physical counterpart within an AR environment. We conducted a within-subject user study with 42 participants performing robotic cube pick-and-place with pattern matching tasks under two conditions: gesture-only interaction and combined voice-and-gesture interaction. Both objective performance metrics and subjective user experience (UX) measures were assessed, including an extended comparative analysis between roboticists and non-roboticists. The results provide key insights into how multimodal input influences contextual task efficiency, usability, and user satisfaction in AR-based HRI. Our findings offer practical design implications for designing effective AR-enhanced HRI systems.

Authors:Zewei Tian, Alex Liu, Lief Esbenshade, Shawon Sarkar, Zachary Zhang, Kevin He, Min Sun
Title: Implementation Considerations for Automated AI Grading of Student Work
Abstract:
This study explores the classroom implementation of an AI-powered grading platform in K-12 settings through a co-design pilot with 19 teachers. We combine platform usage logs, surveys, and qualitative interviews to examine how teachers use AI-generated rubrics and grading feedback. Findings reveal that while teachers valued the AI's rapid narrative feedback for formative purposes, they distrusted automated scoring and emphasized the need for human oversight. Students welcomed fast, revision-oriented feedback but remained skeptical of AI-only grading. We discuss implications for the design of trustworthy, teacher-centered AI assessment tools that enhance feedback while preserving pedagogical agency.

Authors:Owen Xingjian Zhang, Sohyeon Hwang, Yuhan Liu, Manoel Horta Ribeiro, Andrés Monroy-Hernández
Title: Understanding Community-Level Blocklists in Decentralized Social Media
Abstract:
Community-level blocklists are key to content moderation practices in decentralized social media. These blocklists enable moderators to prevent other communities, such as those acting in bad faith, from interacting with their own -- and, if shared publicly, warn others about communities worth blocking. Prior work has examined blocklists in centralized social media, noting their potential for collective moderation outcomes, but has focused on blocklists as individual-level tools. To understand how moderators perceive and utilize community-level blocklists and what additional support they may need, we examine social media communities running Mastodon, an open-source microblogging software built on the ActivityPub protocol. We conducted (1) content analysis of the community-level blocklist ecosystem, and (2) semi-structured interviews with twelve Mastodon moderators. Our content analysis revealed wide variation in blocklist goals, inclusion criteria, and transparency. Interviews showed moderators balance proactive safety, reactive practices, and caution around false positives when using blocklists for moderation. They noted challenges and limitations in current blocklist use, suggesting design improvements like comment receipts, category filters, and collaborative voting. We discuss implications for decentralized content moderation, highlighting trade-offs between openness, safety, and nuance; the complexity of moderator roles; and opportunities for future design.

Authors:Shahbaz Rezaei, Avishai Halev, Xin Liu
Title: On the Necessity of Multi-Domain Explanation: An Uncertainty Principle Approach for Deep Time Series Models
Abstract:
A prevailing approach to explaining time series models is to generate attributions in the time domain. A recent development in time series XAI is the concept of explanation spaces, where any model trained in the time domain can be interpreted with any existing XAI method in alternative domains, such as frequency. The prevailing approach is to present XAI attributions either in the time domain or in the domain where the attribution is most sparse. In this paper, we demonstrate that in certain cases, XAI methods can generate attributions that highlight fundamentally different features in the time and frequency domains that are not direct counterparts of one another. This suggests that both domains' attributions should be presented to achieve a more comprehensive interpretation, demonstrating the necessity of multi-domain explanation. To quantify when such cases arise, we introduce the uncertainty principle (UP), originally developed in quantum mechanics and later studied in harmonic analysis and signal processing, to the XAI literature. This principle establishes a lower bound on the joint time-frequency spread of a signal, limiting how much it can be simultaneously localized in both domains. By leveraging this concept, we assess whether attributions in the time and frequency domains violate this bound, indicating that they emphasize distinct features. In other words, UP provides a sufficient condition that the time and frequency domain explanations do not match and, hence, should both be presented to the end user. We validate the effectiveness of this approach across various deep learning models, XAI methods, and a wide range of classification and forecasting datasets. The frequent occurrence of UP violations across various datasets and XAI methods highlights the limitations of existing approaches that focus solely on time-domain explanations. This underscores the need for multi-domain explanations as a new paradigm.
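As a rough illustration of the uncertainty-principle check, the sketch below computes the time- and frequency-domain energy spreads of an attribution signal and compares their product against the classical bound 1/(4π). This is our own minimal numerical sketch, not the authors' code; the bound form and the Gaussian test signal are standard signal-processing assumptions.

```python
import numpy as np

def energy_spread(w, x):
    # Standard deviation of coordinates x under (unnormalized) weights w.
    w = w / w.sum()
    mu = (w * x).sum()
    return np.sqrt((w * (x - mu) ** 2).sum())

def up_product(attr, dt=1.0):
    """Product of the time- and frequency-domain spreads of attr's energy."""
    n = len(attr)
    t = np.arange(n) * dt
    spec = np.abs(np.fft.fft(attr)) ** 2      # two-sided power spectrum
    f = np.fft.fftfreq(n, d=dt)
    return energy_spread(attr ** 2, t) * energy_spread(spec, f)

BOUND = 1 / (4 * np.pi)  # classical time-frequency uncertainty bound

# A Gaussian attribution (nearly) saturates the bound; no signal that is
# concentrated in BOTH domains can fall below it.
t = np.arange(256)
gauss = np.exp(-((t - 128) ** 2) / (2 * 8.0 ** 2))
product = up_product(gauss)
```

If a time-domain attribution and a frequency-domain attribution are each more concentrated than this bound jointly permits, they cannot describe the same localized feature, which is the paper's signal that both explanations should be shown.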

Authors:Nora Graves, Vitus Larrieu, Yingyue Trace Zhang, Joanne Peng, Varun Nagaraj Rao, Yuhan Liu, Andrés Monroy-Hernández
Title: GPTFootprint: Increasing Consumer Awareness of the Environmental Impacts of LLMs
Abstract:
With the growth of AI, researchers are studying how to mitigate its environmental impact, primarily by proposing policy changes and increasing awareness among developers. However, research on AI end users is limited. Therefore, we introduce GPTFootprint, a browser extension that aims to increase consumer awareness of the significant water and energy consumption of LLMs, and reduce unnecessary LLM usage. GPTFootprint displays a dynamically updating visualization of the resources individual users consume through their ChatGPT queries. After a user reaches a set query limit, a popup prompts them to take a break from ChatGPT. In a week-long user study, we found that GPTFootprint increases people's awareness of environmental impact, but has limited success in decreasing ChatGPT usage. This research demonstrates the potential for individual-level interventions to contribute to the broader goal of sustainable AI usage, and provides insights into the effectiveness of awareness-based behavior modification strategies in the context of LLMs.

Authors:Jad Bendarkawi, Ashley Ponce, Sean Mata, Aminah Aliu, Yuhan Liu, Lei Zhang, Amna Liaqat, Varun Nagaraj Rao, Andrés Monroy-Hernández
Title: ConversAR: Exploring Embodied LLM-Powered Group Conversations in Augmented Reality for Second Language Learners
Abstract:
Group conversations are valuable for second language (L2) learners as they provide opportunities to practice listening and speaking, exercise complex turn-taking skills, and experience group social dynamics in a target language. However, most existing Augmented Reality (AR)-based conversational learning tools focus on dyadic interactions rather than group dialogues. Although research has shown that AR can help reduce speaking anxiety and create a comfortable space for practicing speaking skills in dyadic scenarios, especially with Large Language Model (LLM)-based conversational agents, the potential for group language practice using these technologies remains largely unexplored. We introduce ConversAR, a GPT-4o-powered AR application that enables L2 learners to practice contextualized group conversations. Our system features two embodied LLM agents with vision-based scene understanding and live captions. In a system evaluation with 10 participants, users reported reduced speaking anxiety and increased learner autonomy compared to perceptions of in-person practice methods with other learners.

Authors:Maggie Wang, Ella Colby, Jennifer Okwara, Varun Nagaraj Rao, Yuhan Liu, Andrés Monroy-Hernández
Title: PolicyPulse: LLM-Synthesis Tool for Policy Researchers
Abstract:
Public opinion shapes policy, yet capturing it effectively to surface diverse perspectives remains challenging. This paper introduces PolicyPulse, an LLM-powered interactive system that synthesizes public experiences from online community discussions to help policy researchers author memos and briefs, leveraging curated real-world anecdotes. Given a specific topic (e.g., "Climate Change"), PolicyPulse returns an organized list of themes (e.g., "Biodiversity Loss" or "Carbon Pricing"), supporting each theme with relevant quotes from real-life anecdotes. We compared PolicyPulse outputs to authoritative policy reports. Additionally, we asked 11 policy researchers across multiple institutions in the Northeastern U.S. to compare using PolicyPulse with their expert approach. We found that PolicyPulse's themes aligned with authoritative reports and helped spark research by analyzing existing data, gathering diverse experiences, revealing unexpected themes, and informing survey or interview design. Participants also highlighted limitations including insufficient demographic context and data verification challenges. Our work demonstrates how AI-powered tools can help influence policy-relevant research and shape policy outcomes.

Authors:Siyang Liu, Sahand Sabour, Xiaoyang Wang, Rada Mihalcea
Title: Free Lunch for User Experience: Crowdsourcing Agents for Scalable User Studies
Abstract:
We demonstrate the potential of anthropomorphized language agents to generate budget-friendly, moderate-fidelity, yet sufficiently insightful user experiences at scale, supporting fast, early-stage prototyping. We explore this through the case of prototyping Large Language Model-driven non-player characters (NPCs). We present Agentic H-CI, a framework that mirrors traditional user research processes -- surveying, screening, experiencing, and collecting feedback and insights -- with simulated agents. Using this approach, we easily construct a team of 240 player agents with a balanced range of player types and personality traits, at extremely low cost ($0.28/player) and minimal time commitment (6.9 minutes/player). Content analysis shows that agent-based players behave in ways aligned with their simulated backgrounds, achieving 82.5% alignment with designated profiles. From their interactions, we distill 11 user insights and 6 design implications to guide further development. To evaluate practical value, we conduct parallel user studies with human participants recruited locally and via crowdsourcing. Ratings from three professional game developers show that the agentic player team offers a Pareto-optimal and well-balanced trade-off across fidelity, cost, time efficiency, and insight helpfulness.

Authors:Ya-Fang Lin, Xiaotian Li, Wan-Hsuan Huang, Charan Pushpanathan Prabavathi, Jie Cai, John M. Carroll
Title: Parental Collaboration and Closeness: Envisioning with New Couple Parents
Abstract:
Couples often experience a decrease in closeness as they cope with the demands of parenthood. Existing technologies have supported parenting and parental collaboration. However, these technologies do not adequately support closeness in co-parenting. We use scenarios and design probes to brainstorm with 10 new parent couples to explore and envision possibilities for technologies to support closeness. We report parents' current technology use for co-parenting and how participants considered and envisioned co-parenting technology for closeness, including information and task sharing, emotion awareness and disclosure, and fostering fun interaction. We discuss the potential technology has for fostering closeness in co-parenting by (1) fostering interdependence by supporting parental competence and (2) integrating positive emotions and experiences, such as validation and fun, in parenting. Based on our findings, we expand the design space of technology for closeness to include interdependence. We also expand the design space for co-parenting technology by integrating more positive emotions.

Authors:Prajwal Singh, Anupam Sharma, Pankaj Pandey, Krishna Miyapuram, Shanmuganathan Raman
Title: Dynamic Vision from EEG Brain Recordings, How much does EEG know?
Abstract:
Reconstructing dynamic visual stimuli from brain EEG recordings is challenging due to the non-stationary and noisy nature of EEG signals and the limited availability of EEG-video datasets. Prior work has largely focused on static image reconstruction, leaving the open question of whether EEG carries sufficient information for dynamic video decoding. In this work, we present EEGVid, a framework that reconstructs dynamic video stimuli from EEG signals while systematically probing the information they encode. Our approach first learns the EEG representation and then uses these features for video synthesis with a temporally conditioned StyleGAN-ADA that maps EEG embeddings to specific frame positions. Through experiments on three datasets (SEED, EEG-Video Action, SEED-DV), we demonstrate that EEG supports semantically meaningful reconstruction of dynamic visual content, and we quantify "how much EEG knows": (i) hemispheric asymmetry, with the left hemisphere more predictive of visual content and the right hemisphere of emotional content, (ii) the temporal lobe as the most informative region, and (iii) EEG timesteps 100-300 as the most critical for dynamic visual encoding. Importantly, while generative priors contribute fine spatial detail, EEG provides the semantic and temporal guidance necessary for reconstructing videos that align with the observed stimuli. This positions video generation not as a standalone generative benchmark, but as a means to visualize and validate the representational content of EEG in the context of dynamic vision.

Authors:Yuchen He, Jianbing Lv, Liqi Cheng, Lingyu Meng, Dazhen Deng, Yingcai Wu
Title: ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization
Abstract:
Temporal Action Localization (TAL) aims to detect the start and end timestamps of actions in a video. However, the training of TAL models requires a substantial amount of manually annotated data. Data programming is an efficient method to create training labels with a series of human-defined labeling functions. However, its application in TAL faces difficulties of defining complex actions in the context of temporal video frames. In this paper, we propose ProTAL, a drag-and-link video programming framework for TAL. ProTAL enables users to define key events by dragging nodes representing body parts and objects and linking them to constrain the relations (direction, distance, etc.). These definitions are used to generate action labels for large-scale unlabelled videos. A semi-supervised method is then employed to train TAL models with such labels. We demonstrate the effectiveness of ProTAL through a usage scenario and a user study, providing insights into designing video programming frameworks.

Authors:Omar Shaikh, Shardul Sapkota, Shan Rizvi, Eric Horvitz, Joon Sung Park, Diyi Yang, Michael S. Bernstein
Title: Creating General User Models from Computer Use
Abstract:
Human-computer interaction has long imagined technology that understands us -- from our preferences and habits, to the timing and purpose of our everyday actions. Yet current user models remain fragmented, narrowly tailored to specific apps, and incapable of the flexible reasoning required to fulfill these visions. This paper presents an architecture for a general user model (GUM) that learns about you by observing any interaction you have with your computer. The GUM takes as input any unstructured observation of a user (e.g., device screenshots) and constructs confidence-weighted propositions that capture user knowledge and preferences. GUMs can infer that a user is preparing for a wedding they're attending from messages with a friend. Or recognize that a user is struggling with a collaborator's feedback on a draft by observing multiple stalled edits and a switch to reading related work. GUMs introduce an architecture that infers new propositions about a user from multimodal observations, retrieves related propositions for context, and continuously revises existing propositions. To illustrate the breadth of applications that GUMs enable, we demonstrate how they augment chat-based assistants with context, manage OS notifications to selectively surface important information, and enable interactive agents that adapt to preferences across apps. We also instantiate proactive assistants (GUMBOs) that discover and execute useful suggestions on a user's behalf using their GUM. In our evaluations, we find that GUMs make calibrated and accurate inferences about users, and that assistants built on GUMs proactively identify and perform actions that users wouldn't think to request explicitly. Altogether, GUMs introduce methods that leverage multimodal models to understand unstructured context, enabling long-standing visions of HCI and entirely new interactive systems that anticipate user needs.

Authors:Xiaozhou Ye, Kevin I-Kai Wang
Title: Domain-Adversarial Anatomical Graph Networks for Cross-User Human Activity Recognition
Abstract:
Cross-user variability in Human Activity Recognition (HAR) remains a critical challenge due to differences in sensor placement, body dynamics, and behavioral patterns. Traditional methods often fail to capture biomechanical invariants that persist across users, limiting their generalization capability. We propose an Edge-Enhanced Graph-Based Adversarial Domain Generalization (EEG-ADG) framework that integrates anatomical correlation knowledge into a unified graph neural network (GNN) architecture. By modeling three biomechanically motivated relationships together -- Interconnected Units, Analogous Units, and Lateral Units -- our method encodes domain-invariant features while addressing user-specific variability through a Variational Edge Feature Extractor. A Gradient Reversal Layer (GRL) enforces adversarial domain generalization, ensuring robustness to unseen users. Extensive experiments on the OPPORTUNITY and DSADS datasets demonstrate state-of-the-art performance. Our work bridges biomechanical principles with graph-based adversarial learning by integrating information fusion techniques, underpinning a unified and generalized model for cross-user HAR.
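The Gradient Reversal Layer mentioned above is, conceptually, an identity map whose backward pass flips the gradient sign so that the shared feature extractor learns representations the domain (user) classifier cannot separate. A minimal manual-autograd sketch of this idea (our illustration, not the EEG-ADG implementation):

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in backward.
    A manual-autograd sketch of a GRL, not the full EEG-ADG architecture."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                      # features flow through unchanged

    def backward(self, grad_out):
        return -self.lam * grad_out   # domain-classifier gradient is flipped

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
y = grl.forward(x)                    # identical to x
g = grl.backward(np.ones_like(x))     # each unit gradient becomes -0.5
```

In a full training loop, the flipped gradient means minimizing the domain classifier's loss simultaneously pushes the feature extractor to maximize it, yielding user-invariant features.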

Authors:Julian F. Schumann, Jeroen Hagenus, Frederik Baymler Mathiesen, Arkady Zgonnikov
Title: Realistic Adversarial Attacks for Robustness Evaluation of Trajectory Prediction Models via Future State Perturbation
Abstract:
Trajectory prediction is a key element of autonomous vehicle systems, enabling them to anticipate and react to the movements of other road users. Evaluating the robustness of prediction models against adversarial attacks is essential to ensure their reliability in real-world traffic. However, current approaches tend to focus on perturbing the past positions of surrounding agents, which can generate unrealistic scenarios and overlook critical vulnerabilities. This limitation may result in overly optimistic assessments of model performance in real-world conditions. In this work, we demonstrate that perturbing not just past but also future states of adversarial agents can uncover previously undetected weaknesses and thereby provide a more rigorous evaluation of model robustness. Our novel approach incorporates dynamic constraints and preserves tactical behaviors, enabling more effective and realistic adversarial attacks. We introduce new performance measures to assess the realism and impact of these adversarial trajectories. Testing our method on a state-of-the-art prediction model revealed significant increases in prediction errors and collision rates under adversarial conditions. Qualitative analysis further showed that our attacks can expose critical weaknesses, such as the inability of the model to detect potential collisions in what appear to be safe predictions. These results underscore the need for more comprehensive adversarial testing to better evaluate and improve the reliability of trajectory prediction models for autonomous vehicles.
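One simple way to picture the dynamic-constraints idea is to project a perturbed future trajectory back into the set of dynamically feasible ones, for example by capping the implied accelerations. The sketch below is a hypothetical 1-D stand-in, not the paper's method (which also preserves tactical behaviors):

```python
import numpy as np

def clip_accelerations(traj, dt=0.1, a_max=4.0):
    """Forward-project a perturbed 1-D position trajectory so the implied
    accelerations stay within a_max. A hypothetical stand-in for the paper's
    dynamic constraints; the real method is richer than this."""
    traj = traj.copy()
    for i in range(1, len(traj) - 1):
        # Second-difference acceleration at step i.
        acc = (traj[i + 1] - 2 * traj[i] + traj[i - 1]) / dt ** 2
        if abs(acc) > a_max:
            acc = np.clip(acc, -a_max, a_max)
            # Move the next position to the closest dynamically feasible one.
            traj[i + 1] = acc * dt ** 2 + 2 * traj[i] - traj[i - 1]
    return traj

rng = np.random.default_rng(0)
clean = np.arange(30) * 1.0                  # constant-velocity future states
perturbed = clean + rng.normal(0, 0.3, 30)   # naive adversarial perturbation
feasible = clip_accelerations(perturbed)     # perturbation, made realistic
```

Because the projection walks forward through the horizon, each corrected position becomes the anchor for the next check, so the returned trajectory satisfies the acceleration bound everywhere.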

Authors:Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, Guido Rocchietti, Cosimo Rulli
Title: Efficient Conversational Search via Topical Locality in Dense Retrieval
Abstract:
Pre-trained language models have been widely exploited to learn dense representations of documents and queries for information retrieval. While previous efforts have primarily focused on improving effectiveness and user satisfaction, response time remains a critical bottleneck of conversational search systems. To address this, we exploit the topical locality inherent in conversational queries, i.e., the tendency of queries within a conversation to focus on related topics. By leveraging query embedding similarities, we dynamically restrict the search space to semantically relevant document clusters, reducing computational complexity without compromising retrieval quality. We evaluate our approach on the TREC CAsT 2019 and 2020 datasets using multiple embedding models and vector indexes, achieving improvements in processing speed of up to 10.4X with little loss in performance (4.4X without any loss). Our results show that the proposed system effectively handles complex, multi-turn queries with high precision and efficiency, offering a practical solution for real-time conversational search.
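The cluster-restricted search described above resembles an IVF-style index: documents are grouped by embedding similarity offline, and each query probes only its nearest clusters. A toy sketch (our illustration; the k-means routine and parameter names are ours, not the paper's):

```python
import numpy as np

def build_clusters(doc_emb, k, iters=10, seed=0):
    """Toy spherical k-means over unit-norm document embeddings
    (a stand-in for whatever clustered index the system uses)."""
    rng = np.random.default_rng(seed)
    centroids = doc_emb[rng.choice(len(doc_emb), k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmax(doc_emb @ centroids.T, axis=1)
        for c in range(k):
            members = doc_emb[assign == c]
            if len(members):
                m = members.mean(axis=0)
                centroids[c] = m / np.linalg.norm(m)
    assign = np.argmax(doc_emb @ centroids.T, axis=1)  # final assignment
    return centroids, assign

def search(query_emb, doc_emb, centroids, assign, n_probe=2, top_k=3):
    """Score only documents in the n_probe clusters closest to the query."""
    probe = np.argsort(centroids @ query_emb)[-n_probe:]
    cand = np.flatnonzero(np.isin(assign, probe))
    scores = doc_emb[cand] @ query_emb
    return cand[np.argsort(scores)[::-1][:top_k]]

rng = np.random.default_rng(1)
docs = rng.normal(size=(100, 16))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
centroids, assign = build_clusters(docs, k=4)
hits = search(docs[0], docs, centroids, assign)   # query near document 0
```

Because consecutive turns in a conversation stay topically close, the probed clusters change little across turns, which is where the reported speedups plausibly come from.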

Authors:Yuhan Liu, Emmy Song, Owen Xingjian Zhang, Jewel Merriman, Lei Zhang, Andrés Monroy-Hernández
Title: Understanding Decentralized Social Feed Curation on Mastodon
Abstract:
As centralized social media platforms face growing concerns, more users are seeking greater control over their social feeds and turning to decentralized alternatives such as Mastodon. The decentralized nature of Mastodon creates unique opportunities for customizing feeds, yet user perceptions and curation strategies on these platforms remain unknown. This paper presents findings from a two-part interview study with 21 Mastodon users, exploring how they perceive, interact with, and manage their current feeds, and how we can better empower users to personalize their feeds on Mastodon. We use the qualitative findings of the first part of the study to guide the creation of Braids, a web-based prototype for feed curation. Results from the second part of our study, using Braids, highlighted opportunities and challenges for future research, particularly in using seamful design to enhance people's acceptance of algorithmic curation and nuanced trade-offs between machine learning-based and rule-based curation algorithms. To optimize user experience, we also discuss the tension between creating new apps and building add-ons in the decentralized social media realm.

Authors:Zipeng Ji, Pengcheng An, Jian Zhao
Title: ClassComet: Exploring and Designing AI-generated Danmaku in Educational Videos to Enhance Online Learning
Abstract:
Danmaku -- users' live comments synchronized with and overlaid on videos -- has recently shown potential in promoting online video-based learning. However, user-generated danmaku can be scarce -- especially in newer or less-viewed videos -- and its quality is unpredictable, limiting its educational impact. This paper explores how large multimodal models (LMMs) can be leveraged to automatically generate effective, high-quality danmaku. We first conducted a formative study to identify the desirable characteristics of content- and emotion-related danmaku in educational videos. Based on the obtained insights, we developed ClassComet, an educational video platform with novel LMM-driven techniques for generating relevant types of danmaku to enhance video-based learning. Through user studies, we examined the quality of generated danmaku and their influence on learning experiences. The results indicate that our generated danmaku is comparable to human-created ones, and videos with both content- and emotion-related danmaku showed significant improvement in viewers' engagement and learning outcomes.

Authors:Mateo Espinosa Zarlenga, Gabriele Dominici, Pietro Barbiero, Zohreh Shams, Mateja Jamnik
Title: Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts
Abstract:
In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., stripes, black) and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM's mispredicted concepts at test time) on CMs' task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.

Authors:Markus Amann, Malte Probst, Raphael Wenzel, Thomas H. Weisswange, Miguel Ángel Sotelo
Title: Optimal Behavior Planning for Implicit Communication using a Probabilistic Vehicle-Pedestrian Interaction Model
Abstract:
In interactions between automated vehicles (AVs) and crossing pedestrians, modeling implicit vehicle communication is crucial. In this work, we present a combined prediction and planning approach that allows to consider the influence of the planned vehicle behavior on a pedestrian and predict a pedestrian's reaction. We plan the behavior by solving two consecutive optimal control problems (OCPs) analytically, using variational calculus. We perform a validation step that assesses whether the planned vehicle behavior is adequate to trigger a certain pedestrian reaction, which accounts for the closed-loop characteristics of prediction and planning influencing each other. In this step, we model the influence of the planned vehicle behavior on the pedestrian using a probabilistic behavior acceptance model that returns an estimate for the crossing probability. The probabilistic modeling of the pedestrian reaction facilitates considering the pedestrian's costs, thereby improving cooperative behavior planning. We demonstrate the performance of the proposed approach in simulated vehicle-pedestrian interactions with varying initial settings and highlight the decision making capabilities of the planning approach.
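The probabilistic behavior acceptance model can be pictured as a gap-acceptance curve mapping the time gap offered by the planned vehicle behavior to a crossing probability, with the validation step accepting a plan only if the intended reaction is likely enough. The logistic form and the parameters below are illustrative assumptions, not taken from the paper:

```python
import math

def crossing_probability(time_gap_s, alpha=1.5, beta=4.0):
    """Logistic gap-acceptance curve: probability the pedestrian crosses given
    the time gap (in seconds) that the planned vehicle behavior offers.
    alpha (steepness) and beta (critical gap) are illustrative values."""
    return 1.0 / (1.0 + math.exp(-alpha * (time_gap_s - beta)))

def plan_is_adequate(time_gap_s, want_crossing, threshold=0.8):
    """Validation step: accept the planned behavior only if it makes the
    intended pedestrian reaction sufficiently likely."""
    p = crossing_probability(time_gap_s)
    return p >= threshold if want_crossing else (1 - p) >= threshold
```

If the validation fails, the planner would re-solve the OCP with an adjusted behavior (e.g., a larger offered gap) until the predicted reaction matches the intended one.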

Authors:Rui Qiu, Shijie Chen, Yu Su, Po-Yin Yen, Han-Wei Shen
Title: Completing A Systematic Review in Hours instead of Months with Interactive AI Agents
Abstract:
Systematic reviews (SRs) are vital for evidence-based practice in high-stakes disciplines, such as healthcare, but are often impeded by intensive labor and lengthy processes that can take months to complete. Due to the high demand for domain expertise, existing automatic summarization methods fail to accurately identify relevant studies and generate high-quality summaries. To that end, we introduce InsightAgent, a human-centered interactive AI agent powered by large language models that revolutionizes this workflow. InsightAgent partitions a large literature corpus based on semantics and employs a multi-agent design for more focused processing of literature, leading to significant improvement in the quality of generated SRs. InsightAgent also provides intuitive visualizations of the corpus and agent trajectories, allowing users to effortlessly monitor the actions of the agent and provide real-time feedback based on their expertise. Our user studies with 9 medical professionals demonstrate that the visualization and interaction mechanisms can effectively improve the quality of synthesized SRs by 27.2%, reaching 79.7% of human-written quality. At the same time, user satisfaction is improved by 34.4%. With InsightAgent, it only takes a clinician about 1.5 hours, rather than months, to complete a high-quality systematic review.

Authors:Shreya Shankar, Bhavya Chopra, Mawil Hasan, Stephen Lee, Björn Hartmann, Joseph M. Hellerstein, Aditya G. Parameswaran, Eugene Wu
Title: Steering Semantic Data Processing With DocWrangler
Abstract:
Unstructured text has long been difficult to automatically analyze at scale. Large language models (LLMs) now offer a way forward by enabling semantic data processing, where familiar data processing operators (e.g., map, reduce, filter) are powered by LLMs instead of code. However, building effective semantic data processing pipelines presents a departure from traditional data pipelines: users need to understand their data to write effective pipelines, yet they need to construct pipelines to extract the data necessary for that understanding -- all while navigating LLM idiosyncrasies and inconsistencies. We present DocWrangler, a mixed-initiative integrated development environment (IDE) for semantic data processing with three novel features to address the gaps between the user, their data, and their pipeline: (i) In-Situ User Notes, which allow users to inspect, annotate, and track observations across documents and LLM outputs; (ii) LLM-Assisted Prompt Refinement, which transforms user notes into improved operations; and (iii) LLM-Assisted Operation Decomposition, which identifies when operations or documents are too complex for the LLM to correctly process and suggests decompositions. Our evaluation combines a think-aloud study with 10 participants and a public-facing deployment (available at docetl.org/playground) with 1,500+ recorded sessions, revealing how users develop systematic strategies for their semantic data processing tasks; e.g., transforming open-ended operations into classifiers for easier validation and intentionally using vague prompts to learn more about their data or LLM capabilities.
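The semantic operators described above can be sketched in a few lines: ordinary map/filter combinators whose predicate or transformation is an LLM call. This is a hedged illustration of the general idea, not DocWrangler's actual API; the stub LLM is deterministic so the example runs offline:

```python
def semantic_filter(docs, predicate_prompt, llm):
    """Keep documents for which the LLM answers 'yes' (a sketch of the idea,
    not DocWrangler's API)."""
    return [d for d in docs
            if llm(f"{predicate_prompt}\n\n{d}").strip().lower().startswith("yes")]

def semantic_map(docs, instruction, llm):
    """Apply an LLM-powered transformation to every document."""
    return [llm(f"{instruction}\n\n{d}") for d in docs]

def stub_llm(prompt):
    # Deterministic stand-in for a real LLM call, so the sketch runs offline.
    return "yes" if "ERROR" in prompt else "no"

logs = ["INFO: started", "ERROR: disk full", "WARN: slow"]
flagged = semantic_filter(logs, "Is this log line a failure?", stub_llm)
```

With a real model behind `llm`, the same combinators exhibit exactly the idiosyncrasies the paper targets: inconsistent answers, prompt sensitivity, and operations that are too coarse, which is what the IDE's note-taking, refinement, and decomposition features address.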

Authors:Weirui Peng, Yinuo Yang, Zheng Zhang, Toby Jia-Jun Li
Title: GLITTER: An AI-assisted Platform for Material-Grounded Asynchronous Discussion in Flipped Learning
Abstract:
Flipped classrooms promote active learning by having students engage with materials independently before class, allowing in-class time for collaborative problem-solving. During this pre-class phase, asynchronous online discussions help students build knowledge and clarify concepts with peers. However, it remains difficult to engage with temporally dispersed peer contributions, connect discussions with static learning materials, and prepare for in-class sessions based on self-learning outcomes. Our formative study identified cognitive challenges students encounter, including navigation barriers, reflection gaps, and contribution difficulty and anxiety. We present GLITTER, an AI-assisted discussion platform for pre-class learning in flipped classrooms. GLITTER helps students identify posts with shared conceptual dimensions, scaffold knowledge integration through conceptual blending, and enhance metacognition via personalized reflection reports. A within-subjects lab study (n = 12) demonstrates that GLITTER improves discussion engagement, sparks new ideas, supports reflection, and increases preparedness for in-class activities.

Authors:Tace McNamara, Jon McCormack, Maria Teresa Llano
Title: Mixer Metaphors: audio interfaces for non-musical applications
Abstract:
The NIME conference traditionally focuses on interfaces for music and musical expression. In this paper we reverse this tradition to ask, can interfaces developed for music be successfully appropriated to non-musical applications? To help answer this question we designed and developed a new device, which uses interface metaphors borrowed from analogue synthesisers and audio mixing to physically control the intangible aspects of a Large Language Model. We compared two versions of the device, with and without the audio-inspired augmentations, with a group of artists who used each version over a one week period. Our results show that the use of audio-like controls afforded more immediate, direct and embodied control over the LLM, allowing users to creatively experiment and play with the device over its non-mixer counterpart. Our project demonstrates how cross-sensory metaphors can support creative thinking and embodied practice when designing new technological interfaces.

Authors:Akash V. Maharaj, David Arbour, Daniel Lee, Uttaran Bhattacharya, Anup Rao, Austin Zane, Avi Feller, Kun Qian, Yunyao Li
Title: Evaluation and Incident Prevention in an Enterprise AI Assistant
Abstract:
Enterprise AI Assistants are increasingly deployed in domains where accuracy is paramount, making each erroneous output a potentially significant incident. This paper presents a comprehensive framework for monitoring, benchmarking, and continuously improving such complex, multi-component systems under active development by multiple teams. Our approach encompasses three key elements: (1) a hierarchical ``severity'' framework for incident detection that identifies and categorizes errors while attributing component-specific error rates, facilitating targeted improvements; (2) a scalable and principled methodology for benchmark construction, evaluation, and deployment, designed to accommodate multiple development teams, mitigate overfitting risks, and assess the downstream impact of system modifications; and (3) a continual improvement strategy leveraging multidimensional evaluation, enabling the identification and implementation of diverse enhancement opportunities. By adopting this holistic framework, organizations can systematically enhance the reliability and performance of their AI Assistants, ensuring their efficacy in critical enterprise environments. We conclude by discussing how this multifaceted evaluation approach opens avenues for various classes of enhancements, paving the way for more robust and trustworthy AI systems.

Authors:Aditya Bhattacharya, Tim Vanherwegen, Katrien Verbert
Title: Show Me How: Benefits and Challenges of Agent-Augmented Counterfactual Explanations for Non-Expert Users
Abstract:
Counterfactual explanations offer actionable insights by illustrating how changes to inputs can lead to different outcomes. However, these explanations often suffer from ambiguity and impracticality, limiting their utility for non-expert users with limited AI knowledge. Augmenting counterfactual explanations with Large Language Models (LLMs) has been proposed as a solution, but little research has examined their benefits and challenges for non-experts. To address this gap, we developed a healthcare-focused system that leverages conversational AI agents to enhance counterfactual explanations, offering clear, actionable recommendations to help patients at high risk of cardiovascular disease (CVD) reduce their risk. Evaluated through a mixed-methods study with 34 participants, our findings highlight the effectiveness of agent-augmented counterfactuals in improving actionable recommendations. Results further indicate that users with prior experience using conversational AI demonstrated greater effectiveness in utilising these explanations compared to novices. Furthermore, this paper introduces a set of generic guidelines for creating augmented counterfactual explanations, incorporating safeguards to mitigate common LLM pitfalls, such as hallucinations, and ensuring the explanations are both actionable and contextually relevant for non-expert users.

Authors:Ionut Anghel, Tudor Cioara, Roberta Bevilacqua, Federico Barbarossa, Terje Grimstad, Riitta Hellman, Arnor Solberg, Lars Thomas Boye, Ovidiu Anchidin, Ancuta Nemes, Camilla Gabrielsen
Title: New care pathways for supporting transitional care from hospitals to home using AI and personalized digital assistance
Abstract:
Transitional care may play a vital role in the sustainability of Europe's future healthcare system, offering solutions for relocating patient care from hospital to home and thus addressing the growing demand for medical care as the population ages. However, to be effective, it is essential to integrate innovative Information and Communications Technology (ICT) solutions to ensure that patients with comorbidities experience a smooth and coordinated transition from hospitals or care centers to home, thereby reducing the risk of rehospitalization. In this paper, we present an overview of the integration of Internet of Things, artificial intelligence, and digital assistance technologies with traditional care pathways to address the challenges and needs of healthcare systems in Europe. We identify the current gaps in transitional care and define the technology mapping to enhance the care pathways, aiming to improve patient outcomes, safety, and quality of life while avoiding hospital readmissions. Finally, we define the trial setup and evaluation methodology needed to provide clinical evidence that supports the positive impact of technology integration on patient care and discuss the potential effects on the healthcare system.

Authors:Yi-Fan Cao, Reza Hadi Mogavi, Meng Xia, Leo Yu-Ho Lo, Xiao-Qing Zhang, Mei-Jia Luo, Lennart E. Nacke, Yang Wang, Huamin Qu
Title: The Jade Gateway to Trust: Exploring How Socio-Cultural Perspectives Shape Trust Within Chinese NFT Communities
Abstract:
Today's world is witnessing an unparalleled rate of technological transformation. The emergence of non-fungible tokens (NFTs) has transformed how we handle digital assets and value. Despite their initial popularity, NFTs face declining adoption influenced not only by cryptocurrency volatility but also by trust dynamics within communities. From a social computing perspective, understanding these trust dynamics offers valuable insights for the development of both the NFT ecosystem and the broader digital economy. China presents a compelling context for examining these dynamics, offering a unique intersection of technological innovation and traditional cultural values. Through a content analysis of eight Chinese NFT-focused WeChat groups and 21 semi-structured interviews, we examine how socio-cultural factors influence trust formation and development. We found that trust in Chinese NFT communities is significantly molded by local cultural values. To be precise, Confucian virtues, such as benevolence, propriety, and integrity, play a crucial role in shaping these trust relationships. Our research identifies three critical trust dimensions in China's NFT market: (1) technological, (2) institutional, and (3) social. We examined the challenges in cultivating each dimension. Based on these insights, we developed tailored trust-building guidelines for Chinese NFT stakeholders. These guidelines address trust issues that factor into NFT's declining popularity and could offer valuable strategies for CSCW researchers, developers, and designers aiming to enhance trust in global NFT communities. Our research urges CSCW scholars to take into account the unique socio-cultural contexts when developing trust-enhancing strategies for digital innovations and online interactions.

Authors:Xiaoshan Zhou, Carol C. Menassa, Vineet R. Kamat
Title: Siamese Network with Dual Attention for EEG-Driven Social Learning: Bridging the Human-Robot Gap in Long-Tail Autonomous Driving
Abstract:
Robots with wheeled, quadrupedal, or humanoid forms are increasingly integrated into built environments. However, unlike human social learning, they lack a critical pathway for intrinsic cognitive development, namely, learning from human feedback during interaction. To understand ubiquitous human observation, supervision, and shared control in dynamic and uncertain environments, this study presents a brain-computer interface (BCI) framework that enables classification of electroencephalogram (EEG) signals to detect cognitively demanding and safety-critical events. As a timely and motivating co-robotic engineering application, we simulate a human-in-the-loop scenario to flag risky events in semi-autonomous robotic driving, representative of long-tail cases that pose persistent bottlenecks to the safety performance of smart mobility systems and robotic vehicles. Drawing on recent advances in few-shot learning, we propose a dual-attention Siamese convolutional network paired with a Dynamic Time Warping Barycenter Averaging approach to generate robust EEG-encoded signal representations. Inverse source localization reveals activation in Brodmann areas 4 and 9, indicating perception-action coupling during task-relevant mental imagery. The model achieves 80% classification accuracy under data-scarce conditions and exhibits a nearly 100% increase in the utility of salient features compared to state-of-the-art methods, as measured through integrated gradient attribution. Beyond performance, this study contributes to our understanding of the cognitive architecture required for BCI agents, particularly the role of attention and memory mechanisms, in categorizing diverse mental states and supporting both inter- and intra-subject adaptation. Overall, this research advances the development of cognitive robotics and socially guided learning for service robots in complex built environments.
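The Barycenter Averaging step mentioned in this abstract builds on the dynamic time warping (DTW) alignment distance. As a generic, textbook-style sketch (not the paper's pipeline), DTW between two 1-D signals can be computed as:

```python
import numpy as np

def dtw(a, b):
    """Classic O(n*m) dynamic time warping distance between 1-D signals."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The same waveform sampled at different rates aligns far more closely
# than its inverted counterpart, despite the length mismatch.
slow = np.sin(np.linspace(0, 2 * np.pi, 40))
fast = np.sin(np.linspace(0, 2 * np.pi, 25))
print(dtw(slow, fast), dtw(slow, -fast))
```

Barycenter Averaging then iteratively refines a reference sequence so that it minimizes the sum of such DTW distances to a set of signals.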

Authors:Andreas Naoum, Parag Khanna, Elmira Yadollahi, Mårten Björkman, Christian Smith
Title: Adapting Robot's Explanation for Failures Based on Observed Human Behavior in Human-Robot Collaboration
Abstract:
This work aims to interpret human behavior to anticipate potential user confusion when a robot provides explanations for failure, allowing the robot to adapt its explanations for more natural and efficient collaboration. Using a dataset that included facial emotion detection, eye gaze estimation, and gestures from 55 participants in a user study, we analyzed how human behavior changed in response to different types of failures and varying explanation levels. Our goal is to assess whether human collaborators are ready to accept less detailed explanations without inducing confusion. We formulate a data-driven predictor to predict human confusion during robot failure explanations. We also propose and evaluate a mechanism, based on the predictor, to adapt the explanation level according to observed human behavior. The promising results from this evaluation indicate the potential of this research in adapting a robot's explanations for failures to enhance the collaborative experience.

Authors:Yao Lyu, Jie Cai, John M. Carroll
Title: A Systematic Literature Review of Infrastructure Studies in SIGCHI
Abstract:
Infrastructure is an indispensable part of human life. Over the past decades, the Human-Computer Interaction (HCI) community has paid increasing attention to human interactions with infrastructure. In this paper, we conducted a systematic literature review on infrastructure studies in SIGCHI, one of the most influential communities in HCI. We collected a total of 190 primary studies, covering works published between 2006 and 2024. Most of these studies are inspired by Susan Leigh Star's notion of infrastructure. We identify three major themes in infrastructure studies: growing infrastructure, appropriating infrastructure, and coping with infrastructure. Our review highlights a prevailing trend in SIGCHI's infrastructure research: a focus on informal infrastructural activities across various sociotechnical contexts. In particular, we examine studies that problematize infrastructure and alert the HCI community to its potentially harmful aspects.

Authors:Carolina Carreira, Nuno Saavedra, Alexandra Mendes, João F. Ferreira
Title: From "Worse is Better" to Better: Lessons from a Mixed Methods Study of Ansible's Challenges
Abstract:
Infrastructure as Code (IaC) tools have transformed the way IT infrastructure is automated and managed, but their growing adoption has also exposed numerous challenges for practitioners. In this paper, we investigate these challenges through the lens of Ansible, a popular IaC tool. Using a mixed methods approach, we investigate challenges, obstacles, and issues faced by practitioners. We analyze 59,157 posts from Stack Overflow, Reddit, and the Ansible Forum to identify common pain points, complemented by 16 semi-structured interviews with practitioners of varying expertise levels. Based on our findings, we propose four main recommendations to improve Ansible: 1) refactoring to mitigate performance issues, 2) restructuring higher-level language concepts, 3) improved debugging and error reporting tools, and 4) better documentation and learning resources. By highlighting the real-world struggles of Ansible users, we provide actionable insights for tool designers, educators, and the broader IaC community, contributing to a deeper understanding of the trade-offs inherent in IaC tools.

Authors:Carolina Carreira, João F. Ferreira, Alexandra Mendes, Nicolas Christin
Title: Are Users More Willing to Use Formally Verified Password Managers?
Abstract:
Formal verification has recently been increasingly used to prove the correctness and security of many applications. It is attractive because it can prove the absence of errors with the same certainty as mathematicians proving theorems. However, while most security experts recognize the value of formal verification, the views of non-technical users on this topic are unknown. To address this issue, we designed and implemented two experiments to understand how formal verification impacts users. Our approach started with a formative study involving 15 participants, followed by the main quantitative study with 200 individuals. We focus on the application domain of password managers since it has been documented that the lack of trust in password managers might lead to lower adoption. Moreover, recent efforts have focused on formally verifying (parts of) password managers. We conclude that formal verification is seen as desirable by users and identify three actionable recommendations to improve formal verification communication efforts.

Authors:Carolina Carreira, Alexandra Mendes, João F. Ferreira, Nicolas Christin
Title: A Systematic Review of Security Communication Strategies: Guidelines and Open Challenges
Abstract:
Cybersecurity incidents such as data breaches have become increasingly common, affecting millions of users and organizations worldwide. The complexity of cybersecurity threats challenges the effectiveness of existing security communication strategies. Through a systematic review of over 3,400 papers, we identify specific user difficulties including information overload, technical jargon comprehension, and balancing security awareness with comfort. Our findings reveal consistent communication paradoxes: users require technical details for credibility yet struggle with jargon and need risk awareness without experiencing anxiety. We propose seven evidence-based guidelines to improve security communication and identify critical research gaps including limited studies with older adults, children, and non-US populations, insufficient longitudinal research, and limited protocol sharing for reproducibility. Our guidelines emphasize user-centric communication adapted to cultural and demographic differences while ensuring security advice remains actionable. This work contributes to more effective security communication practices that enable users to recognize and respond to cybersecurity threats appropriately.

Authors:Mona Zavichi, André Santos, Catarina Moreira, Anderson Maciel, Joaquim Jorge
Title: Gaze-Hand Steering for Travel and Multitasking in Virtual Environments
Abstract:
As head-mounted displays (HMDs) with eye-tracking become increasingly accessible, the need for effective gaze-based interfaces in virtual reality (VR) grows. Traditional gaze- or hand-based navigation often limits user precision or impairs free viewing, making multitasking difficult. We present a gaze-hand steering technique that combines eye-tracking with hand-pointing: users steer only when gaze aligns with a hand-defined target, reducing unintended actions and enabling free look. Speed is controlled via either a joystick or a waist-level speed circle. We evaluated our method in a user study (N=20) across multitasking and single-task scenarios, comparing it to a similar technique. Results show that gaze-hand steering maintains performance and enhances user comfort and spatial awareness during multitasking. Our findings support gaze-hand steering for gaze-dominant, multitasking-intensive VR applications requiring precision and simultaneous interaction, where it improves navigation while supporting immersion and efficient control.

Authors:Hyoungwook Jin, Yoonsu Kim, Dongyun Jung, Seungju Kim, Kiyoon Choi, Jinho Son, Juho Kim
Title: Investigating Large Language Models in Diagnosing Students' Cognitive Skills in Math Problem-solving
Abstract:
Mathematics learning entails mastery of both content knowledge and cognitive processing of knowing, applying, and reasoning with it. Automated math assessment has primarily focused on grading students' exhibition of content knowledge by finding textual evidence, such as specific numbers, formulas, and statements. Recent advancements in problem-solving, image recognition, and reasoning capabilities of large language models (LLMs) show promise for nuanced evaluation of students' cognitive skills. Diagnosing cognitive skills requires inferring students' thinking processes beyond textual evidence, which is an underexplored task in LLM-based automated assessment. In this work, we investigate how state-of-the-art LLMs diagnose students' cognitive skills in mathematics. We constructed MathCog, a novel benchmark dataset comprising 639 student responses to 110 expert-curated middle school math problems, each annotated with detailed teachers' diagnoses based on cognitive skill checklists. Using MathCog, we evaluated 16 closed and open LLMs of varying model sizes and vendors. Our evaluation reveals that even state-of-the-art LLMs struggle with the task, with all F1 scores below 0.5, and tend to exhibit strong false confidence for incorrect cases ($r_s=.617$). We also found that model size positively correlates with diagnosis performance ($r_s=.771$). Finally, we discuss the implications of these findings, the overconfidence issue, and directions for improving automated cognitive skill diagnosis.

Authors:Derrick M. Wang, Sebastian Cmentowski, Reza Hadi Mogavi, Kaushall Senthil Nathan, Eugene Kukshinov, Joseph Tu, Lennart E. Nacke
Title: From Solo to Social: Exploring the Dynamics of Player Cooperation in a Co-located Cooperative Exergame
Abstract:
Digital games offer rich social experiences and promote valuable skills, but they fall short in addressing physical inactivity. Exergames, which combine exercise with gameplay, have the potential to tackle this issue. However, current exergames are primarily single-player or competitive. To explore the social benefits of cooperative exergaming, we designed a custom co-located cooperative exergame that features three distinct forms of cooperation: Free (baseline), Coupled, and Concurrent. We conducted a within-participants, mixed-methods study (N = 24) to evaluate these designs and their impact on players' enjoyment, motivation, and performance. Our findings reveal that cooperative play improves social experiences. It drives increased team identification and relatedness. Furthermore, our qualitative findings support cooperative exergame play. This has design implications for creating exergames that effectively address players' exercise and social needs. Our research contributes guidance for developers and researchers who want to create more socially enriching exergame experiences.

Authors:Jingwen Cheng, Kshitish Ghate, Wenyue Hua, William Yang Wang, Hong Shen, Fei Fang
Title: REALM: A Dataset of Real-World LLM Use Cases
Abstract:
Large Language Models (LLMs), such as the GPT series, have driven significant industrial applications, leading to economic and societal transformations. However, a comprehensive understanding of their real-world applications remains limited. To address this, we introduce REALM, a dataset of over 94,000 LLM use cases collected from Reddit and news articles. REALM captures two key dimensions: the diverse applications of LLMs and the demographics of their users. It categorizes LLM applications and explores how users' occupations relate to the types of applications they use. By integrating real-world data, REALM offers insights into LLM adoption across different domains, providing a foundation for future research on their evolving societal roles.

Authors:Maria A. Larrazabal, Zhiyuan Wang, Mark Rucker, Emma R. Toner, Mehdi Boukhechba, Bethany A. Teachman, Laura E. Barnes
Title: Understanding State Social Anxiety in Virtual Social Interactions using Multimodal Wearable Sensing Indicators
Abstract:
Mobile sensing is ubiquitous and offers opportunities to gain insight into state mental health functioning. Detecting state elevations in social anxiety would be especially useful given this phenomenon is highly prevalent and impairing, but often not disclosed. Although anxiety is highly dynamic, fluctuating rapidly over the course of minutes, most work to date has examined anxiety at a scale of hours, days, or longer. In the present work, we explore the feasibility of detecting fluctuations in state social anxiety among N = 46 undergraduate students with elevated symptoms of trait social anxiety. Participants engaged in two dyadic and two group social interactions via Zoom. We evaluated participants' state anxiety levels as they anticipated, immediately after experiencing, and upon reflecting on each social interaction, spanning a time frame of 2-6 minutes. We collected biobehavioral features (i.e., PPG, EDA, skin temperature, and accelerometer) via Empatica E4 devices as they participated in the varied social contexts (e.g., dyadic vs. group; anticipating vs. experiencing the interaction; experiencing varying levels of social evaluation). We additionally measured their trait mental health functioning. Mixed-effect logistic regression and leave-one-subject-out machine learning modeling indicated biobehavioral features significantly predict state fluctuations in anxiety, though balanced accuracy tended to be modest (59%). However, our capacity to identify instances of heightened versus low state anxiety significantly increased (with balanced accuracy ranging from 69% to 84% across different operationalizations of state anxiety) when we integrated contextual data alongside trait mental health functioning into our predictive models. We discuss these and other findings in the context of the broader anxiety detection literature.
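The leave-one-subject-out protocol used in this abstract is a general evaluation pattern: train on every subject but one, test on the held-out subject, and report balanced accuracy. A minimal sketch with synthetic features and a nearest-centroid classifier (both illustrative assumptions, not the authors' models) looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, per_subj, n_feat = 6, 30, 4
subjects = np.repeat(np.arange(n_subjects), per_subj)  # subject id per window
y = rng.integers(0, 2, size=n_subjects * per_subj)     # binary anxiety state
# Synthetic biobehavioral features: class 1 gets a small mean shift
# so the toy task is learnable.
X = rng.normal(size=(len(y), n_feat)) + y[:, None] * 0.8

def balanced_accuracy(y_true, y_pred):
    # mean of per-class recalls
    return float(np.mean([np.mean(y_pred[y_true == c] == c)
                          for c in np.unique(y_true)]))

scores = []
for s in range(n_subjects):
    tr, te = subjects != s, subjects == s          # hold out one subject
    centroids = np.stack([X[tr][y[tr] == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X[te][:, None, :] - centroids[None, :, :], axis=2)
    scores.append(balanced_accuracy(y[te], dists.argmin(axis=1)))
print(f"mean LOSO balanced accuracy: {np.mean(scores):.2f}")
```

Because no window from the test subject ever influences training, the score estimates generalization to unseen individuals rather than to unseen windows.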

Authors:Taylor Sorensen, Pushkar Mishra, Roma Patel, Michael Henry Tessler, Michiel Bakker, Georgina Evans, Iason Gabriel, Noah Goodman, Verena Rieser
Title: Value Profiles for Encoding Human Variation
Abstract:
Modelling human variation in rating tasks is crucial for enabling AI systems for personalization, pluralistic model alignment, and computational social science. We propose representing individuals using value profiles -- natural language descriptions of underlying values compressed from in-context demonstrations -- along with a steerable decoder model to estimate ratings conditioned on a value profile or other rater information. To measure the predictive information in rater representations, we introduce an information-theoretic methodology. We find that demonstrations contain the most information, followed by value profiles and then demographics. However, value profiles offer advantages in terms of scrutability, interpretability, and steerability due to their compressed natural language format. Value profiles effectively compress the useful information from demonstrations (>70% information preservation). Furthermore, clustering value profiles to identify similarly behaving individuals better explains rater variation than the most predictive demographic groupings. Going beyond test set performance, we show that the decoder models interpretably change ratings according to semantic profile differences, are well-calibrated, and can help explain instance-level disagreement by simulating an annotator population. These results demonstrate that value profiles offer novel, predictive ways to describe individual variation beyond demographics or group information.
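The information-theoretic comparison of rater representations described in this abstract can be illustrated in a simplified form of our own (not the paper's estimator): the predictive information carried by a representation is the reduction in average log-loss when a predictor conditions on it, relative to predicting from the marginal rating distribution alone.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two latent rater "profiles": lenient raters mostly rate 1, strict mostly 0.
profile = rng.integers(0, 2, size=1000)           # rater representation
p_rate1 = np.where(profile == 1, 0.9, 0.2)        # P(rating=1 | profile)
rating = (rng.random(1000) < p_rate1).astype(int)

def avg_nll(p1, y):
    # average negative log-likelihood (nats) of binary ratings y
    p = np.where(y == 1, p1, 1 - p1)
    return -np.mean(np.log(p))

marginal = rating.mean()                          # representation-blind predictor
nll_marginal = avg_nll(np.full_like(p_rate1, marginal), rating)
nll_conditional = avg_nll(p_rate1, rating)
info_nats = nll_marginal - nll_conditional        # >= 0 in expectation
print(f"predictive information ~ {info_nats:.3f} nats")
```

A more informative representation (e.g., full demonstrations versus demographics) shows up directly as a larger log-loss reduction under this scheme.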

Authors:Wan Ju Kang, Eunki Kim, Na Min An, Sangryul Kim, Haemin Choi, Ki Hoon Kwak, James Thorne
Title: Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Abstract:
Often, the needs and visual abilities differ between the annotator group and the end user group. Generating detailed diagram descriptions for blind and low-vision (BLV) users is one such challenging domain. Sighted annotators could describe visuals with ease, but existing studies have shown that direct generations by them are costly, bias-prone, and somewhat lacking by BLV standards. In this study, we ask sighted individuals to assess -- rather than produce -- diagram descriptions generated by vision-language models (VLM) that have been guided with latent supervision via a multi-pass inference. The sighted assessments prove effective and useful to professional educators who are themselves BLV and teach visually impaired learners. We release Sightation, a collection of diagram description datasets spanning 5k diagrams and 137k samples for completion, preference, retrieval, question answering, and reasoning training purposes and demonstrate their fine-tuning potential in various downstream tasks.

Authors:Zhiyuan Wang, Katharine E. Daniel, Laura E. Barnes, Philip I. Chow
Title: CALLM: Understanding Cancer Survivors' Emotions and Intervention Opportunities via Mobile Diaries and Context-Aware Language Models
Abstract:
Cancer survivors face unique emotional challenges that impact their quality of life. Mobile diary entries provide a promising method for tracking emotional states, improving self-awareness, and promoting well-being outcome. This paper aims to, through mobile diaries, understand cancer survivors' emotional states and key variables related to just-in-time intervention opportunities, including the desire to regulate emotions and the availability to engage in interventions. Although emotion analysis tools show potential for recognizing emotions from text, current methods lack the contextual understanding necessary to interpret brief mobile diary narratives. Our analysis of diary entries from cancer survivors (N=407) reveals systematic relationships between described contexts and emotional states, with administrative and health-related contexts associated with negative affect and regulation needs, while leisure activities promote positive emotions. We propose CALLM, a Context-Aware framework leveraging Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to analyze these brief entries by integrating retrieved peer experiences and personal diary history. CALLM demonstrates strong performance with balanced accuracies reaching 72.96% for positive affect, 73.29% for negative affect, 73.72% for emotion regulation desire, and 60.09% for intervention availability, outperforming language model baselines. Post-hoc analysis reveals that model confidence strongly predicts accuracy, with longer diary entries generally enhancing performance, and brief personalization periods yielding meaningful improvements. Our findings demonstrate how contextual information in mobile diaries can be effectively leveraged to understand emotional experiences, predict key states, and identify optimal intervention moments for personalized just-in-time support.
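The retrieval step of a RAG pipeline like the one described can be sketched with a toy bag-of-words similarity; the diary entries, "embedding", and prompt template below are illustrative assumptions, not CALLM's actual components.

```python
import numpy as np

peer_entries = [
    "spent the morning on insurance paperwork, felt drained",
    "walked in the park with a friend, felt calm",
    "waiting on scan results, anxious all day",
]

def tokens(text):
    return text.lower().replace(",", "").split()

vocab = sorted({w for e in peer_entries for w in tokens(e)})

def embed(text):
    # toy unit-normalized bag-of-words vector over the peer vocabulary
    words = tokens(text)
    v = np.array([words.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

peer_vecs = np.stack([embed(e) for e in peer_entries])

def retrieve(query, k=2):
    # cosine similarity (vectors are unit-normalized) -> top-k peers
    sims = peer_vecs @ embed(query)
    return [peer_entries[i] for i in np.argsort(sims)[::-1][:k]]

query = "clinic paperwork all morning, felt drained and anxious"
prompt = ("Similar peer experiences:\n" + "\n".join(retrieve(query))
          + f"\n\nNew entry: {query}\nPredict affect (positive/negative):")
print(prompt)
```

A real system would swap the bag-of-words vectors for learned sentence embeddings and also retrieve from the writer's own diary history, but the assemble-context-then-prompt structure is the same.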

Authors:Pierre Sermanet, Anirudha Majumdar, Alex Irpan, Dmitry Kalashnikov, Vikas Sindhwani
Title: Generating Robot Constitutions & Benchmarks for Semantic Safety
Abstract:
Until recently, robotics safety research was predominantly about collision avoidance and hazard reduction in the immediate vicinity of a robot. Since the advent of large vision and language models (VLMs), robots are now also capable of higher-level semantic scene understanding and natural language interactions with humans. Despite their known vulnerabilities (e.g. hallucinations or jail-breaking), VLMs are being handed control of robots capable of physical contact with the real world. This can lead to dangerous behaviors, making semantic safety for robots a matter of immediate concern. Our contributions in this paper are twofold: first, to address these emerging risks, we release the ASIMOV Benchmark, a large-scale and comprehensive collection of datasets for evaluating and improving semantic safety of foundation models serving as robot brains. Our data generation recipe is highly scalable: by leveraging text and image generation techniques, we generate undesirable situations from real-world visual scenes and human injury reports from hospitals. Second, we develop a framework to automatically generate robot constitutions from real-world data to steer a robot's behavior using Constitutional AI mechanisms. We propose a novel auto-amending process that is able to introduce nuances in written rules of behavior; this can lead to increased alignment with human preferences on behavior desirability and safety. We explore trade-offs between generality and specificity across a diverse set of constitutions of different lengths, and demonstrate that a robot is able to effectively reject unconstitutional actions. We measure a top alignment rate of 84.3% on the ASIMOV Benchmark using generated constitutions, outperforming no-constitution baselines and human-written constitutions. Data is available at asimov-benchmark.github.io

Authors:Francesco Iodice, Elena De Momi, Arash Ajoudani
Title: Intelligent Framework for Human-Robot Collaboration: Dynamic Ergonomics and Adaptive Decision-Making
Abstract:
The integration of collaborative robots into industrial environments has improved productivity, but has also highlighted significant challenges related to operator safety and ergonomics. This paper proposes an innovative framework that integrates advanced visual perception, continuous ergonomic monitoring, and adaptive Behaviour Tree decision-making to overcome the limitations of traditional methods that typically operate as isolated components. Our approach synthesizes deep learning models, advanced tracking algorithms, and dynamic ergonomic assessments into a modular, scalable, and adaptive system. Experimental validation demonstrates the framework's superiority over existing solutions across multiple dimensions: the visual perception module outperformed previous detection models with 72.4% mAP@50:95; the system achieved high accuracy in recognizing operator intentions (92.5%); it promptly classified ergonomic risks with minimal latency (0.57 seconds); and it dynamically managed robotic interventions with exceptionally responsive decision-making capabilities (0.07 seconds), representing a 56% improvement over benchmark systems. This comprehensive solution provides a robust platform for enhancing human-robot collaboration in industrial environments by prioritizing ergonomic safety, operational efficiency, and real-time adaptability.

Authors:Alex Calderwood, John Joon Young Chung, Yuqian Sun, Melissa Roemmele, Max Kreminski
Title: Phraselette: A Poet's Procedural Palette
Abstract:
According to the recently introduced theory of artistic support tools, creativity support tools exert normative influences over artistic production, instantiating a normative ground that shapes both the process and product of artistic expression. We argue that the normative ground of most existing automated writing tools is misaligned with writerly values and identify a potential alternative frame -- material writing support -- for experimental poetry tools that flexibly support the finding, processing, transforming, and shaping of text(s). Based on this frame, we introduce Phraselette, an artistic material writing support interface that helps experimental poets search for words and phrases. To provide material writing support, Phraselette is designed to counter the dominant mode of automated writing tools, while offering language model affordances in line with writerly values. We further report on an extended expert evaluation involving 10 published poets that indicates support for both our framing of material writing support and for Phraselette itself.

Authors:Luke Guerdan, Solon Barocas, Kenneth Holstein, Hanna Wallach, Zhiwei Steven Wu, Alexandra Chouldechova
Title: Validating LLM-as-a-Judge Systems under Rating Indeterminacy
Abstract:
The LLM-as-a-judge paradigm, in which a judge LLM system replaces human raters in rating the outputs of other generative AI (GenAI) systems, plays a critical role in scaling and standardizing GenAI evaluations. To validate such judge systems, evaluators assess human--judge agreement by first collecting multiple human ratings for each item in a validation corpus, then aggregating the ratings into a single, per-item gold label rating. For many items, however, rating criteria may admit multiple valid interpretations, so a human or LLM rater may deem multiple ratings "reasonable" or "correct". We call this condition rating indeterminacy. Problematically, many rating tasks that contain rating indeterminacy rely on forced-choice elicitation, whereby raters are instructed to select only one rating for each item. In this paper, we introduce a framework for validating LLM-as-a-judge systems under rating indeterminacy. We draw theoretical connections between different measures of judge system performance under different human--judge agreement metrics, and different rating elicitation and aggregation schemes. We demonstrate that differences in how humans and LLMs resolve rating indeterminacy while responding to forced-choice rating instructions heavily bias LLM-as-a-judge validation. Through extensive experiments involving 11 real-world rating tasks and 8 commercial LLMs, we show that standard validation approaches that rely upon forced-choice ratings select judge systems that are highly suboptimal, performing as much as 30% worse than judge systems selected by our approach that uses multi-label "response set" ratings to account for rating indeterminacy. We conclude with concrete recommendations for more principled approaches to LLM-as-a-judge validation.
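The contrast the abstract draws -- validating a judge against a single aggregated gold label versus against multi-label "response set" ratings -- can be made concrete with a minimal sketch. This is illustrative code, not the paper's method; function names and the toy aggregation (majority vote) are assumptions.

```python
from collections import Counter

def forced_choice_agreement(human_ratings, judge_ratings):
    """Agreement against a per-item gold label aggregated by majority vote
    from forced-choice human ratings (one rating per rater per item)."""
    hits = 0
    for humans, judge in zip(human_ratings, judge_ratings):
        gold = Counter(humans).most_common(1)[0][0]  # single aggregated label
        hits += (judge == gold)
    return hits / len(judge_ratings)

def response_set_agreement(human_sets, judge_ratings):
    """Agreement when each item admits a set of 'reasonable' ratings:
    the judge is credited if its rating falls anywhere in the set."""
    hits = sum(judge in rs for rs, judge in zip(human_sets, judge_ratings))
    return hits / len(judge_ratings)

# An indeterminate item: "no" is a valid minority reading. Forced-choice
# validation penalizes the judge for it; response-set validation does not.
ratings = [["yes", "yes", "no"], ["no", "no", "yes"]]
sets = [{"yes", "no"}, {"no", "yes"}]
judge = ["no", "no"]
print(forced_choice_agreement(ratings, judge))  # 0.5
print(response_set_agreement(sets, judge))      # 1.0
```

The gap between the two scores on indeterminate items is exactly the kind of bias the paper argues can make forced-choice validation select suboptimal judge systems.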

Authors:Yue Lyu, Pengcheng An, Yage Xiao, Zibo Selena Zhang, Huan Zhang, Keiko Katsuragawa, Jian Zhao
Title: Eggly: Designing Mobile Augmented Reality Neurofeedback Training Games for Children with Autism Spectrum Disorder
Abstract:
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects how children communicate and relate to other people and the world around them. Emerging studies have shown that neurofeedback training (NFT) games are an effective and playful intervention to enhance social and attentional capabilities for autistic children. However, NFT is primarily available in a clinical setting that is hard to scale. Also, the intervention demands deliberately designed gamified feedback that is fun and enjoyable, an area where little design knowledge has been established in the HCI community. Through a ten-month iterative design process with four domain experts, we developed Eggly, a mobile NFT game based on a consumer-grade EEG headband and a tablet. Eggly uses novel augmented reality (AR) techniques to offer engagement and personalization, enhancing children's training experience. We conducted two field studies (a single-session study and a three-week multi-session study) with a total of five autistic children to assess Eggly in practice at a special education center. Both quantitative and qualitative results indicate the effectiveness of the approach and contribute to the design knowledge of creating mobile AR NFT games.

Authors:Alex Liu, Lief Esbenshade, Min Sun, Shawon Sarkar, Jian He, Victor Tian, Zachary Zhang
Title: Adapting to Educate: Conversational AI's Role in Mathematics Education Across Different Educational Contexts
Abstract:
As educational settings increasingly integrate artificial intelligence (AI), understanding how AI tools identify -- and adapt their responses to -- varied educational contexts becomes paramount. This study examines conversational AI's effectiveness in supporting K-12 mathematics education across various educational contexts. Through qualitative content analysis, we identify educational contexts and key instructional needs present in educator prompts and assess AI's responsiveness. Our findings indicate that educators focus their AI conversations on assessment methods, how to set the cognitive demand level of their instruction, and strategies for making meaningful real-world connections. However, educators' conversations with AI about instructional practices do vary across revealed educational contexts; they shift their emphasis to tailored, rigorous content that addresses their students' unique needs. Educators often seek actionable guidance from AI and reject responses that do not align with their inquiries. While AI can provide accurate, relevant, and useful information when educational contexts or instructional practices are specified in conversation queries, its ability to consistently adapt responses along these evaluation dimensions varies across different educational settings. Significant work remains to realize the response-differentiating potential of conversational AI tools in complex educational use cases. This research contributes insights into developing AI tools that are responsive, proactive, and anticipatory, adapting to evolving educational needs before they are explicitly stated, and provides actionable recommendations for both developers and educators to enhance AI integration in educational practices.

Authors:Tin Nguyen, Logan Bolton, Mohammad Reza Taesiri, Trung Bui, Anh Totti Nguyen
Title: HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs
Abstract:
An Achilles heel of Large Language Models (LLMs) is their tendency to hallucinate non-factual statements. A response that mixes factual and non-factual statements poses a challenge for humans who must verify it and accurately base their decisions on it. To combat this problem, we propose Highlighted Chain-of-Thought Prompting (HoT), a technique for prompting LLMs to generate responses with XML tags that ground facts to those provided in the query. That is, given an input question, LLMs first re-format the question to add XML tags highlighting key facts, and then generate a response with highlights over the facts referenced from the input. Interestingly, in few-shot settings, HoT outperforms vanilla chain-of-thought prompting (CoT) on a wide range of 17 tasks, from arithmetic and reading comprehension to logical reasoning. When asking humans to verify LLM responses, highlights help time-limited participants recognize more accurately and efficiently when LLMs are correct. Yet, surprisingly, when LLMs are wrong, HoT tends to make users believe that an answer is correct.
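The grounding convention the abstract describes -- XML tags in the re-formatted question that answer highlights must refer back to -- can be checked mechanically. The sketch below is illustrative only; the tag name `<fact>` and the literal-match checker are assumptions, not the paper's implementation.

```python
import re

FACT_TAG = re.compile(r"<fact>(.*?)</fact>", re.DOTALL)

def extract_facts(tagged_text):
    """Pull out the spans wrapped in <fact> tags."""
    return [m.strip() for m in FACT_TAG.findall(tagged_text)]

def grounded_fraction(tagged_question, tagged_answer):
    """Fraction of the answer's highlighted facts that literally appear
    among the question's highlighted facts (a crude grounding check)."""
    q_facts = set(extract_facts(tagged_question))
    a_facts = extract_facts(tagged_answer)
    if not a_facts:
        return 0.0
    return sum(f in q_facts for f in a_facts) / len(a_facts)

q = "Tom has <fact>3 apples</fact> and buys <fact>2 more</fact>."
a = "Starting from <fact>3 apples</fact>, adding <fact>2 more</fact> gives 5."
print(grounded_fraction(q, a))  # 1.0: every answer highlight is grounded
```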

Authors:Ruijing Zhao, Brian Diep, Jiaxin Pei, Dongwook Yoon, David Jurgens, Jian Zhu
Title: Who Reaps All the Superchats? A Large-Scale Analysis of Income Inequality in Virtual YouTuber Livestreaming
Abstract:
The explosive growth of Virtual YouTubers (VTubers) -- streamers who perform behind virtual anime avatars -- has created a unique digital economy with profound implications for content creators, platforms, and viewers. Understanding the economic landscape of VTubers is crucial for designing equitable platforms, supporting content creator livelihoods, and fostering sustainable digital communities. To this end, we conducted a large-scale study of over 1 million hours of publicly available streaming records from 1,923 VTubers on YouTube, covering tens of millions of dollars in actual profits. Our analysis reveals stark inequality within the VTuber community and characterizes the sources of income for VTubers from multiple perspectives. Furthermore, we also found that the VTuber community is increasingly monopolized by two agencies, driving the financial disparity. This research illuminates the financial dynamics of VTuber communities, informing the design of equitable platforms and sustainable support systems for digital content creators.
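Income inequality of the kind the abstract reports is conventionally summarized with a Gini coefficient. The sketch below is a standard textbook formulation, not the paper's analysis code.

```python
def gini(incomes):
    """Gini coefficient of a list of non-negative incomes:
    0 = perfect equality, approaching 1 = one earner takes everything.
    Uses the sorted-rank identity G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

print(gini([1, 1, 1, 1]))  # 0.0: everyone earns the same
print(gini([0, 0, 0, 1]))  # 0.75: one streamer reaps all the superchats
```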

Authors:Benedetta Muscato, Praveen Bushipaka, Gizem Gezici, Lucia Passaro, Fosca Giannotti, Tommaso Cucinotta
Title: Embracing Diversity: A Multi-Perspective Approach with Soft Labels
Abstract:
Prior studies show that embracing the annotation diversity shaped by annotators' different backgrounds and life experiences and incorporating it into model learning, i.e., the multi-perspective approach, contributes to the development of more responsible models. Thus, in this paper we propose a new framework for designing and further evaluating perspective-aware models on the stance detection task, in which multiple annotators assign stances based on a controversial topic. We also share a new dataset established through obtaining both human and LLM annotations. Results show that the multi-perspective approach yields better classification performance (higher F1-scores), outperforming the traditional approaches that use a single ground truth, while displaying lower model confidence scores, probably due to the high level of subjectivity of the stance detection task.
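The core mechanics of a multi-perspective approach -- turning per-item annotator votes into a soft label distribution and training against it instead of a single ground truth -- can be sketched briefly. This is an illustrative minimal version; the function names and the vote-frequency soft labels are assumptions, not the paper's exact framework.

```python
import math
from collections import Counter

def soft_label(annotations, classes):
    """Convert one item's annotator votes into a probability distribution
    over classes (the 'soft label'), preserving disagreement."""
    counts = Counter(annotations)
    n = len(annotations)
    return [counts[c] / n for c in classes]

def cross_entropy(soft, predicted, eps=1e-12):
    """Cross-entropy of model probabilities against the soft label;
    minimized when the model reproduces the annotator distribution."""
    return -sum(p * math.log(q + eps) for p, q in zip(soft, predicted))

classes = ["favor", "against", "none"]
votes = ["favor", "favor", "against", "favor", "none"]
target = soft_label(votes, classes)
print(target)  # [0.6, 0.2, 0.2] -- disagreement kept, not majority-voted away
```

Note the contrast with a single-ground-truth setup, which would collapse these votes to "favor" and discard the minority perspectives entirely.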

Authors:Megha Srivastava, Reihaneh Iranmanesh, Yuchen Cui, Deepak Gopinath, Emily Sumner, Andrew Silva, Laporsha Dees, Guy Rosman, Dorsa Sadigh
Title: Shared Autonomy for Proximal Teaching
Abstract:
Motor skill learning often requires experienced professionals who can provide personalized instruction. Unfortunately, the availability of high-quality training can be limited for specialized tasks, such as high performance racing. Several recent works have leveraged AI-assistance to improve instruction of tasks ranging from rehabilitation to surgical robot tele-operation. However, these works often make simplifying assumptions on the student learning process, and fail to model how a teacher's assistance interacts with different individuals' abilities when determining optimal teaching strategies. Inspired by the idea of scaffolding from educational psychology, we leverage shared autonomy, a framework for combining user inputs with robot autonomy, to aid with curriculum design. Our key insight is that the way a student's behavior improves in the presence of assistance from an autonomous agent can highlight which sub-skills might be most ``learnable'' for the student, or within their Zone of Proximal Development. We use this to design Z-COACH, a method for using shared autonomy to provide personalized instruction targeting interpretable task sub-skills. In a user study (n=50), where we teach high performance racing in a simulated environment of the Thunderhill Raceway Park with the CARLA Autonomous Driving simulator, we show that Z-COACH helps identify which skills each student should first practice, leading to an overall improvement in driving time, behavior, and smoothness. Our work shows that increasingly available semi-autonomous capabilities (e.g. in vehicles, robots) can not only assist human users, but also help *teach* them.

Authors:JaeWon Kim, Robert Wolfe, Ramya Bhagirathi Subramanian, Mei-Hsuan Lee, Jessica Colnago, Alexis Hiniker
Title: Trust-Enabled Privacy: Social Media Designs to Support Adolescent User Boundary Regulation
Abstract:
Adolescents heavily rely on social media to build and maintain close relationships, yet current platform designs often make self-disclosure feel risky or uncomfortable. Through a three-part study involving 19 teens aged 13-18, we identify key barriers to meaningful self-disclosure on social media. Our findings reveal that while these adolescents seek casual, frequent sharing to strengthen relationships, existing platform norms often discourage such interactions. Based on our co-design interview findings, we propose platform design ideas to foster a more dynamic and nuanced privacy experience for teen social media users. We then introduce \textbf{\textit{trust-enabled privacy}} as a framework that recognizes trust -- whether building or eroding -- as central to boundary regulation, and foregrounds the role of platform design in shaping the very norms and interaction patterns that influence how trust unfolds. When trust is supported, boundary regulation becomes more adaptive and empowering; when it erodes, users resort to self-censorship or disengagement. This work provides empirical insights and actionable guidelines for designing social media spaces where teens feel empowered to engage in meaningful relationship-building processes.

Authors:Parag Khanna, Mårten Björkman, Christian Smith
Title: Impact of Object Weight in Handovers: Inspiring Robotic Grip Release and Motion from Human Handovers
Abstract:
This work explores the effect of object weight on human motion and grip release during handovers to enhance the naturalness, safety, and efficiency of robot-human interactions. We introduce adaptive robotic strategies based on the analysis of human handover behavior with varying object weights. The key contributions of this work include the development of an adaptive grip-release strategy for robots, a detailed analysis of how object weight influences human motion to guide robotic motion adaptations, and the creation of handover datasets incorporating various object weights, including the YCB handover dataset. By aligning robotic grip release and motion with human behavior, this work aims to improve robot-human handovers for different weighted objects. We also evaluate these human-inspired adaptive robotic strategies in robot-to-human handovers to assess their effectiveness and performance and demonstrate that they outperform the baseline approaches in terms of naturalness, efficiency, and user perception.

Authors:Yaman Yu, Yiren Liu, Jacky Zhang, Yun Huang, Yang Wang
Title: Understanding Generative AI Risks for Youth: A Taxonomy Based on Empirical Data
Abstract:
Generative AI (GAI) is reshaping the way young users engage with technology. This study introduces a taxonomy of risks associated with youth-GAI interactions, derived from an analysis of 344 chat transcripts between youth and GAI chatbots, 30,305 Reddit discussions concerning youth engagement with these systems, and 153 documented AI-related incidents. We categorize risks into six overarching themes, identifying 84 specific risks, which we further align with four distinct interaction pathways. Our findings highlight emerging concerns, such as risks to mental wellbeing, behavioral and social development, and novel forms of toxicity, privacy breaches, and misuse/exploitation that are not fully addressed in existing frameworks on child online safety or AI risks. By systematically grounding our taxonomy in empirical data, this work offers a structured approach to aiding AI developers, educators, caregivers, and policymakers in comprehending and mitigating risks associated with youth-GAI interactions.

Authors:Yong Ma, Yuchong Zhang, Di Fu, Stephanie Zubicueta Portales, Danica Kragic, Morten Fjeld
Title: Advancing User-Voice Interaction: Exploring Emotion-Aware Voice Assistants Through a Role-Swapping Approach
Abstract:
As voice assistants (VAs) become increasingly integrated into daily life, the need for emotion-aware systems that can recognize and respond appropriately to user emotions has grown. While significant progress has been made in speech emotion recognition (SER) and sentiment analysis, effectively addressing user emotions -- particularly negative ones -- remains a challenge. This study explores human emotional response strategies in VA interactions using a role-swapping approach, where participants regulate AI emotions rather than receiving pre-programmed responses. Through speech feature analysis and natural language processing (NLP), we examined acoustic and linguistic patterns across various emotional scenarios. Results show that participants favor neutral or positive emotional responses when engaging with negative emotional cues, highlighting a natural tendency toward emotional regulation and de-escalation. Key acoustic indicators such as root mean square (RMS), zero-crossing rate (ZCR), and jitter were identified as sensitive to emotional states, while sentiment polarity and lexical diversity (TTR) distinguished between positive and negative responses. These findings provide valuable insights for developing adaptive, context-aware VAs capable of delivering empathetic, culturally sensitive, and user-aligned responses. By understanding how humans naturally regulate emotions in AI interactions, this research contributes to the design of more intuitive and emotionally intelligent voice assistants, enhancing user trust and engagement in human-AI interactions.
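The features the abstract names -- RMS energy, zero-crossing rate, and type-token ratio -- have standard definitions that can be sketched in a few lines. These are textbook formulations on raw sample/token lists, not the paper's extraction pipeline.

```python
import math

def rms(frame):
    """Root mean square energy of an audio frame (list of samples)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zcr(frame):
    """Zero-crossing rate: fraction of adjacent sample pairs whose sign flips."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def ttr(tokens):
    """Type-token ratio, a simple lexical diversity measure over a word list."""
    return len(set(tokens)) / len(tokens)

print(rms([3, -3, 3, -3]))        # 3.0
print(zcr([1, -1, 1, -1]))        # 1.0: every pair crosses zero
print(ttr(["so", "so", "calm"]))  # 2 types over 3 tokens
```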

Authors:Di Liu, Jingwen Bai, Zhuoyi Zhang, Yilin Zhang, Zhenhao Zhang, Jian Zhao, Pengcheng An
Title: TherAIssist: Assisting Art Therapy Homework and Client-Practitioner Collaboration through Human-AI Interaction
Abstract:
Art therapy homework is essential for fostering clients' reflection on daily experiences between sessions. However, current practices present challenges: clients often lack guidance for completing tasks that combine art-making and verbal expression, while therapists find it difficult to track and tailor homework. How HCI systems might support art therapy homework remains underexplored. To address this, we present TherAIssist, comprising a client-facing application leveraging human-AI co-creative art-making and conversational agents to facilitate homework, and a therapist-facing application enabling customization of homework agents and AI-compiled homework history. A 30-day field study with 24 clients and 5 therapists showed how TherAIssist supported clients' homework and reflection in their everyday settings. Results also revealed how therapists infused their practice principles and personal touch into the agents to offer tailored homework, and how AI-compiled homework history became a meaningful resource for in-session interactions. Implications for designing human-AI systems to facilitate asynchronous client-practitioner collaboration are discussed.

Authors:Jiaqi Jiang, Shanghao Li, Xian Li, Yingxin Xu, Jian Zhao, Pengcheng An
Title: BIG-AOME: Designing Bodily Interaction Gamification towards Anti-sedentary Online Meeting Environments
Abstract:
Online meetings have become an integral part of daily life, but prolonged screen time poses significant health risks. While various interventions address sedentary lifestyles, few focus on mitigating sedentary behavior during online meetings. Design opportunities in this context remain underexplored. This study investigates the design of gamified bodily interactions as anti-sedentary measures during online meetings using a research through design approach. In collaboration with 11 users, we co-designed and iterated three prototypes, resulting in the BIG-AOME (Bodily Interaction Gamification towards Anti-sedentary Online Meeting Environments) framework. User studies with 15 participants across three groups evaluated these prototypes through semi-structured interviews analyzed using Hsieh's qualitative content analysis. Findings show that gamified bodily interactions encourage physical movement while reducing awkwardness during online meetings. Participants valued the social engagement fostered by cooperative and competitive elements in these games, enhancing social dynamics while encouraging physical movement. Such games can also serve as online icebreakers or playful decision-making tools. This study offers a comprehensive analysis of design dimensions within the BIG-AOME framework, including body engagement, attention, bodily interplay, timeliness, and virtual/physical environments, highlighting the potential of anti-sedentary bodily interactions to mitigate sedentary behavior and enhance social connections in online meetings.

Authors:Vardaan Pahuja, Yadong Lu, Corby Rosset, Boyu Gou, Arindam Mitra, Spencer Whitehead, Yu Su, Ahmed Awadallah
Title: Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Abstract:
Recent success in large multimodal models (LMMs) has sparked promising applications of agents capable of autonomously completing complex web tasks. While open-source LMM agents have made significant advances in offline evaluation benchmarks, their performance still falls substantially short of human-level capabilities in more realistic online settings. A key bottleneck is the lack of diverse and large-scale trajectory-level datasets across various domains, which are expensive to collect. In this paper, we address this challenge by developing a scalable recipe to synthesize the largest and most diverse trajectory-level dataset to date, containing over 94K successful multimodal web trajectories, spanning 49K unique URLs, 720K screenshots, and 33M web elements. In particular, we leverage extensive web exploration and refinement to obtain diverse task intents. The average cost is 28 cents per successful trajectory, making it affordable to a wide range of users in the community. Leveraging this dataset, we train Explorer, a multimodal web agent, and demonstrate strong performance on both offline and online web agent benchmarks such as Mind2Web-Live, Multimodal-Mind2Web, and MiniWob++. Additionally, our experiments highlight data scaling as a key driver for improving web agent capabilities. We hope this study makes state-of-the-art LMM-based agent research at a larger scale more accessible.

Authors:Ehsan-Ul Haq, Shalini Jangra, Suparna De, Nishanth Sastry, Gareth Tyson
Title: Unpacking the Layers: Exploring Self-Disclosure Norms, Engagement Dynamics, and Privacy Implications
Abstract:
This paper characterizes the self-disclosure behavior of Reddit users across 11 different types of self-disclosure. We find that at least half of the users share some type of disclosure in at least 10% of their posts, with half of these posts having more than one type of disclosure. We show that different types of self-disclosure are likely to receive varying levels of engagement. For instance, a Sexual Orientation disclosure garners more comments than other self-disclosures. We also explore confounding factors that affect future self-disclosure. We show that users who receive interactions from (self-disclosure) specific subreddit members are more likely to disclose in the future. We also show that privacy risks due to self-disclosure extend beyond Reddit users themselves to include their close contacts, such as family and friends, as their information is also revealed. We develop a browser plugin for end-users to flag self-disclosure in their content.

Authors:Haichuan Lin, Yilin Ye, Jiazhi Xia, Wei Zeng
Title: SketchFlex: Facilitating Spatial-Semantic Coherence in Text-to-Image Generation with Region-Based Sketches
Abstract:
Text-to-image models can generate visually appealing images from text descriptions. Efforts have been devoted to improving model controls with prompt tuning and spatial conditioning. However, our formative study highlights the challenges for non-expert users in crafting appropriate prompts and specifying fine-grained spatial conditions (e.g., depth or canny references) to generate semantically cohesive images, especially when multiple objects are involved. In response, we introduce SketchFlex, an interactive system designed to improve the flexibility of spatially conditioned image generation using rough region sketches. The system automatically infers user prompts with rational descriptions within a semantic space enriched by crowd-sourced object attributes and relationships. Additionally, SketchFlex refines users' rough sketches into canny-based shape anchors, ensuring the generation quality and alignment of user intentions. Experimental results demonstrate that SketchFlex achieves more cohesive image generations than end-to-end models, while significantly reducing cognitive load and better matching user intentions compared to a region-based generation baseline.

Authors:Unggi Lee, Hansung Kim, Juhong Eom, Hyeonseo Jeong, Seungyeon Lee, Gyuri Byun, Yunseo Lee, Minji Kang, Gospel Kim, Jihoi Na, Jewoong Moon, Hyeoncheol Kim
Title: Echo-Teddy: Preliminary Design and Development of Large Language Model-based Social Robot for Autistic Students
Abstract:
Autistic students often face challenges in social interaction, which can hinder their educational and personal development. This study introduces Echo-Teddy, a Large Language Model (LLM)-based social robot designed to support autistic students in developing social and communication skills. Unlike previous chatbot-based solutions, Echo-Teddy leverages advanced LLM capabilities to provide more natural and adaptive interactions. The research addresses two key questions: (1) What are the design principles and initial prototype characteristics of an effective LLM-based social robot for autistic students? (2) What improvements can be made based on developer reflection-on-action and expert interviews? The study employed a mixed-methods approach, combining prototype development with qualitative analysis of developer reflections and expert interviews. Key design principles identified include customizability, ethical considerations, and age-appropriate interactions. The initial prototype, built on a Raspberry Pi platform, features custom speech components and basic motor functions. Evaluation of the prototype revealed potential improvements in areas such as user interface, educational value, and practical implementation in educational settings. This research contributes to the growing field of AI-assisted special education by demonstrating the potential of LLM-based social robots in supporting autistic students. The findings provide valuable insights for future developments in accessible and effective social support tools for special education.

Authors:Hannah Rose Kirk, Iason Gabriel, Chris Summerfield, Bertie Vidgen, Scott A. Hale
Title: Why human-AI relationships need socioaffective alignment
Abstract:
Humans strive to design safe AI systems that align with our goals and remain under our control. However, as AI capabilities advance, we face a new challenge: the emergence of deeper, more persistent relationships between humans and AI systems. We explore how increasingly capable AI agents may generate the perception of deeper relationships with users, especially as AI becomes more personalised and agentic. This shift, from transactional interaction to ongoing sustained social engagement with AI, necessitates a new focus on socioaffective alignment -- how an AI system behaves within the social and psychological ecosystem co-created with its user, where preferences and perceptions evolve through mutual influence. Addressing these dynamics involves resolving key intrapersonal dilemmas, including balancing immediate versus long-term well-being, protecting autonomy, and managing AI companionship alongside the desire to preserve human social bonds. By framing these challenges through a notion of basic psychological needs, we seek AI systems that support, rather than exploit, our fundamental nature as social and emotional beings.

Authors:Yuan Tian, Dazhen Deng, Sen Yang, Huawei Zheng, Bowen Shi, Kai Xiong, Xinjing Yi, Yingcai Wu
Title: NoteFlow: Recommending Charts as Sight Glasses for Tracing Data Flow in Computational Notebooks
Abstract:
Exploratory Data Analysis (EDA) is a routine task for data analysts, often conducted using flexible computational notebooks. During EDA, data workers process, visualize, and interpret data tables, making decisions about subsequent analysis. However, the cell-by-cell programming approach, while flexible, can lead to disorganized code, making it difficult to trace the state of data tables across cells and increasing the cognitive load on data workers. This paper introduces NoteFlow, a notebook library that recommends charts as ``sight glasses'' for data tables, allowing users to monitor their dynamic updates throughout the EDA process. To ensure visual consistency and effectiveness, NoteFlow adapts chart encodings in response to data transformations, maintaining a coherent and insightful representation of the data. The proposed method was evaluated through user studies, demonstrating its ability to provide an overview of the EDA process and convey critical insights in the data tables.

Authors:Yiluo Wei, Gareth Tyson
Title: Virtual Stars, Real Fans: Understanding the VTuber Ecosystem
Abstract:
Livestreaming by VTubers -- animated 2D/3D avatars controlled by real individuals -- has recently garnered substantial global followings and achieved significant monetary success. Despite prior research highlighting the importance of realism in audience engagement, VTubers deliberately conceal their identities, cultivating dedicated fan communities through virtual personas. While previous studies underscore that building a core fan community is essential to a streamer's success, we lack an understanding of the characteristics of viewers of this new type of streamer. Gaining a deeper insight into these viewers is critical for VTubers to enhance audience engagement, foster a more robust fan base, and attract a larger viewership. To address this gap, we conduct a comprehensive analysis of VTuber viewers on Bilibili, a leading livestreaming platform where nearly all VTubers in China stream. By compiling a first-of-its-kind dataset covering 2.7M livestreaming sessions, we investigate the characteristics, engagement patterns, and influence of VTuber viewers. Our research yields several valuable insights, which we then leverage to develop a tool to "recommend" future subscribers to VTubers. By reversing the typical approach of recommending streams to viewers, this tool assists VTubers in pinpointing potential future fans to pay more attention to, and thereby effectively growing their fan community.

Authors:Seokhyeon Park, Yumin Song, Soohyun Lee, Jaeyoung Kim, Jinwook Seo
Title: Leveraging Multimodal LLM for Inspirational User Interface Search
Abstract:
Inspirational search, the process of exploring designs to inform and inspire new creative work, is pivotal in mobile user interface (UI) design. However, exploring the vast space of UI references remains a challenge. Existing AI-based UI search methods often miss crucial semantics like target users or the mood of apps. Additionally, these models typically require metadata like view hierarchies, limiting their practical use. We used a multimodal large language model (MLLM) to extract and interpret semantics from mobile UI images. We identified key UI semantics through a formative study and developed a semantic-based UI search system. Through computational and human evaluations, we demonstrate that our approach significantly outperforms existing UI retrieval methods, offering UI designers a more enriched and contextually relevant search experience. We enhance the understanding of mobile UI design semantics and highlight MLLMs' potential in inspirational search, providing a rich dataset of UI semantics for future studies.

Authors:Natalie Friedman, Alexandra Bremers, Adelaide Nyanyo, Ian Clark, Yasmine Kotturi, Laura Dabbish, Wendy Ju, Nikolas Martelaro
Title: Understanding the Challenges of Maker Entrepreneurship
Abstract:
The maker movement embodies a resurgence in DIY creation, merging physical craftsmanship and arts with digital technology support. However, mere technological skills and creativity are insufficient for economically and psychologically sustainable practice. By illuminating and smoothing the path from ``maker" to ``maker entrepreneur," we can help broaden the viability of making as a livelihood. Our research centers on makers who design, produce, and sell physical goods. In this work, we explore the transition to entrepreneurship for these makers and how technology can facilitate this transition online and offline. We present results from interviews with 20 USA-based maker entrepreneurs (e.g., lamps, stickers), six creative service entrepreneurs (e.g., photographers, fabrication), and seven support personnel (e.g., art curator, incubator director). Our findings reveal that many maker entrepreneurs 1) are makers first and entrepreneurs second; 2) struggle with business logistics and learn business skills as they go; and 3) are motivated by non-monetary values. We discuss training and technology-based design implications and opportunities for addressing challenges in developing economically sustainable businesses around making.

Authors:John Joon Young Chung, Melissa Roemmele, Max Kreminski
Title: Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols
Abstract:
We introduce Toyteller, an AI-powered storytelling system where users generate a mix of story text and visuals by directly manipulating character symbols like they are toy-playing. Anthropomorphized symbol motions can convey rich and nuanced social interactions; Toyteller leverages these motions (1) to let users steer story text generation and (2) as a visual output format that accompanies story text. We enabled motion-steered text generation and text-steered motion generation by mapping motions and text onto a shared semantic space so that large language models and motion generation models can use it as a translational layer. Technical evaluations showed that Toyteller outperforms a competitive baseline, GPT-4o. Our user study identified that toy-playing helps express intentions difficult to verbalize. However, motions alone could not express all user intentions, suggesting the need to combine them with other modalities such as language. We discuss the design space of toy-playing interactions and implications for technical HCI research on human-AI interaction.
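The shared semantic space described above can be sketched as two projections into a common embedding, where the nearest text vector to a projected motion steers generation. The linear maps, dimensions, and labels below are illustrative placeholders, not Toyteller's actual learned models:

```python
import numpy as np

# Sketch of a shared semantic space as a translational layer: linear maps
# (assumed here; learned in the real system) project motion features and
# text embeddings into one space, so a motion can be matched to text.
rng = np.random.default_rng(1)
d_motion, d_text, d_shared = 6, 10, 4
W_motion = rng.normal(size=(d_shared, d_motion))  # motion -> shared space
W_text = rng.normal(size=(d_shared, d_text))      # text -> shared space

def to_shared(W, x):
    z = W @ x
    return z / np.linalg.norm(z)  # unit-normalize for cosine comparison

# Hypothetical bank of text fragments with precomputed embeddings.
text_bank = {name: rng.normal(size=d_text) for name in ["approach", "flee", "embrace"]}
shared_texts = {k: to_shared(W_text, v) for k, v in text_bank.items()}

def describe_motion(motion_vec):
    """Return the text label whose shared-space vector is closest to the motion."""
    z = to_shared(W_motion, motion_vec)
    return max(shared_texts, key=lambda k: z @ shared_texts[k])
```

Text-steered motion generation would run the same matching in the opposite direction, retrieving or conditioning a motion whose shared-space vector is nearest to the text's.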

Authors:Yuqi Niu, Weidong Qiu, Peng Tang, Lifan Wang, Shuo Chen, Shujun Li, Nadin Kokciyan, Ben Niu
Title: Everyone's Privacy Matters! An Analysis of Privacy Leakage from Real-World Facial Images on Twitter and Associated User Behaviors
Abstract:
Online users often post facial images of themselves and other people on online social networks (OSNs) and other Web 2.0 platforms, which can lead to potential privacy leakage of people whose faces are included in such images. There is limited research on understanding face privacy in social media while considering user behavior. It is crucial to consider privacy of subjects and bystanders separately. This calls for the development of privacy-aware face detection classifiers that can distinguish between subjects and bystanders automatically. This paper introduces such a classifier trained on face-based features, which outperforms the two state-of-the-art methods by a significant margin (by 13.1% and 3.1% for OSN images, and by 17.9% and 5.9% for non-OSN images). We developed a semi-automated framework for conducting a large-scale analysis of the face privacy problem by using our novel bystander-subject classifier. We collected 27,800 images, each including at least one face, shared by 6,423 Twitter users. We then applied our framework to analyze this dataset thoroughly. Our analysis reveals eight key findings of different aspects of Twitter users' real-world behaviors on face privacy, and we provide quantitative and qualitative results to better explain these findings. We share the practical implications of our study to empower online platforms and users in addressing the face privacy problem efficiently.
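The bystander-subject distinction above can be illustrated with a toy classifier over face-based features. The feature choices (relative face area, distance from image center) and the weights are invented for this sketch and are not the paper's feature set or trained model:

```python
import numpy as np

def face_features(face_box, img_w, img_h):
    """Face-based features of the kind such a classifier might use
    (illustrative, not the paper's actual features): relative face area
    and distance of the face center from the image center."""
    x, y, w, h = face_box
    area = (w * h) / (img_w * img_h)
    cx, cy = x + w / 2, y + h / 2
    dist = np.hypot(cx / img_w - 0.5, cy / img_h - 0.5)
    return np.array([area, dist])

def classify(face_box, img_w, img_h, weights=np.array([40.0, -4.0]), bias=-1.0):
    """Logistic score: large, central faces look like subjects; small,
    peripheral faces like bystanders. Weights are made up for the demo."""
    z = weights @ face_features(face_box, img_w, img_h) + bias
    p_subject = 1 / (1 + np.exp(-z))
    return "subject" if p_subject > 0.5 else "bystander"
```

A real system would learn the weights from labeled faces and use a richer feature vector, but the decision structure is the same.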

Authors:Yuqian Sun, Phoebe J. Wang, John Joon Young Chung, Melissa Roemmele, Taewook Kim, Max Kreminski
Title: Drama Llama: An LLM-Powered Storylets Framework for Authorable Responsiveness in Interactive Narrative
Abstract:
In this paper, we present Drama Llama (DL), an LLM-powered storylets framework that supports the authoring of responsive, open-ended interactive stories. DL combines the structural benefits of storylet-based systems with the generative capabilities of large language models, enabling authors to create responsive interactive narratives while maintaining narrative control. Rather than crafting complex logical preconditions in a general-purpose or domain-specific programming language, authors define triggers in natural language that fire at appropriate moments in the story. Through a preliminary authoring study with six content authors, we present initial evidence that DL can generate coherent and meaningful narratives with believable character interactions. This work suggests directions for hybrid approaches that enhance authorial control while supporting emergent narrative generation through LLMs.
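The natural-language trigger mechanism can be sketched as storylets paired with trigger phrases, where a judge decides which fire against the current story state. The keyword matcher below is a stand-in for the LLM judgment the abstract describes; all names and storylet content are invented for illustration:

```python
# Minimal storylet-trigger sketch in the spirit of an LLM-judged
# storylet system; llm_judges_trigger is a keyword-matching stand-in
# for a real LLM call, and the storylets are invented examples.

def llm_judges_trigger(trigger, story_state):
    """Stand-in for an LLM call: fire if any trigger word appears
    in the current story state (a real system would ask the LLM
    whether the trigger condition holds)."""
    return any(word in story_state.lower() for word in trigger.lower().split())

STORYLETS = [
    {"trigger": "betrayal revealed", "content": "The duke's letter falls open."},
    {"trigger": "storm arrives", "content": "Rain hammers the shutters."},
]

def fire_storylets(story_state):
    """Return the content of every storylet whose trigger fires."""
    return [s["content"] for s in STORYLETS if llm_judges_trigger(s["trigger"], story_state)]
```

The appeal of this structure is that authors edit only the trigger strings and content, while the firing logic stays fixed.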

Authors:Yerong Li, Yiren Liu, Yun Huang
Title: VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity
Abstract:
Scenario-based training has been widely adopted in many public service sectors. Recent advancements in Large Language Models (LLMs) have shown promise in simulating diverse personas to create these training scenarios. However, little is known about how LLMs can be developed to simulate victims for scenario-based training purposes. In this paper, we introduce VicSim (victim simulator), a novel model that addresses three key dimensions of user simulation: informational faithfulness, emotional dynamics, and language style (e.g., grammar usage). We pioneer the integration of scenario-based victim modeling with GAN-based training workflow and key-information-based prompting, aiming to enhance the realism of simulated victims. Our adversarial training approach teaches the discriminator to recognize grammar and emotional cues as reliable indicators of synthetic content. According to evaluations by human raters, the VicSim model outperforms GPT-4 in terms of human-likeness.

Authors:Aditya Bhattacharya, Simone Stumpf, Robin De Croon, Katrien Verbert
Title: Explanatory Debiasing: Involving Domain Experts in the Data Generation Process to Mitigate Representation Bias in AI Systems
Abstract:
Representation bias is one of the most common types of biases in artificial intelligence (AI) systems, causing AI models to perform poorly on underrepresented data segments. Although AI practitioners use various methods to reduce representation bias, their effectiveness is often constrained by insufficient domain knowledge in the debiasing process. To address this gap, this paper introduces a set of generic design guidelines for effectively involving domain experts in representation debiasing. We instantiated our proposed guidelines in a healthcare-focused application and evaluated them through a comprehensive mixed-methods user study with 35 healthcare experts. Our findings show that involving domain experts can reduce representation bias without compromising model accuracy. Based on our findings, we also offer recommendations for developers to build robust debiasing systems guided by our generic design guidelines, ensuring more effective inclusion of domain experts in the debiasing process.

Authors:Nathaniel Dennler, Stefanos Nikolaidis, Maja Matarić
Title: Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation
Abstract:
People have a variety of preferences for how robots behave. To understand and reason about these preferences, robots aim to learn a reward function that describes how aligned robot behaviors are with a user's preferences. Good representations of a robot's behavior can significantly reduce the time and effort required for a user to teach the robot their preferences. Specifying these representations -- what "features" of the robot's behavior matter to users -- remains a difficult problem: features learned from raw data lack semantic meaning, and features learned from user data require users to engage in tedious labeling processes. Our key insight is that users tasked with customizing a robot are intrinsically motivated to produce labels through exploratory search; they explore behaviors that they find interesting and ignore behaviors that are irrelevant. To harness this novel data source of exploratory actions, we propose contrastive learning from exploratory actions (CLEA) to learn trajectory features that are aligned with features that users care about. We learned CLEA features from exploratory actions users performed in an open-ended signal design activity (N=25) with a Kuri robot, and evaluated CLEA features through a second user study with a different set of users (N=42). CLEA features outperformed self-supervised features when eliciting user preferences over four metrics: completeness, simplicity, minimality, and explainability.
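The core idea, treating explored behaviors as positives and ignored behaviors as negatives, can be sketched with an InfoNCE-style contrastive loss. This is a generic contrastive objective under that assumption, not necessarily the exact loss CLEA uses:

```python
import numpy as np

def contrastive_loss(anchor, explored, ignored, temperature=0.5):
    """InfoNCE-style loss: pull the anchor trajectory's features toward a
    behavior the user explored and away from behaviors the user ignored.
    (Illustrative sketch of learning from exploratory actions.)"""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    pos = np.exp(cos(anchor, explored) / temperature)
    negs = sum(np.exp(cos(anchor, n) / temperature) for n in ignored)
    return -np.log(pos / (pos + negs))

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
# Loss is low when the explored behavior is similar to the anchor...
loss_close = contrastive_loss(anchor, anchor + 0.01,
                              [rng.normal(size=8) for _ in range(4)])
# ...and high when the explored behavior is dissimilar while ignored ones match.
loss_far = contrastive_loss(anchor, -anchor, [anchor + 0.01 for _ in range(4)])
```

In practice the vectors would be outputs of a trainable trajectory encoder, and minimizing this loss shapes the encoder so that behaviors users found interesting cluster together.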

Authors:Lena John, Kheir Eddine Farfar, Sören Auer, Oliver Karras
Title: SciMantify -- A Hybrid Approach for the Evolving Semantification of Scientific Knowledge
Abstract:
Scientific publications, primarily digitized as PDFs, remain static and unstructured, limiting the accessibility and reusability of the contained knowledge. At best, scientific knowledge from publications is provided in tabular formats, which lack semantic context. A more flexible, structured, and semantic representation is needed to make scientific knowledge understandable and processable by both humans and machines. We propose an evolution model of knowledge representation, inspired by the 5-star Linked Open Data (LOD) model, with five stages and defined criteria to guide the stepwise transition from a digital artifact, such as a PDF, to a semantic representation integrated in a knowledge graph (KG). Based on an exemplary workflow implementing the entire model, we developed a hybrid approach, called SciMantify, leveraging tabular formats of scientific knowledge, e.g., results from secondary studies, to support its evolving semantification. In the approach, humans and machines collaborate closely by performing semantic annotation tasks (SATs) and refining the results to progressively improve the semantic representation of scientific knowledge. We implemented the approach in the Open Research Knowledge Graph (ORKG), an established platform for improving the findability, accessibility, interoperability, and reusability of scientific knowledge. A preliminary user experiment showed that the approach simplifies the preprocessing of scientific knowledge, reduces the effort for the evolving semantification, and enhances the knowledge representation through better alignment with the KG structures.

Authors:Joerg Deigmoeller, Stephan Hasler, Nakul Agarwal, Daniel Tanneberg, Anna Belardinelli, Reza Ghoddoosian, Chao Wang, Felix Ocker, Fan Zhang, Behzad Dariush, Michael Gienger
Title: CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition
Abstract:
We introduce CARMA, a system for situational grounding in human-robot group interactions. Effective collaboration in such group settings requires situational awareness based on a consistent representation of present persons and objects coupled with an episodic abstraction of events regarding actors and manipulated objects. This calls for a clear and consistent assignment of instances, ensuring that robots correctly recognize and track actors, objects, and their interactions over time. To achieve this, CARMA uniquely identifies physical instances of such entities in the real world and organizes them into grounded triplets of actors, objects, and actions. To validate our approach, we conducted three experiments, where multiple humans and a robot interact: collaborative pouring, handovers, and sorting. These scenarios allow the assessment of the system's capabilities as to role distinction, multi-actor awareness, and consistent instance identification. Our experiments demonstrate that the system can reliably generate accurate actor-action-object triplets, providing a structured and robust foundation for applications requiring spatiotemporal reasoning and situated decision-making in collaborative settings.

Authors:Matti Krüger, Daniel Tanneberg, Chao Wang, Stephan Hasler, Michael Gienger
Title: Mirror Eyes: Explainable Human-Robot Interaction at a Glance
Abstract:
The gaze of a person tends to reflect their interest. This work explores what happens when this statement is taken literally and applied to robots. Here we present a robot system that employs a moving robot head with a screen-based eye model that can direct the robot's gaze to points in physical space and present a reflection-like mirror image of the attended region on top of each eye. We conducted a user study with 33 participants, who were asked to instruct the robot to perform pick-and-place tasks, monitor the robot's task execution, and interrupt it in case of erroneous actions. Despite a deliberate lack of instructions about the role of the eyes and a very brief system exposure, participants felt more aware about the robot's information processing, detected erroneous actions earlier, and rated the user experience higher when eye-based mirroring was enabled compared to non-reflective eyes. These results suggest a beneficial and intuitive utilization of the introduced method in cooperative human-robot interaction.

Authors:Yashothara Shanmugarasa, Ming Ding, M. A. P Chamikara, Thierry Rakotoarivelo
Title: SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation
Abstract:
Large language models (LLMs) are sophisticated artificial intelligence systems that enable machines to generate human-like text with remarkable precision. While LLMs offer significant technological progress, their development using vast amounts of user data scraped from the web and collected from extensive user interactions poses risks of sensitive information leakage. Most existing surveys focus on the privacy implications of the training data but tend to overlook privacy risks from user interactions and advanced LLM capabilities. This paper aims to fill that gap by providing a comprehensive analysis of privacy in LLMs, categorizing the challenges into four main areas: (i) privacy issues in LLM training data, (ii) privacy challenges associated with user prompts, (iii) privacy vulnerabilities in LLM-generated outputs, and (iv) privacy challenges involving LLM agents. We evaluate the effectiveness and limitations of existing mitigation mechanisms targeting these proposed privacy challenges and identify areas for further research.

Authors:Jennifer Grannen, Siddharth Karamcheti, Blake Wulfe, Dorsa Sadigh
Title: ProVox: Personalization and Proactive Planning for Situated Human-Robot Collaboration
Abstract:
Collaborative robots must quickly adapt to their partner's intent and preferences to proactively identify helpful actions. This is especially true in situated settings where human partners can continually teach robots new high-level behaviors, visual concepts, and physical skills (e.g., through demonstration), growing the robot's capabilities as the human-robot pair work together to accomplish diverse tasks. In this work, we argue that robots should be able to infer their partner's goals from early interactions and use this information to proactively plan behaviors ahead of explicit instructions from the user. Building from the strong commonsense priors and steerability of large language models, we introduce ProVox ("Proactive Voice"), a novel framework that enables robots to efficiently personalize and adapt to individual collaborators. We design a meta-prompting protocol that empowers users to communicate their distinct preferences, intent, and expected robot behaviors ahead of starting a physical interaction. ProVox then uses the personalized prompt to condition a proactive language model task planner that anticipates a user's intent from the current interaction context and robot capabilities to suggest helpful actions; in doing so, we alleviate user burden, minimizing the amount of time partners spend explicitly instructing and supervising the robot. We evaluate ProVox through user studies grounded in household manipulation tasks (e.g., assembling lunch bags) that measure the efficiency of the collaboration, as well as features such as perceived helpfulness, ease of use, and reliability. Our analysis suggests that both meta-prompting and proactivity are critical, resulting in 38.7% faster task completion times and 31.9% less user burden relative to non-active baselines. Supplementary material, code, and videos can be found at https://provox-2025.github.io.

Authors:Anup Sathya, Jiasheng Li, Zeyu Yan, Adriane Fang, Bill Kules, Jonathan David Martin, Huaishu Peng
Title: Cybernetic Marionette: Channeling Collective Agency Through a Wearable Robot in a Live Dancer-Robot Duet
Abstract:
We describe DANCE^2, an interactive dance performance in which audience members channel their collective agency into a dancer-robot duet by voting on the behavior of a wearable robot affixed to the dancer's body. At key moments during the performance, the audience is invited to either continue the choreography or override it, shaping the unfolding interaction through real-time collective input. While post-performance surveys revealed that participants felt their choices meaningfully influenced the performance, voting data across four public performances exhibited strikingly consistent patterns. This tension between what audience members do, what they feel, and what actually changes highlights a complex interplay between agentive behavior, the experience of agency, and power. We reflect on how choreography, interaction design, and the structure of the performance mediate this relationship, offering a live analogy for algorithmically curated digital systems where agency is felt, but not exercised.

Authors:Stina Klein, Pooja Prajod, Katharina Weitz, Matteo Lavit Nicora, Dimitra Tsovaltzi, Elisabeth André
Title: Communicating Through Avatars in Industry 5.0: A Focus Group Study on Human-Robot Collaboration
Abstract:
The integration of collaborative robots (cobots) in industrial settings raises concerns about worker well-being, particularly due to reduced social interactions. Avatars - designed to facilitate worker interactions and engagement - are promising solutions to enhance the human-robot collaboration (HRC) experience. However, real-world perspectives on avatar-supported HRC remain unexplored. To address this gap, we conducted a focus group study with employees from a German manufacturing company that uses cobots. Before the discussion, participants engaged with a scripted, industry-like HRC demo in a lab setting. This qualitative approach provided valuable insights into the avatar's potential roles, improvements to its behavior, and practical considerations for deploying avatars in industrial workcells. Our findings also emphasize the importance of personalized communication and task assistance. Although our study's limitations restrict its generalizability, it serves as an initial step in recognizing the potential of adaptive, context-aware avatar interactions in real-world industrial environments.

Authors:Alvaro Becerra, Daniel Andres, Pablo Villegas, Roberto Daza, Ruth Cobos
Title: MOSAIC-F: A Framework for Enhancing Students' Oral Presentation Skills through Personalized Feedback
Abstract:
In this article, we present a novel multimodal feedback framework called MOSAIC-F, an acronym for a data-driven Framework that integrates Multimodal Learning Analytics (MMLA), Observations, Sensors, Artificial Intelligence (AI), and Collaborative assessments for generating personalized feedback on student learning activities. This framework consists of four key steps. First, peers and professors' assessments are conducted through standardized rubrics (that include both quantitative and qualitative evaluations). Second, multimodal data are collected during learning activities, including video recordings, audio capture, gaze tracking, physiological signals (heart rate, motion data), and behavioral interactions. Third, personalized feedback is generated using AI, synthesizing human-based evaluations and data-based multimodal insights such as posture, speech patterns, stress levels, and cognitive load, among others. Finally, students review their own performance through video recordings and engage in self-assessment and feedback visualization, comparing their own evaluations with peers and professors' assessments, class averages, and AI-generated recommendations. By combining human-based and data-based evaluation techniques, this framework enables more accurate, personalized and actionable feedback. We tested MOSAIC-F in the context of improving oral presentation skills.

Authors:Changshuo Hu, Qiang Yang, Yang Liu, Tobias Röddiger, Kayla-Jade Butkow, Mathias Ciliberto, Adam Luke Pullin, Jake Stuchbury-Wass, Mahbub Hassan, Cecilia Mascolo, Dong Ma
Title: A Survey of Earable Technology: Trends, Tools, and the Road Ahead
Abstract:
Earable devices, wearables positioned in or around the ear, are undergoing a rapid transformation from audio-centric accessories into multifunctional systems for interaction, contextual awareness, and health monitoring. This evolution is driven by commercial trends emphasizing sensor integration and by a surge of academic interest exploring novel sensing capabilities. Building on the foundation established by earlier surveys, this work presents a timely and comprehensive review of earable research published since 2022. We analyze over one hundred recent studies to characterize this shifting research landscape, identify emerging applications and sensing modalities, and assess progress relative to prior efforts. In doing so, we address three core questions: how has earable research evolved in recent years, what enabling resources are now available, and what opportunities remain for future exploration. Through this survey, we aim to provide both a retrospective and forward-looking view of earable technology as a rapidly expanding frontier in ubiquitous computing. In particular, this review reveals that over the past three years, researchers have discovered a variety of novel sensing principles, developed many new earable sensing applications, enhanced the accuracy of existing sensing tasks, and created substantial new resources to advance research in the field. Based on this, we further discuss open challenges and propose future directions for the next phase of earable research.

Authors:Ramteja Sajja, Yusuf Sermet, Brian Fodale, Ibrahim Demir
Title: Evaluating AI-Powered Learning Assistants in Engineering Higher Education: Student Engagement, Ethical Challenges, and Policy Implications
Abstract:
As generative AI tools become increasingly integrated into higher education, understanding how students interact with and perceive these technologies is essential for responsible and effective adoption. This study evaluates the use of the Educational AI Hub, an AI-powered learning framework, in undergraduate civil and environmental engineering courses at a large R1 public university. Using a mixed-methods approach that combines pre- and post-surveys, system usage logs, and qualitative analysis of the open-ended prompts and questions students posed to the AI chatbot, the research explores students' perceptions of trust, ethical concerns, usability, and learning outcomes. Findings reveal that students appreciated the AI assistant for its convenience and comfort, with nearly half reporting greater ease in using the AI tool compared to seeking help from instructors or teaching assistants. The tool was seen as most helpful for completing homework and understanding course concepts, though perceptions of its instructional quality were mixed. Ethical concerns emerged as a key barrier to full engagement: while most students viewed AI use as ethically acceptable, many expressed uncertainties about institutional policies and apprehension about potential academic misconduct. This study contributes to the growing body of research on AI in education by highlighting the importance of usability, policy clarity, and faculty guidance in fostering meaningful AI engagement. The findings suggest that while students are ready to embrace AI as a supplement to human instruction, thoughtful integration and transparent institutional frameworks are critical for ensuring student confidence, trust, and learning effectiveness.

Authors:Julia Barnett, Kimon Kieslich, Jasmine Sinchai, Nicholas Diakopoulos
Title: Scenarios in Computing Research: A Systematic Review of the Use of Scenario Methods for Exploring the Future of Computing Technologies in Society
Abstract:
Scenario building is an established method to anticipate the future of emerging technologies. Its primary goal is to use narratives to map future trajectories of technology development and sociotechnical adoption. Following this process, risks and benefits can be identified early on, and strategies can be developed that strive for desirable futures. In recent years, computer science has adopted this method and applied it to various technologies, including Artificial Intelligence (AI). Because computing technologies play such an important role in shaping modern societies, it is worth exploring how scenarios are being used as an anticipatory tool in the field -- and what possible traditional uses of scenarios are not yet covered but have the potential to enrich the field. We address this gap by conducting a systematic literature review on the use of scenario building methods in computer science over the last decade (n = 59). We guide the review along two main questions. First, we aim to uncover how scenarios are used in computing literature, focusing especially on the rationale for why scenarios are used. Second, in following the potential of scenario building to enhance inclusivity in research, we dive deeper into the participatory element of the existing scenario building literature in computer science.

Authors:Matthew Russell, Aman Shah, Giles Blaney, Judith Amores, Mary Czerwinski, Robert J. K. Jacob
Title: Neural and Cognitive Impacts of AI: The Influence of Task Subjectivity on Human-LLM Collaboration
Abstract:
AI-based interactive assistants are advancing human-augmenting technology, yet their effects on users' mental and physiological states remain under-explored. We address this gap by analyzing how Copilot for Microsoft Word, an LLM-based assistant, impacts users. Using tasks ranging from objective (SAT reading comprehension) to subjective (personal reflection), and with measurements including fNIRS, Empatica E4, NASA-TLX, and questionnaires, we measure Copilot's effects on users. We also evaluate users' performance with and without Copilot across tasks. In objective tasks, participants reported a reduction of workload and an increase in enjoyment, which was paired with objective performance increases. Participants reported reduced workload and increased enjoyment with no change in performance in a creative poetry writing task. However, no benefits due to Copilot use were reported in a highly subjective self-reflection task. Although no physiological changes were recorded due to Copilot use, task-dependent differences in prefrontal cortex activation offer complementary insights into the cognitive processes associated with successful and unsuccessful human-AI collaboration. These findings suggest that AI assistants' effectiveness varies with task type, particularly showing decreased usefulness in tasks that engage episodic memory, and they support a brain-network-based hypothesis of human-AI collaboration.

Authors:Dominik Mimra, Dominik Kaar, Enrico Del Re, Novel Certad, Joshua Cherian Varughese, David Seibt, Cristina Olaverri-Monreal
Title: Understanding Visually Impaired Tramway Passengers Interaction with Public Transport Systems
Abstract:
Designing inclusive public transport services is crucial to developing modern, barrier-free smart city infrastructure. This research contributes to the design of inclusive public transport by considering accessibility challenges emerging from socio-technical systems, thus demanding the integration of technological and social solutions. Using Actor-Network Theory (ANT) as a theoretical framework and a mixed-method approach, including shadowing and a focus group, this study examines the socio-technical networks that shape accessibility experiences for visually impaired passengers utilizing the tram in Linz, Austria. Key dimensions that influence public transport accessibility are identified: network configuration, mobility patterns, technology integration, and warning systems. The results show that accessibility emerges from complex interactions between human actors (passengers, staff) and non-human actors (assistive devices, infrastructure) rather than being an inherent property of transport systems. Digital technologies serve multiple functions, from navigational assistance to broader social inclusion, although users' comfort with technology varies. Participants emphasized the importance of the two-sense principle for warning signals, with directional audio and tactile feedback seen as particularly valuable.

Authors:Yeseon Hong, Junhyuk Choi, Minju Kim, Bugeun Kim
Title: Can LLMs and humans be friends? Uncovering factors affecting human-AI intimacy formation
Abstract:
Large language models (LLMs) are increasingly being used in conversational roles, yet little is known about how intimacy emerges in human-LLM interactions. Although previous work emphasized the importance of self-disclosure in human-chatbot interaction, it is questionable whether gradual and reciprocal self-disclosure is also helpful in human-LLM interaction. Thus, this study examined three possible aspects contributing to intimacy formation: gradual self-disclosure, reciprocity, and naturalness. Study 1 explored the impact of mutual, gradual self-disclosure with 29 users and a vanilla LLM. Study 2 adopted self-criticism methods for more natural responses and conducted a similar experiment with 53 users. Results indicate that gradual self-disclosure significantly enhances perceived social intimacy, regardless of persona reciprocity. Moreover, participants perceived utterances generated with self-criticism as more natural compared to those of vanilla LLMs; self-criticism fostered higher intimacy in early stages. Also, we observed that excessive empathetic expressions occasionally disrupted immersion, pointing to the importance of response calibration during intimacy formation.

Authors:Hyungjun Doh, Jingyu Shi, Rahul Jain, Heesoo Kim, Karthik Ramani
Title: An Exploratory Study on Multi-modal Generative AI in AR Storytelling
Abstract:
Storytelling in AR has gained attention due to its multi-modality and interactivity. However, generating multi-modal content for AR Storytelling requires expertise and effort to convey the narrator's intention with high quality. Recently, Generative AI (GenAI) has shown promising applications in multi-modal content generation. Despite the potential benefits, current research calls for validating the effect of AI-generated content (AIGC) in AR Storytelling. Therefore, we conducted an exploratory study to investigate the utilization of GenAI. Analyzing 223 AR videos, we identified a design space for multi-modal AR Storytelling. Based on the design space, we developed a testbed facilitating multi-modal content generation and atomic elements in AR Storytelling. Through two studies with N=30 experienced storytellers and live presenters, we (1) revealed participants' preferences for modalities, (2) evaluated their interactions with AI to generate content, and (3) assessed the quality of the AIGC for AR Storytelling. We further discussed design considerations for future AR Storytelling with GenAI.

Authors:Bufang Yang, Lilin Xu, Liekang Zeng, Kaiwei Liu, Siyang Jiang, Wenrui Lu, Hongkai Chen, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan
Title: ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions
Abstract:
Recent advances in Large Language Models (LLMs) have propelled intelligent agents from reactive responses to proactive support. While promising, existing proactive agents either rely exclusively on observations from enclosed environments (e.g., desktop UIs) with direct LLM inference or employ rule-based proactive notifications, leading to suboptimal user intent understanding and limited functionality for proactive service. In this paper, we introduce ContextAgent, the first context-aware proactive agent that incorporates extensive sensory contexts to enhance the proactive capabilities of LLM agents. ContextAgent first extracts multi-dimensional contexts from massive sensory perceptions on wearables (e.g., video and audio) to understand user intentions. ContextAgent then leverages the sensory contexts and the persona contexts from historical data to predict the necessity for proactive services. When proactive assistance is needed, ContextAgent further automatically calls the necessary tools to assist users unobtrusively. To evaluate this new task, we curate ContextAgentBench, the first benchmark for evaluating context-aware proactive LLM agents, covering 1,000 samples across nine daily scenarios and twenty tools. Experiments on ContextAgentBench show that ContextAgent outperforms baselines by achieving up to 8.5% and 6.0% higher accuracy in proactive predictions and tool calling, respectively. We hope our research can inspire the development of more advanced, human-centric, proactive AI assistants.

Authors:Ava Elizabeth Scott, Lev Tankelevitch, Payod Panda, Rishi Vanukuru, Xinyue Chen, Sean Rintel
Title: What Does Success Look Like? Catalyzing Meeting Intentionality with AI-Assisted Prospective Reflection
Abstract:
Despite decades of HCI and Meeting Science research, complaints about ineffective meetings are still pervasive. We argue that meeting technologies lack support for prospective reflection, that is, thinking about why a meeting is needed and what might happen. To explore this, we designed a Meeting Purpose Assistant (MPA) technology probe to coach users to articulate their meeting's purpose and challenges, and act accordingly. The MPA used Generative AI to support personalized and actionable prospective reflection across the diversity of meeting contexts. Using a participatory prompting methodology, 18 employees of a global technology company reflected with the MPA on upcoming meetings. Observed impacts were: clarifying meeting purposes, challenges, and success conditions; changing perspectives and flexibility; improving preparation and communication; and proposing changed plans. We also identify perceived social, temporal, and technological barriers to using the MPA. We present system and workflow design considerations for developing AI-assisted reflection support for meetings.

Authors:Evangelos Pournaras, Srijoni Majumdar, Thomas Wellings, Joshua C. Yang, Fatemeh B. Heravan, Regula Hänggli Fricker, Dirk Helbing
Title: Upgrading Democracies with Fairer Voting Methods
Abstract:
Voting methods are an instrumental design element of democracies. Citizens use them to express and aggregate their preferences to reach a collective decision. However, voting outcomes can be as sensitive to the voting rules as they are to people's voting choices. Despite the significance of voting methods and the interdisciplinary scientific progress made on them, several democracies keep relying on outdated voting methods that do not fit modern, pluralistic societies well and lack social innovation. Here, we demonstrate how one can upgrade real-world democracies, namely by using alternative preferential voting methods such as cumulative voting and the method of equal shares, designed for a proportional representation of voters' preferences. By rigorously assessing a new participatory budgeting approach applied in the city of Aarau, Switzerland, we unravel the striking outcomes of fair voting methods: more winning projects with the same budget and broader geographic and preference representation of citizens by the elected projects, in particular for voters who used to be under-represented, while promoting novel project ideas. We provide causal evidence showing that citizens prefer proportional voting methods, which possess strong legitimacy without the need for highly technical, specialized explanations. We also reveal strong underlying democratic values exhibited by citizens who support fair voting methods, such as altruism and compromise. These findings arrive with a global momentum to unleash a new and long-awaited participation blueprint for upgrading democracies.
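The method of equal shares mentioned above can be sketched for approval-based participatory budgeting: every voter receives an equal share of the total budget, and projects are funded in order of the lowest per-supporter price their supporters can jointly afford. A minimal sketch; the function names and toy election are illustrative assumptions, not the Aarau implementation:

```python
def per_supporter_price(cost, budgets):
    """Smallest price r such that each supporter pays min(budget, r)
    and the payments cover the project's cost; None if unaffordable."""
    bs = sorted(budgets)
    paid = 0.0
    for k, b in enumerate(bs):
        r = (cost - paid) / (len(bs) - k)  # remaining supporters split the rest
        if r <= b:
            return r
        paid += b  # this supporter's whole remaining budget is exhausted
    return None

def method_of_equal_shares(total_budget, n_voters, projects):
    """projects: {name: (cost, list of supporter indices)} with approval votes."""
    budgets = [total_budget / n_voters] * n_voters  # equal shares
    remaining, selected = dict(projects), []
    while remaining:
        # pick the project whose supporters can fund it at the lowest price
        priced = [(per_supporter_price(c, [budgets[i] for i in s]), name)
                  for name, (c, s) in remaining.items()]
        priced = [(r, name) for r, name in priced if r is not None]
        if not priced:
            break
        r, name = min(priced)
        _, supporters = remaining.pop(name)
        for i in supporters:
            budgets[i] -= min(budgets[i], r)
        selected.append(name)
    return selected

# Toy election: 4 voters, budget 100; project A approved by all four voters,
# project C by three; both cost 30.
print(method_of_equal_shares(100, 4, {"A": (30, [0, 1, 2, 3]),
                                      "C": (30, [0, 1, 2])}))  # → ['A', 'C']
```

A project backed only by voters whose shares cannot jointly cover its cost is never selected, which is what yields the proportionality property the abstract highlights.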

Authors:Florian Lehmann, Daniel Buschek
Title: StudyAlign: A Software System for Conducting Web-Based User Studies with Functional Interactive Prototypes
Abstract:
Interactive systems are commonly prototyped as web applications. This approach enables studies with functional prototypes on a large scale. However, setting up these studies can be complex due to implementing experiment procedures, integrating questionnaires, and data logging. To enable such user studies, we developed the software system StudyAlign, which offers: 1) a frontend for participants, 2) an admin panel to manage studies, 3) the possibility to integrate questionnaires, 4) a JavaScript library to integrate data logging into prototypes, and 5) a backend server for persisting log data and serving logical functions via an API to the different parts of the system. With our system, researchers can set up web-based experiments and focus on the design and development of interactions and prototypes. Furthermore, our systematic approach facilitates the replication of studies and reduces the effort required to execute web-based user studies. We conclude with reflections on using StudyAlign for conducting HCI studies online.

Authors:Runlin Duan, Chenfei Zhu, Yuzhao Chen, Yichen Hu, Jingyu Shi, Karthik Ramani
Title: DesignFromX: Empowering Consumer-Driven Design Space Exploration through Feature Composition of Referenced Products
Abstract:
Industrial products are designed to satisfy the needs of consumers. The rise of generative artificial intelligence (GenAI) enables consumers to easily modify a product by prompting a generative model, opening up opportunities to incorporate consumers in exploring the product design space. However, consumers often struggle to articulate their preferred product features due to their unfamiliarity with terminology and their limited understanding of the structure of product features. We present DesignFromX, a system that empowers consumer-driven design space exploration by helping consumers to design a product based on their preferences. Leveraging an effective GenAI-based framework, the system allows users to easily identify design features from product images and compose those features to generate conceptual images and 3D models of a new product. A user study with 24 participants demonstrates that DesignFromX lowers the barriers and frustration for consumer-driven design space explorations by enhancing both engagement and enjoyment for the participants.

Authors:Shusen Liu, Haichao Miao, Peer-Timo Bremer
Title: ParaView-MCP: An Autonomous Visualization Agent with Direct Tool Use
Abstract:
While powerful and well-established, tools like ParaView present a steep learning curve that discourages many potential users. This work introduces ParaView-MCP, an autonomous agent that integrates modern multimodal large language models (MLLMs) with ParaView to not only lower the barrier to entry but also augment ParaView with intelligent decision support. By leveraging the state-of-the-art reasoning, command execution, and vision capabilities of MLLMs, ParaView-MCP enables users to interact with ParaView through natural language and visual inputs. Specifically, our system adopts the Model Context Protocol (MCP), a standardized interface for model-application communication, which facilitates direct interaction between MLLMs and ParaView's Python API and allows seamless information exchange between the user, the language model, and the visualization tool itself. Furthermore, by implementing a visual feedback mechanism that allows the agent to observe the viewport, we unlock a range of new capabilities, including recreating visualizations from examples, closed-loop visualization parameter updates based on user-defined goals, and even cross-application collaboration involving multiple tools. Broadly, we believe such an agent-driven visualization paradigm can profoundly change the way we interact with visualization tools. We expect a significant uptake in the development of such visualization tools, in both visualization research and industry.

Authors:Nelusa Pathmanathan, Seyda Öney, Maurice Koch, Daniel Weiskopf, Kuno Kurzhals
Title: Uncertainty-Aware Scarf Plots
Abstract:
Multiple challenges emerge when analyzing eye-tracking data with areas of interest (AOIs) because recordings are subject to different sources of uncertainties. Previous work often presents gaze data without considering those inaccuracies in the data. To address this issue, we developed uncertainty-aware scarf plot visualizations that aim to make analysts aware of uncertainties with respect to the position-based mapping of gaze to AOIs and depth dependency in 3D scenes. Additionally, we also consider uncertainties in automatic AOI annotation. We showcase our approach in comparison to standard scarf plots in an augmented reality scenario.

Authors:Maurice Koch, Tobias Rau, Vladimir Mikheev, Seyda Öney, Michael Becher, Xiangyu Wang, Nelusa Pathmanathan, Patrick Gralka, Daniel Weiskopf, Kuno Kurzhals
Title: Group Gaze-Sharing with Projection Displays
Abstract:
The eyes play an important role in human collaboration. Mutual and shared gaze help communicate visual attention to each other or to a specific object of interest. Shared gaze has typically been investigated for pair collaborations in remote settings and with people in virtual and augmented reality. With our work, we expand this line of research with a new technique to communicate gaze between groups in tabletop workshop scenarios. To achieve this communication, we use an approach based on projection mapping to unify gaze data from multiple participants into a common visualization space on a tabletop. We showcase our approach with a collaborative puzzle-solving task that displays shared visual attention on individual pieces and provides hints to solve the problem at hand.

Authors:Yasra Chandio, Diana Romero, Salma Elmalaki, Fatima Anwar
Title: What Sensors See, What People Feel: Exploring Subjective Collaboration Perception in Mixed Reality
Abstract:
Mixed Reality (MR) enables rich, embodied collaboration, yet it is uncertain whether sensor and system-logged behavioral signals capture how users experience that collaboration. This disconnect stems from a fundamental gap: behavioral signals are observable and continuous, while collaboration is interpreted subjectively, shaped by internal states like presence, cognitive availability, and social awareness. Our core insight is that sensor signals serve as observable manifestations of subjective experiences in MR collaboration, and they can be captured through sensor data such as shared gaze, speech, spatial movement, and other system-logged performance metrics. We propose the Sensor-to-Subjective (S2S) Mapping Framework, a conceptual model that links observable interaction patterns to users' subjective perceptions of collaboration and internal cognitive states through sensor-based indicators and task performance metrics. To validate this model, we conducted a study with 48 participants across 12 MR groups engaged in a collaborative image-sorting task. Our findings show a correlation between sensed behavior and perceived collaboration, particularly through shared attention and proximity.

Authors:Michael A. Hedderich, Anyi Wang, Raoyuan Zhao, Florian Eichin, Jonas Fischer, Barbara Plank
Title: What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
Abstract:
Prompt engineering for large language models is challenging, as even small prompt perturbations or model changes can significantly impact the generated output texts. Existing evaluation methods of LLM outputs, either automated metrics or human evaluation, have limitations, such as providing limited insights or being labor-intensive. We propose Spotlight, a new approach that combines both automation and human analysis. Based on data mining techniques, we automatically distinguish between random (decoding) variations and systematic differences in language model outputs. This process provides token patterns that describe the systematic differences and guide the user in manually analyzing the effects of their prompts and changes in models efficiently. We create three benchmarks to quantitatively test the reliability of token pattern extraction methods and demonstrate that our approach provides new insights into established prompt data. From a human-centric perspective, through demonstration studies and a user study, we show that our token pattern approach helps users understand the systematic differences of language model outputs. We are further able to discover relevant differences caused by prompt and model changes (e.g. related to gender or culture), thus supporting the prompt engineering process and human-centric model behavior research.
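The core idea above, token patterns that separate two sets of model outputs, can be read as discriminative token mining: keep tokens whose document frequency differs sharply between the output sets while shared tokens cancel out. A minimal sketch; the whitespace tokenization and the frequency-gap threshold are illustrative assumptions, not Spotlight's actual data-mining procedure:

```python
from collections import Counter

def doc_frequency(outputs):
    """Fraction of outputs in which each token appears at least once."""
    counts = Counter()
    for text in outputs:
        counts.update(set(text.lower().split()))  # naive whitespace tokens
    return {tok: n / len(outputs) for tok, n in counts.items()}

def discriminative_tokens(outputs_a, outputs_b, min_gap=0.5):
    """Tokens whose document frequency differs by at least min_gap
    between two sets of model outputs (e.g., before/after a prompt change)."""
    fa, fb = doc_frequency(outputs_a), doc_frequency(outputs_b)
    vocab = set(fa) | set(fb)
    return sorted(t for t in vocab
                  if abs(fa.get(t, 0.0) - fb.get(t, 0.0)) >= min_gap)

# Outputs of a model under two prompts: the shared scaffolding cancels out,
# and only the systematically differing tokens remain.
a = ["the cat sat down", "the cat ran off"]
b = ["the dog sat down", "the dog ran off"]
print(discriminative_tokens(a, b))  # → ['cat', 'dog']
```

Tokens that appear with similar frequency in both sets are treated as random decoding variation; only large, consistent gaps surface as patterns for the user to inspect.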

Authors:Chenyu Tang, Josée Mallah, Dominika Kazieczko, Wentian Yi, Tharun Reddy Kandukuri, Edoardo Occhipinti, Bhaskar Mishra, Sunita Mehta, Luigi G. Occhipinti
Title: Wireless Silent Speech Interface Using Multi-Channel Textile EMG Sensors Integrated into Headphones
Abstract:
This paper presents a novel wireless silent speech interface (SSI) integrating multi-channel textile-based EMG electrodes into headphone earmuffs for real-time, hands-free communication. Unlike conventional patch-based EMG systems, which require large-area electrodes on the face or neck, our approach ensures comfort, discretion, and wearability while maintaining robust silent speech decoding. The system utilizes four graphene/PEDOT:PSS-coated textile electrodes to capture speech-related neuromuscular activity, with signals processed via a compact ESP32-S3-based wireless readout module. To address the challenge of variable skin-electrode coupling, we propose a 1D SE-ResNet architecture incorporating squeeze-and-excitation (SE) blocks to dynamically adjust per-channel attention weights, enhancing robustness against motion-induced impedance variations. The proposed system achieves 96% accuracy on 10 commonly used voice-free control words, outperforming conventional single-channel and non-adaptive baselines. Experimental validation, including XAI-based attention analysis and t-SNE feature visualization, confirms the adaptive channel selection capability and effective feature extraction of the model. This work advances wearable EMG-based SSIs, demonstrating a scalable, low-power, and user-friendly platform for silent communication, assistive technologies, and human-computer interaction.
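The squeeze-and-excitation mechanism behind the 1D SE-ResNet, reweighting EMG channels with a learned gate, fits in a few lines. A NumPy sketch with random, untrained weights; the channel count, reduction ratio, and window length are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def se_block_1d(x, w_reduce, w_expand):
    """Squeeze-and-excitation over the channels of a 1D signal.
    x: (channels, time) EMG window; returns the reweighted signal
    and the per-channel attention weights, each in (0, 1)."""
    z = x.mean(axis=1)                          # squeeze: global average pool
    h = np.maximum(0.0, w_reduce @ z)           # excitation: reduce + ReLU
    s = 1.0 / (1.0 + np.exp(-(w_expand @ h)))   # expand + sigmoid gate
    return x * s[:, None], s                    # rescale each channel

rng = np.random.default_rng(0)
channels, time_steps, reduced = 4, 64, 2        # e.g. four textile electrodes
x = rng.standard_normal((channels, time_steps))
w1 = rng.standard_normal((reduced, channels))
w2 = rng.standard_normal((channels, reduced))
y, attn = se_block_1d(x, w1, w2)
print(y.shape, attn.round(3))
```

Because the gate depends on the pooled channel statistics of the current window, a channel whose signal degrades (e.g., from motion-induced impedance change) can be down-weighted on the fly, which is the robustness argument the abstract makes.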

Authors:Novel Certad, Enrico Del Re, Joshua Varughese, Cristina Olaverri-Monreal
Title: V2P Collision Warnings for Distracted Pedestrians: A Comparative Study with Traditional Auditory Alerts
Abstract:
This study assesses a Vehicle-to-Pedestrian (V2P) collision warning system compared to conventional vehicle-issued auditory alerts in a real-world scenario simulating a vehicle on a fixed track, characterized by limited maneuverability and the need for timely pedestrian response. The results from analyzing speed variations show that V2P warnings are particularly effective for pedestrians distracted by phone use (gaming or listening to music), highlighting the limitations of auditory alerts in noisy environments. The findings suggest that V2P technology offers a promising approach to improving pedestrian safety in urban areas.

Authors:George Fatouros, Georgios Makridis, George Kousiouris, John Soldatos, Anargyros Tsadimas, Dimosthenis Kyriazis
Title: Towards Conversational AI for Human-Machine Collaborative MLOps
Abstract:
This paper presents a Large Language Model (LLM) based conversational agent system designed to enhance human-machine collaboration in Machine Learning Operations (MLOps). We introduce the Swarm Agent, an extensible architecture that integrates specialized agents to create and manage ML workflows through natural language interactions. The system leverages a hierarchical, modular design incorporating a KubeFlow Pipelines (KFP) Agent for ML pipeline orchestration, a MinIO Agent for data management, and a Retrieval-Augmented Generation (RAG) Agent for domain-specific knowledge integration. Through iterative reasoning loops and context-aware processing, the system enables users with varying technical backgrounds to discover, execute, and monitor ML pipelines; manage datasets and artifacts; and access relevant documentation, all via intuitive conversational interfaces. Our approach addresses the accessibility gap in complex MLOps platforms like Kubeflow, making advanced ML tools broadly accessible while maintaining the flexibility to extend to other platforms. The paper describes the architecture, implementation details, and demonstrates how this conversational MLOps assistant reduces complexity and lowers barriers to entry for users across diverse technical skill levels.

Authors:Run Luo, Lu Wang, Wanwei He, Xiaobo Xia
Title: GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
Abstract:
Existing efforts in building Graphical User Interface (GUI) agents largely rely on the training paradigm of supervised fine-tuning on Large Vision-Language Models (LVLMs). However, this approach not only demands extensive amounts of training data but also struggles to effectively understand GUI screenshots and generalize to unseen interfaces. The issue significantly limits its application in real-world scenarios, especially for high-level tasks. Inspired by Reinforcement Fine-Tuning (RFT) in large reasoning models (e.g., DeepSeek-R1), which efficiently enhances the problem-solving capabilities of large language models in real-world settings, we propose GUI-R1, the first reinforcement learning framework designed to enhance the GUI capabilities of LVLMs in high-level real-world task scenarios, through unified action space rule modeling. By leveraging a small amount of carefully curated high-quality data across multiple platforms (including Windows, Linux, MacOS, Android, and Web) and employing policy optimization algorithms such as Group Relative Policy Optimization (GRPO) to update the model, GUI-R1 achieves superior performance using only 0.02% of the data (3K vs. 13M) compared to previous state-of-the-art methods like OS-Atlas across eight benchmarks spanning three different platforms (mobile, desktop, and web). These results demonstrate the immense potential of reinforcement learning based on unified action space rule modeling in improving the execution capabilities of LVLMs for real-world GUI agent tasks.
Chinese: 现有基于监督微调的GUI智能体存在数据效率低和泛化能力差的问题,因此提出首个强化学习框架,通过统一动作空间建模仅用0.02%数据即在多平台基准测试中实现最优性能。
English: Current GUI agents relying on supervised fine-tuning of LVLMs face limitations in data efficiency and generalization, prompting the proposal of a reinforcement learning framework that achieves superior performance with minimal data through unified action space modeling.
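Group Relative Policy Optimization, cited in the abstract, scores each sampled response against the other responses in its group rather than against a learned value function: rewards are z-scored within the group to form advantages, so no separate critic is needed. A minimal sketch of that normalization step; the reward values are illustrative:

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward within its group
    of sampled responses (zero mean, unit scale)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled action trajectories for one GUI task; a rule-based reward
# of 1 marks trajectories whose actions match the unified action space spec.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Trajectories that beat their group mean get positive advantages and are reinforced; the rest are suppressed, which is how rule-based rewards over a unified action space can drive learning from only a few thousand examples.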

Authors:Yashothara Shanmugarasa, Shidong Pan, Ming Ding, Dehai Zhao, Thierry Rakotoarivelo
Title: Privacy Meets Explainability: Managing Confidential Data and Transparency Policies in LLM-Empowered Science
Abstract:
As Large Language Models (LLMs) become integral to scientific workflows, concerns over the confidentiality and ethical handling of confidential data have emerged. This paper explores data exposure risks through LLM-powered scientific tools, which can inadvertently leak confidential information, including intellectual property and proprietary data, from scientists' perspectives. We propose "DataShield", a framework designed to detect confidential data leaks, summarize privacy policies, and visualize data flow, ensuring alignment with organizational policies and procedures. Our approach aims to inform scientists about data handling practices, enabling them to make informed decisions and protect sensitive information. Ongoing user studies with scientists are underway to evaluate the framework's usability, trustworthiness, and effectiveness in tackling real-world privacy challenges.

Authors:Asiful Arefeen, Saman Khamesian, Maria Adela Grando, Bithika Thompson, Hassan Ghasemzadeh
Title: GlyTwin: Digital Twin for Glucose Control in Type 1 Diabetes Through Optimal Behavioral Modifications Using Patient-Centric Counterfactuals
Abstract:
Frequent and long-term exposure to hyperglycemia (i.e., high blood glucose) increases the risk of chronic complications such as neuropathy, nephropathy, and cardiovascular disease. Current technologies like continuous subcutaneous insulin infusion (CSII) and continuous glucose monitoring (CGM) primarily model specific aspects of glycemic control, such as hypoglycemia prediction or insulin delivery. Similarly, most digital twin approaches in diabetes management simulate only physiological processes. These systems lack the ability to offer alternative treatment scenarios that support proactive behavioral interventions. To address this, we propose GlyTwin, a novel digital twin framework that uses counterfactual explanations to simulate optimal treatments for glucose regulation. Our approach helps patients and caregivers modify behaviors like carbohydrate intake and insulin dosing to avoid abnormal glucose events. GlyTwin generates behavioral treatment suggestions that proactively prevent hyperglycemia by recommending small adjustments to daily choices, reducing both the frequency and duration of these events. Additionally, it incorporates stakeholder preferences into the intervention design, making recommendations patient-centric and tailored. We evaluate GlyTwin on AZT1D, a newly constructed dataset with longitudinal data from 21 type 1 diabetes (T1D) patients on automated insulin delivery systems over 26 days. Results show GlyTwin outperforms state-of-the-art counterfactual methods, generating 76.6% valid and 86% effective interventions. These findings demonstrate the promise of counterfactual-driven digital twins in delivering personalized healthcare.
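The counterfactual idea behind GlyTwin, recommending the smallest behavioral adjustment that averts a predicted hyperglycemic event, can be illustrated with a toy search. Everything here is a hedged illustration: the linear glucose predictor, the candidate adjustments, and the L1 cost are stand-ins for the paper's actual model and patient-preference weighting:

```python
def smallest_adjustment(predict, carbs, insulin, threshold=180.0,
                        carb_deltas=(0, -10, -20, -30),
                        insulin_deltas=(0.0, 0.5, 1.0)):
    """Grid-search the cheapest (carb, insulin) change that keeps the
    predicted glucose below the hyperglycemia threshold; None if none does."""
    best = None
    for dc in carb_deltas:
        for di in insulin_deltas:
            if predict(carbs + dc, insulin + di) < threshold:
                cost = abs(dc) + abs(di)  # toy L1 "behavioral effort" cost
                if best is None or cost < best[0]:
                    best = (cost, dc, di)
    return None if best is None else (best[1], best[2])

# Toy linear predictor: baseline 100 mg/dL, +2 per gram of carbs,
# -20 per unit of insulin.
predict = lambda c, i: 100.0 + 2.0 * c - 20.0 * i
print(smallest_adjustment(predict, carbs=60, insulin=1.0))  # → (-10, 0.5)
```

The minimal-cost constraint is what makes the suggestion actionable: rather than a drastic change, the patient is nudged by the smallest carb reduction and insulin increase that the model predicts will keep glucose in range.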

Authors:Shirley Zhang, Bengisu Cagiltay, Jennica Li, Dakota Sullivan, Bilge Mutlu, Heather Kirkorian, Kassem Fawaz
Title: Exploring Families' Use and Mediation of Generative AI: A Multi-User Perspective
Abstract:
Applications of Generative AI (GenAI), such as ChatGPT, have gained popularity among the public due to their ease of access, use, and support of educational and creative activities. Despite these benefits, GenAI poses unique risks for families, such as lacking sufficient safeguards tailored to protect children under 16 years of age and not offering parental control features. This study explores families' use and co-use of GenAI, the perceived risks and opportunities of ChatGPT, and how parents mediate their children's use of GenAI. Through semi-structured interviews with 12 families, we identified ways families used and mediated GenAI and factors that influenced parents' GenAI mediation strategies. We contextualize our findings with a modified model of family mediation strategies, drawing from previous family media and mediation frameworks. We provide insights for future research on family-GenAI interactions and highlight the need for more robust protective measures on GenAI platforms for families.

Authors:Cansu Koyuturk, Emily Theophilou, Sabrina Patania, Gregor Donabauer, Andrea Martinenghi, Chiara Antico, Alessia Telari, Alessia Testa, Sathya Bursic, Franca Garzotto, Davinia Hernandez-Leo, Udo Kruschwitz, Davide Taibi, Simona Amenta, Martin Ruskov, Dimitri Ognibene
Title: Understanding Learner-LLM Chatbot Interactions and the Impact of Prompting Guidelines
Abstract:
Large Language Models (LLMs) have transformed human-computer interaction by enabling natural language-based communication with AI-powered chatbots. These models are designed to be intuitive and user-friendly, allowing users to articulate requests with minimal effort. However, despite their accessibility, studies reveal that users often struggle with effective prompting, resulting in inefficient responses. Existing research has highlighted both the limitations of LLMs in interpreting vague or poorly structured prompts and the difficulties users face in crafting precise queries. This study investigates learner-AI interactions through an educational experiment in which participants receive structured guidance on effective prompting. We introduce and compare three types of prompting guidelines: a task-specific framework developed through a structured methodology and two baseline approaches. To assess user behavior and prompting efficacy, we analyze a dataset of 642 interactions from 107 users. Using Von NeuMidas, an extended pragmatic annotation schema for LLM interaction analysis, we categorize common prompting errors and identify recurring behavioral patterns. We then evaluate the impact of different guidelines by examining changes in user behavior, adherence to prompting strategies, and the overall quality of AI-generated responses. Our findings provide a deeper understanding of how users engage with LLMs and the role of structured prompting guidance in enhancing AI-assisted communication. By comparing different instructional frameworks, we offer insights into more effective approaches for improving user competency in AI interactions, with implications for AI literacy, chatbot usability, and the design of more responsive AI systems.

Authors:Anku Rani, Valdemar Danry, Andy Lippman, Pattie Maes
Title: Can dialogues with AI systems help humans better discern visual misinformation?
Abstract:
The widespread emergence of manipulated news media content poses significant challenges to online information integrity. This study investigates whether dialogues with AI about AI-generated images and associated news statements can increase human discernment abilities and foster short-term learning in detecting misinformation. We conducted a study with 80 participants who engaged in structured dialogues with an AI system about news headline-image pairs, generating 1,310 human-AI dialogue exchanges. Results show that AI interaction significantly boosts participants' accuracy in identifying real versus fake news content from approximately 60% to 90% (p < 0.001). However, these improvements do not persist when participants are presented with new, unseen image-statement pairs without AI assistance, with accuracy returning to baseline levels (~60%, p = 0.88). These findings suggest that while AI systems can effectively change immediate beliefs about specific content through persuasive dialogue, they may not produce lasting improvements that transfer to novel examples, highlighting the need for developing more effective interventions that promote durable learning outcomes.

Authors:Lixing He, Bufang Yang, Di Duan, Zhenyu Yan, Guoliang Xing
Title: EmbodiedSense: Understanding Embodied Activities with Earphones
Abstract:
In this paper, we propose EmbodiedSense, a sensing system based on commercial earphones, which enables fine-grained activity logs using existing sensors. The activity logs record both user activities and the scenario in which the activities took place, benefiting detailed behavior understanding. By understanding both the user and the environment, EmbodiedSense addresses three main challenges: the limited recognition capability caused by information-hungry configurations (i.e., limited sensors available), the ineffective fusion to extract ambient information such as contextual scenarios, and the interference from ambient noise. Specifically, EmbodiedSense consists of a context-aware scenario recognition module and a spatial-aware activity detection module, which are further integrated with other attributes using expert knowledge. We implement our system on commercial earphones equipped with binaural microphones and an Inertial Measurement Unit (IMU). By distinguishing usage scenarios and identifying the source of sounds, EmbodiedSense enables fine-grained activity logs in a zero-shot manner (evaluated with up to 41 categories) and outperforms strong baselines like ImageBind-LLM by 38% F1-score. Extensive evaluations demonstrate that EmbodiedSense is a promising solution for long-term and short-term activity logs and provides significant benefits in monitoring the wearer's daily life.

Authors:Xinyue Chen, Lev Tankelevitch, Rishi Vanukuru, Ava Elizabeth Scott, Payod Panda, Sean Rintel
Title: Are We On Track? AI-Assisted Active and Passive Goal Reflection During Meetings
Abstract:
Meetings often suffer from a lack of intentionality, such as unclear goals and straying off-topic. Identifying goals and maintaining their clarity throughout a meeting is challenging, as discussions and uncertainties evolve. Yet meeting technologies predominantly fail to support meeting intentionality. AI-assisted reflection is a promising approach. To explore this, we conducted a technology probe study with 15 knowledge workers, integrating their real meeting data into two AI-assisted reflection probes: a passive and active design. Participants identified goal clarification as a foundational aspect of reflection. Goal clarity enabled people to assess when their meetings were off-track and reprioritize accordingly. Passive AI intervention helped participants maintain focus through non-intrusive feedback, while active AI intervention, though effective at triggering immediate reflection and action, risked disrupting the conversation flow. We identify three key design dimensions for AI-assisted reflection systems, and provide insights into design trade-offs, emphasizing the need to adapt intervention intensity and timing, balance democratic input with efficiency, and offer user control to foster intentional, goal-oriented behavior during meetings and beyond.

Authors:Mingyang Gu, Jiamin Zhu, Qipeng Wang, Fengjie Wang, Xiaolin Wen, Yong Wang, Min Zhu
Title: IntelliCircos: A Data-driven and AI-powered Authoring Tool for Circos Plots
Abstract:
Genomics data is essential in biological and medical domains, and bioinformatics analysts often manually create circos plots to analyze the data and extract valuable insights. However, creating circos plots is complex, as it requires careful design for multiple track attributes and positional relationships between them. Typically, analysts often seek inspiration from existing circos plots, and they have to iteratively adjust and refine the plot to achieve a satisfactory final design, making the process both tedious and time-intensive. To address these challenges, we propose IntelliCircos, an AI-powered interactive authoring tool that streamlines the process from initial visual design to the final implementation of circos plots. Specifically, we build a new dataset containing 4396 circos plots with corresponding annotations and configurations, which are extracted and labeled from published papers. With the dataset, we further identify track combination patterns, and utilize Large Language Model (LLM) to provide domain-specific design recommendations and configuration references to navigate the design of circos plots. We conduct a user study with 8 bioinformatics analysts to evaluate IntelliCircos, and the results demonstrate its usability and effectiveness in authoring circos plots.

Authors:Alice Qian Zhang, Jina Suh, Mary L. Gray, Hong Shen
Title: Effective Automation to Support the Human Infrastructure in AI Red Teaming
Abstract:
As artificial intelligence (AI) systems become increasingly embedded in critical societal functions, the need for robust red teaming methodologies continues to grow. In this forum piece, we examine emerging approaches to automating AI red teaming, with a particular focus on how the application of automated methods affects human-driven efforts. We discuss the role of labor in automated red teaming processes, the benefits and limitations of automation, and its broader implications for AI safety and labor practices. Drawing on existing frameworks and case studies, we argue for a balanced approach that combines human expertise with automated tools to strengthen AI risk assessment. Finally, we highlight key challenges in scaling automated red teaming, including considerations around worker proficiency, agency, and context-awareness.

Authors:Madhusudan Basak, Omar Sharif, Jessica Hulsey, Elizabeth C. Saunders, Daisy J. Goodman, Luke J. Archibald, Sarah M. Preum
Title: Socially Constructed Treatment Plans: Analyzing Online Peer Interactions to Understand How Patients Navigate Complex Medical Conditions
Abstract:
When faced with complex and uncertain medical conditions (e.g., cancer, mental health conditions, recovery from substance dependency), millions of patients seek online peer support. In this study, we leverage content analysis of online discourse and ethnographic studies with clinicians and patient representatives to characterize how treatment plans for complex conditions are "socially constructed." Specifically, we ground online conversation on medication-assisted recovery treatment to medication guidelines and subsequently surface when and why people deviate from the clinical guidelines. We characterize the implications and effectiveness of socially constructed treatment plans through in-depth interviews with clinical experts. Finally, given the enthusiasm around AI-powered solutions for patient communication, we investigate whether and how socially constructed treatment-related knowledge is reflected in a state-of-the-art large language model (LLM). Leveraging a novel mixed-method approach, this study highlights critical research directions for patient-centered communication in online health communities.

Authors:Brian Keith, Fausto German, Eric Krokos, Sarah Joseph, Chris North
Title: Explainable AI Components for Narrative Map Extraction
Abstract:
As narrative extraction systems grow in complexity, establishing user trust through interpretable and explainable outputs becomes increasingly critical. This paper presents an evaluation of an Explainable Artificial Intelligence (XAI) system for narrative map extraction that provides meaningful explanations across multiple levels of abstraction. Our system integrates explanations based on topical clusters for low-level document relationships, connection explanations for event relationships, and high-level structure explanations for overall narrative patterns. In particular, we evaluate the XAI system through a user study involving 10 participants that examined narratives from the 2021 Cuban protests. The analysis of results demonstrates that the explanations fostered user trust in the system's decisions, with connection explanations and important event detection proving particularly effective at building user confidence. Survey responses indicate that the multi-level explanation approach helped users develop appropriate trust in the system's narrative extraction capabilities. This work advances the state-of-the-art in explainable narrative extraction while providing practical insights for developing reliable narrative extraction systems that support effective human-AI collaboration.

Authors:Yuzhi Lai, Shenghai Yuan, Boya Zhang, Benjamin Kiefer, Peizheng Li, Tianchen Deng, Andreas Zell
Title: FAM-HRI: Foundation-Model Assisted Multi-Modal Human-Robot Interaction Combining Gaze and Speech
Abstract:
Effective Human-Robot Interaction (HRI) is crucial for enhancing accessibility and usability in real-world robotics applications. However, existing solutions often rely on gestures or language commands, making interaction inefficient and ambiguous, particularly for users with physical impairments. In this paper, we introduce FAM-HRI, an efficient multi-modal framework for human-robot interaction that integrates language and gaze inputs via foundation models. By leveraging lightweight Meta ARIA glasses, our system captures real-time multi-modal signals and utilizes large language models (LLMs) to fuse user intention with scene context, enabling intuitive and precise robot manipulation. Our method accurately determines the gaze fixation time interval, reducing noise caused by the dynamic nature of gaze. Experimental evaluations demonstrate that FAM-HRI achieves a high success rate in task execution while maintaining a low interaction time, providing a practical solution for individuals with limited physical mobility or motor impairments.

Authors:Methusela Sulle, Judith Mwakalonge, Gurcan Comert, Saidi Siuhi, Nana Kankam Gyimah, Jaylen Roberts, Denis Ruganuza
Title: Analysis of Distracted Pedestrians Crossing Behavior: An Immersive Virtual Reality Application
Abstract:
Pedestrian safety is a critical public health priority, with pedestrian fatalities accounting for 18% of all U.S. traffic deaths in 2022. The rising prevalence of distracted walking, exacerbated by mobile device use, poses significant risks at signalized intersections. This study utilized an immersive virtual reality (VR) environment to simulate real-world traffic scenarios and assess pedestrian behavior under three conditions: undistracted crossing, crossing while using a mobile device, and crossing with light-emitting diode (LED) safety interventions. Analysis using ANOVA models identified speed and mobile-focused eye-tracking as significant predictors of crossing duration, revealing how distractions impair situational awareness and response times. While LED measures reduced delays, their limited effectiveness highlights the need for integrated strategies addressing both behavioral and physical factors. This study showcases VR's potential to analyze complex pedestrian behaviors, offering actionable insights for urban planners and policymakers aiming to enhance pedestrian safety.

Authors:Lama Ahmad, Sandhini Agarwal, Michael Lampe, Pamela Mishkin
Title: OpenAI's Approach to External Red Teaming for AI Models and Systems
Abstract:
Red teaming has emerged as a critical practice in assessing the possible risks of AI models and systems. It aids in the discovery of novel risks, stress testing possible gaps in existing mitigations, enriching existing quantitative safety metrics, facilitating the creation of new safety measurements, and enhancing public trust and the legitimacy of AI risk assessments. This white paper describes OpenAI's work to date in external red teaming and draws some more general conclusions from this work. We describe the design considerations underpinning external red teaming, which include: selecting the composition of the red team, deciding on access levels, and providing the guidance required to conduct red teaming. Additionally, we show outcomes red teaming can enable, such as input into risk assessment and automated evaluations. We also describe the limitations of external red teaming and how it can fit into a broader range of AI model and system evaluations. Through these contributions, we hope that AI developers and deployers, evaluation creators, and policymakers will be able to better design red teaming campaigns and get a deeper look into how external red teaming can fit into model deployment and evaluation processes. These methods are evolving, and the value of different methods continues to shift as the ecosystem around red teaming matures and models themselves improve as tools for red teaming.

Authors:Zachary Englhardt, Felix Hähnlein, Yuxuan Mei, Tong Lin, Connor Masahiro Sun, Zhihan Zhang, Adriana Schulz, Shwetak Patel, Vikram Iyer
Title: Incorporating Sustainability in Electronics Design: Obstacles and Opportunities
Abstract:
Life cycle assessment (LCA) is a methodology for holistically measuring the environmental impact of a product from initial manufacturing to end-of-life disposal. However, the extent to which LCA informs the design of computing devices remains unclear. To understand how this information is collected and applied, we interviewed 17 industry professionals with experience in LCA or electronics design, systematically coded the interviews, and investigated common themes. These themes highlight the challenge of LCA data collection and reveal distributed decision-making processes where responsibility for sustainable design choices, and their associated costs, is often ambiguous. Our analysis identifies opportunities for HCI technologies to support LCA computation and its integration into the design process to facilitate sustainability-oriented decision-making. While this work provides a nuanced discussion about sustainable design in the information and communication technologies (ICT) hardware industry, we hope our insights will also be valuable to other sectors.

Authors:Vincent Schorp, Frédéric Giraud, Gianluca Pargätzi, Michael Wäspe, Lorenzo von Ritter-Zahony, Marcel Wegmann, Nicola A. Cavalcanti, John Garcia Henao, Nicholas Bünger, Dominique Cachin, Sebastiano Caprara, Philipp Fürnstahl, Fabio Carrillo
Title: A Modular Edge Device Network for Surgery Digitalization
Abstract:
Future surgical care demands real-time, integrated data to drive informed decision-making and improve patient outcomes. The pressing need for seamless and efficient data capture in the OR motivates our development of a modular solution that bridges the gap between emerging machine learning techniques and interventional medicine. We introduce a network of edge devices, called Data Hubs (DHs), that interconnect diverse medical sensors, imaging systems, and robotic tools via optical fiber and a centralized network switch. Built on the NVIDIA Jetson Orin NX, each DH supports multiple interfaces (HDMI, USB-C, Ethernet) and encapsulates device-specific drivers within Docker containers using the Isaac ROS framework and ROS2. A centralized user interface enables straightforward configuration and real-time monitoring, while an Nvidia DGX computer provides state-of-the-art data processing and storage. We validate our approach through an ultrasound-based 3D anatomical reconstruction experiment that combines medical imaging, pose tracking, and RGB-D data acquisition.

Authors:Andreas Bauer, William Bosl, Oliver Aalami, Paul Schmiedmayer
Title: Toward Scalable Access to Neurodevelopmental Screening: Insights, Implementation, and Challenges
Abstract:
Children with neurodevelopmental disorders require timely intervention to improve long-term outcomes, yet early screening remains inaccessible in many regions. A scalable solution integrating standardized assessments with physiological data collection, such as electroencephalogram (EEG) recordings, could enable early detection in routine settings by non-specialists. To address this, we introduce NeuroNest, a mobile and cloud-based platform for large-scale EEG data collection, neurodevelopmental screening, and research. We provide a comprehensive review of existing behavioral and biomarker-based approaches, consumer-grade EEG devices, and emerging machine learning techniques. NeuroNest integrates low-cost EEG devices with digital screening tools, establishing a scalable, open-source infrastructure for non-invasive data collection, automated analysis, and interoperability across diverse hardware. Beyond the system architecture and reference implementation, we highlight key challenges in EEG data standardization, device interoperability, and bridging behavioral and physiological assessments. Our findings emphasize the need for future research on standardized data exchange, algorithm validation, and ecosystem development to expand screening accessibility. By providing an extensible, open-source system, NeuroNest advances machine learning-based early detection while fostering collaboration in screening technologies, clinical applications, and public health.

Authors:Zhen Chen, Zhihao Peng, Xusheng Liang, Cheng Wang, Peigan Liang, Linsheng Zeng, Minjie Ju, Yixuan Yuan
Title: MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways
Abstract:
Inpatient pathways demand complex clinical decision-making based on comprehensive patient information, posing critical challenges for clinicians. Despite advancements in large language models (LLMs) in medical applications, limited research has focused on artificial intelligence (AI) systems for inpatient pathways, due to the lack of large-scale inpatient datasets. Moreover, existing medical benchmarks typically concentrate on medical question-answering and examinations, ignoring the multifaceted nature of clinical decision-making in inpatient settings. To address these gaps, we first developed the Inpatient Pathway Decision Support (IPDS) benchmark from the MIMIC-IV database, encompassing 51,274 cases across nine triage departments and 17 major disease categories alongside 16 standardized treatment options. Then, we proposed the Multi-Agent Inpatient Pathways (MAP) framework to accomplish inpatient pathways with three clinical agents, including a triage agent managing the patient admission, a diagnosis agent serving as the primary decision maker at the department, and a treatment agent providing treatment plans. Additionally, our MAP framework includes a chief agent overseeing the inpatient pathways to guide and promote these three clinical agents. Extensive experiments showed our MAP improved the diagnosis accuracy by 25.10% compared to the state-of-the-art LLM HuatuoGPT2-13B. Notably, our MAP demonstrated significant clinical compliance, outperforming three board-certified clinicians by 10%-12%, establishing a foundation for inpatient pathway systems.
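The staged hand-off the abstract describes (triage, then diagnosis, then treatment, under a chief agent's oversight) can be sketched as a simple pipeline. The agents below are plain functions standing in for LLM calls; all names, rules, and return values are illustrative, not the paper's actual API or prompts.

```python
# Hypothetical sketch of MAP's triage -> diagnosis -> treatment hand-off.
# Each "agent" is a stub function in place of an LLM call.

def triage_agent(case):
    # Route the admission to a department based on the case description.
    return {"department": "cardiology" if "chest pain" in case else "general"}

def diagnosis_agent(case, triage):
    # Primary decision maker at the assigned department.
    return {"diagnosis": "angina" if triage["department"] == "cardiology" else "unspecified"}

def treatment_agent(diagnosis):
    # Propose a standardized treatment option for the diagnosis.
    return {"plan": "medication" if diagnosis["diagnosis"] == "angina" else "observation"}

def chief_agent(step_name, output):
    # Oversight hook: in the real framework this agent guides and
    # critiques the other three; here it simply passes output through.
    return output

def inpatient_pathway(case):
    triage = chief_agent("triage", triage_agent(case))
    diagnosis = chief_agent("diagnosis", diagnosis_agent(case, triage))
    return chief_agent("treatment", treatment_agent(diagnosis))
```

The value of the structure is that each stage receives only the upstream agents' outputs, mirroring how the benchmark separates admission, departmental diagnosis, and treatment selection.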

Authors:Pierre Sermanet, Anirudha Majumdar, Vikas Sindhwani
Title: SciFi-Benchmark: Leveraging Science Fiction To Improve Robot Behavior
Abstract:
Given the recent rate of progress in artificial intelligence (AI) and robotics, a tantalizing question is emerging: would robots controlled by emerging AI systems be strongly aligned with human values? In this work, we propose a scalable way to probe this question by generating a benchmark spanning the key moments in 824 major pieces of science fiction literature (movies, TV, novels, and scientific books) where an agent (AI or robot) made critical decisions (good or bad). We use a state-of-the-art LLM's recollection of each key moment to generate questions in similar situations, the decisions made by the agent, and alternative decisions it could have made (good or bad). We then measure an approximation of how well models align with human values on a set of human-voted answers. We also generate rules that can be automatically improved via an amendment process in order to generate the first Sci-Fi-inspired constitutions for promoting ethical behavior in AIs and robots in the real world. Our first finding is that modern LLMs paired with constitutions turn out to be well-aligned with human values (95.8%), contrary to unsettling decisions typically made in Sci-Fi (only 21.2% alignment). Secondly, we find that generated constitutions substantially increase alignment compared to the base model (79.4% to 95.8%), and show resilience to an adversarial prompt setting (23.3% to 92.3%). Additionally, we find that those constitutions are among the top performers on the ASIMOV Benchmark, which is derived from real-world images and hospital injury reports. Sci-Fi-inspired constitutions are thus highly aligned and applicable in real-world situations. We release SciFi-Benchmark: a large-scale dataset to advance robot ethics and safety research. It comprises 9,056 questions and 53,384 answers generated through a novel LLM-introspection process, in addition to a smaller human-labeled evaluation set.

Authors:Huiyun Tang, Björn Rohles, Yuwei Chuai, Gabriele Lenzini, Anastasia Sergeeva
Title: More Than Just Warnings: Exploring the Ways of Communicating Credibility Assessment on Social Media
Abstract:
Reducing the spread of misinformation is challenging. AI-based fact verification systems offer a promising solution by addressing the high costs and slow pace of traditional fact-checking. However, the problem of how to effectively communicate the results to users remains unsolved. Warning labels may seem an easy solution, but they fail to account for fuzzy misinformation that is not entirely fake. Additionally, users' limited attention spans and the social media information environment should be taken into account when designing the presentation. Our online experiment (n = 537) investigates the impact of sources and granularity on users' perception of information veracity and the system's usefulness and trustworthiness. Findings show that fine-grained indicators enhance nuanced opinions, information awareness, and the intention to use fact-checking systems. Source differences had minimal impact on opinions and perceptions, except for informativeness. Qualitative findings suggest the proposed indicators promote critical thinking. We discuss implications for designing concise, user-friendly AI fact-checking feedback.

Authors:Dimitri Ognibene, Sabrina Patania, Luca Annese, Cansu Koyuturk, Franca Garzotto, Giuseppe Vizzari, Azzurra Ruggeri, Simone Colombani
Title: SCOOP: A Framework for Proactive Collaboration and Social Continual Learning through Natural Language Interaction and Causal Reasoning
Abstract:
Multimodal information-gathering settings, where users collaborate with AI in dynamic environments, are increasingly common. These involve complex processes with textual and multimodal interactions, often requiring additional structural information via cost-incurring requests. AI helpers lack access to users' true goals, beliefs, and preferences and struggle to integrate diverse information effectively. We propose a social continual learning framework for causal knowledge acquisition and collaborative decision-making. It focuses on autonomous agents learning through dialogues, question-asking, and interaction in open, partially observable environments. A key component is a natural language oracle that answers the agent's queries about environmental mechanisms and states, refining causal understanding while balancing exploration (learning) and exploitation (using acquired knowledge). Evaluation tasks inspired by developmental psychology emphasize causal reasoning and question-asking skills. They complement benchmarks by assessing the agent's ability to identify knowledge gaps, generate meaningful queries, and incrementally update reasoning. The framework also evaluates how knowledge acquisition costs are amortized across tasks within the same environment. We propose two architectures: 1) a system combining Large Language Models (LLMs) with the ReAct framework and question generation, and 2) an advanced system with a causal world model (symbolic, graph-based, or subsymbolic) for reasoning and decision-making. The latter builds a causal knowledge graph for efficient inference and adaptability under constraints. Challenges include integrating causal reasoning into ReAct and optimizing exploration and question-asking in error-prone scenarios. Beyond applications, this framework models developmental processes combining causal reasoning, question generation, and social learning.
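The causal knowledge graph the second architecture builds can be illustrated minimally as a directed graph of cause-effect edges with a reachability query. The edges below are toy examples, not the framework's actual representation or inference procedure.

```python
# Toy causal knowledge graph: each key is a cause, each value the list of
# its direct effects. The specific variables are illustrative only.
causal_graph = {
    "flip_switch": ["light_on"],
    "light_on": ["room_bright"],
    "open_window": ["room_bright"],
}

def can_influence(graph, cause, effect, seen=None):
    """True if a directed causal path exists from `cause` to `effect`."""
    if cause == effect:
        return True
    seen = seen if seen is not None else set()
    seen.add(cause)
    return any(can_influence(graph, nxt, effect, seen)
               for nxt in graph.get(cause, []) if nxt not in seen)
```

A query such as `can_influence(causal_graph, "flip_switch", "room_bright")` is the kind of directed inference (causes reach effects, never the reverse) that a graph-based world model supports cheaply, which is the point of building one for amortized knowledge reuse.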

Authors:Diana Romero, Fatima Anwar, Salma Elmalaki
Title: MoCoMR: A Collaborative MR Simulator with Individual Behavior Modeling
Abstract:
Studying collaborative behavior in Mixed Reality (MR) often requires extensive, challenging data collection. This paper introduces MoCoMR, a novel simulator designed to address this by generating synthetic yet realistic collaborative MR data. MoCoMR captures individual behavioral modalities such as speaking, gaze, and locomotion during a collaborative image-sorting task with 48 participants to identify distinct behavioral patterns. MoCoMR simulates individual actions and interactions within a virtual space, enabling researchers to investigate the impact of individual behaviors on group dynamics and task performance. This simulator facilitates the development of more effective and human-centered MR applications by providing insights into user behavior and interaction patterns. The simulator's API allows for flexible configuration and data analysis, enabling researchers to explore various scenarios and generate valuable insights for optimizing collaborative MR experiences.

Authors:Michelle Vaccaro, Michael Caosun, Harang Ju, Sinan Aral, Jared R. Curhan
Title: Advancing AI Negotiations: New Theory and Evidence from a Large-Scale Autonomous Negotiations Competition
Abstract:
We conducted an International AI Negotiation Competition in which participants designed and refined prompts for AI negotiation agents. We then facilitated over 180,000 negotiations between these agents across multiple scenarios with diverse characteristics and objectives. Our findings revealed that principles from human negotiation theory remain crucial even in AI-AI contexts. Surprisingly, warmth, a traditionally human relationship-building trait, was consistently associated with superior outcomes across all key performance metrics. Dominant agents, meanwhile, were especially effective at claiming value. Our analysis also revealed unique dynamics in AI-AI negotiations not fully explained by existing theory, including AI-specific technical strategies like chain-of-thought reasoning, prompt injection, and strategic concealment. When we applied natural language processing (NLP) methods to the full transcripts of all negotiations, we found that positivity, gratitude, and question-asking (associated with warmth) were strongly associated with reaching deals as well as with objective and subjective value, whereas conversation length (associated with dominance) was strongly associated with impasses. The results suggest the need to establish a new theory of AI negotiation that integrates classic negotiation theory with AI-specific negotiation theories to better understand autonomous negotiations and optimize agent performance.

Authors:Yingna Wang, Qingqin Liu, Xiaoying Wei, Mingming Fan
Title: Facilitating Daily Practice in Intangible Cultural Heritage through Virtual Reality: A Case Study of Traditional Chinese Flower Arrangement
Abstract:
The essence of intangible cultural heritage (ICH) lies in the living knowledge and skills passed down through generations. Daily practice plays a vital role in revitalizing ICH by fostering continuous learning and improvement. However, limited resources and accessibility pose significant challenges to sustaining such practice. Virtual reality (VR) has shown promise in supporting extensive skill training. Unlike technical skill training, ICH daily practice prioritizes cultivating a deeper understanding of cultural meanings and values. This study explores VR's potential to facilitate ICH daily practice through a case study of Traditional Chinese Flower Arrangement (TCFA). By investigating TCFA learners' challenges and expectations, we designed and evaluated FloraJing, a VR system enriched with cultural elements to support sustained TCFA practice. Findings reveal that FloraJing promotes progressive reflection and continuously fosters both technical improvement and cultural understanding. We further propose design implications for VR applications aimed at fostering ICH daily practice in both knowledge and skills.

Authors:Likith Kadiyala, Ramteja Sajja, Yusuf Sermet, Ibrahim Demir
Title: Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Abstract:
This research investigates the integration of emotional diversity into Large Language Models (LLMs) to enhance collective intelligence. Inspired by the human wisdom of crowds phenomenon, where group decisions often outperform individual judgments, we fine-tuned the DarkIdol-Llama-3.1-8B model using Google's GoEmotions dataset and Low-Rank Adaptation (LoRA) to simulate emotionally diverse responses. Evaluating the model on a distance estimation task between Fargo, ND, and Seattle, WA, across 15,064 unique persona configurations, we analyzed how emotional states and social attributes influence decision-making. Our findings demonstrate that emotional integration shapes response patterns while maintaining acceptable prediction accuracy, revealing its potential to enhance artificial collective intelligence. This study provides valuable insights into the interplay of emotional diversity and decision-making in LLMs, suggesting pathways for creating emotionally aware AI systems that balance emotional depth with analytical precision.
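The Low-Rank Adaptation (LoRA) used for this fine-tuning can be sketched without any ML framework: instead of updating a full weight matrix W, LoRA trains a low-rank update BA scaled by alpha/r, so the adapted layer computes (W + (alpha/r)BA)x. The tiny matrices below are toy values, not weights from DarkIdol-Llama-3.1-8B, and the function is a conceptual sketch rather than the PEFT library's implementation.

```python
# Minimal sketch of the LoRA forward pass: frozen base path plus a scaled
# low-rank adapter path. W is d_out x d_in, A is r x d_in, B is d_out x r.

def matmul(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=16, r=2):
    base = matmul(W, x)               # frozen pretrained weights
    update = matmul(B, matmul(A, x))  # trainable low-rank adapter
    scale = alpha / r                 # LoRA scaling factor
    return [b + scale * u for b, u in zip(base, update)]
```

Because only A and B (r(d_in + d_out) parameters) are trained, one base model can host many cheap emotion-specific adapters, which is what makes simulating thousands of emotionally diverse personas tractable.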

Authors:Kunal Handa, Alex Tamkin, Miles McCain, Saffron Huang, Esin Durmus, Sarah Heck, Jared Mueller, Jerry Hong, Stuart Ritchie, Tim Belonax, Kevin K. Troy, Dario Amodei, Jared Kaplan, Jack Clark, Deep Ganguli
Title: Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations
Abstract:
Despite widespread speculation about artificial intelligence's impact on the future of work, we lack systematic empirical evidence about how these systems are actually being used for different tasks. Here, we present a novel framework for measuring AI usage patterns across the economy. We leverage a recent privacy-preserving system to analyze over four million Claude.ai conversations through the lens of tasks and occupations in the U.S. Department of Labor's O*NET Database. Our analysis reveals that AI usage primarily concentrates in software development and writing tasks, which together account for nearly half of all usage. However, usage of AI extends more broadly across the economy, with approximately 36% of occupations using AI for at least a quarter of their associated tasks. We also analyze how AI is being used for tasks, finding that 57% of usage suggests augmentation of human capabilities (e.g., learning or iterating on an output) while 43% suggests automation (e.g., fulfilling a request with minimal human involvement). While our data and methods face important limitations and only paint a picture of AI usage on a single platform, they provide an automated, granular approach for tracking AI's evolving role in the economy and identifying leading indicators of future impact as these technologies continue to advance.

Authors:Maximilian Rettinger, Leander Hacker, Philipp Wolters, Gerhard Rigoll
Title: Optimizing Robot Programming: Mixed Reality Gripper Control
Abstract:
Conventional robot programming methods are complex and time-consuming for users. In recent years, alternative approaches such as mixed reality have been explored to address these challenges and optimize robot programming. While the findings of the mixed reality robot programming methods are convincing, most existing methods rely on gesture interaction for robot programming. Since controller-based interactions have proven to be more reliable, this paper examines three controller-based programming methods within a mixed reality scenario: 1) Classical Jogging, where the user positions the robot's end effector using the controller's thumbsticks, 2) Direct Control, where the controller's position and orientation directly corresponds to the end effector's, and 3) Gripper Control, where the controller is enhanced with a 3D-printed gripper attachment to grasp and release objects. A within-subjects study (n = 30) was conducted to compare these methods. The findings indicate that the Gripper Control condition outperforms the others in terms of task completion time, user experience, mental demand, and task performance, while also being the preferred method. Therefore, it demonstrates promising potential as an effective and efficient approach for future robot programming. Video available at https://youtu.be/83kWr8zUFIQ.

Authors:Manusha Karunathilaka, Shaolun Ruan, Lin-Ping Yuan, Jiannan Li, Zhiding Liang, Kavinda Athapaththu, Qiang Guan, Yong Wang
Title: Intuit: Explain Quantum Computing Concepts via AR-based Analogy
Abstract:
Quantum computing has shown great potential to revolutionize traditional computing and can provide an exponential speedup for a wide range of possible applications, attracting various stakeholders. However, understanding fundamental quantum computing concepts remains a significant challenge for novices because of their abstract and counterintuitive nature. Thus, we propose an analogy-based characterization framework to construct the mental mapping between quantum computing concepts and daily objects, informed by in-depth expert interviews and a literature review, covering key quantum concepts and characteristics like the number of qubits, output state duality, quantum concept type, and probability quantification. Then, we developed an AR-based prototype system, Intuit, using situated analytics to explain quantum concepts through daily objects and phenomena (e.g., rotating coins, paper cutters). We thoroughly evaluated our approach through in-depth user and expert interviews. The results demonstrate the effectiveness and usability of Intuit in helping learners understand abstract concepts in an intuitive and engaging manner.

Authors:Enrico Saccon, Ahmet Tikna, Davide De Martini, Edoardo Lamon, Luigi Palopoli, Marco Roveri
Title: A Temporal Planning Framework for Multi-Agent Systems via LLM-Aided Knowledge Base Management
Abstract:
This paper presents a novel framework, called PLANTOR (PLanning with Natural language for Task-Oriented Robots), that integrates Large Language Models (LLMs) with Prolog-based knowledge management and planning for multi-robot tasks. The system employs a two-phase generation of a robot-oriented knowledge base, ensuring reusability and compositional reasoning, as well as a three-step planning procedure that handles temporal dependencies, resource constraints, and parallel task execution via mixed-integer linear programming. The final plan is converted into a Behaviour Tree for direct use in ROS2. We tested the framework in multi-robot assembly tasks within a block world and an arch-building scenario. Results demonstrate that LLMs can produce accurate knowledge bases with modest human feedback, while Prolog guarantees formal correctness and explainability. This approach underscores the potential of LLM integration for advanced robotics tasks requiring flexible, scalable, and human-understandable planning.
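The final conversion step, turning an ordered plan into a Behaviour Tree, can be illustrated with a minimal sequence composite that runs children in order and fails fast. This is a generic behaviour-tree sketch under assumed action names from the block-world scenario, not PLANTOR's actual tree format or its ROS2 integration.

```python
# Minimal behaviour-tree sketch: leaf actions plus a Sequence composite.

def action(name, log, ok=True):
    """Leaf node: record the executed action and report success/failure."""
    def run():
        log.append(name)
        return ok
    return run

def sequence(children):
    """Composite node: succeed only if every child succeeds, in order."""
    def run():
        for child in children:
            if not child():
                return False  # fail fast; later children never run
        return True
    return run

# Hypothetical three-step plan from the block-world assembly task.
log = []
plan = sequence([action("pick_block", log),
                 action("move_to_arch", log),
                 action("place_block", log)])
```

The fail-fast property is what makes the tree safe to hand to an executor: if `move_to_arch` fails, `place_block` is never attempted, matching the temporal dependencies the planner encoded.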

Authors:Julia Barnett, Kimon Kieslich, Natali Helberger, Nicholas Diakopoulos
Title: Envisioning Stakeholder-Action Pairs to Mitigate Negative Impacts of AI: A Participatory Approach to Inform Policy Making
Abstract:
The potential for negative impacts of AI has rapidly become more pervasive around the world, and this has intensified the need for responsible AI governance. While many regulatory bodies endorse risk-based approaches and a multitude of risk mitigation practices are proposed by companies and academic scholars, these approaches are commonly expert-centered and thus lack the inclusion of a significant group of stakeholders. Ensuring that AI policies align with democratic expectations requires methods that prioritize the voices and needs of those impacted. In this work we develop a participatory and forward-looking approach to inform policy-makers and academics that places the needs of lay stakeholders at the forefront and enriches the development of risk mitigation strategies. Our approach (1) maps potential mitigation and prevention strategies for negative AI impacts that assign responsibility to various stakeholders, (2) explores the importance and prioritization thereof in the eyes of laypeople, and (3) presents these insights in policy fact sheets, i.e., a digestible format for informing policy processes. We emphasize that this approach is not targeted towards replacing policy-makers; rather, our aim is to present an informative method that enriches mitigation strategies and enables a more participatory approach to policy development.

Authors:Zeyu Yan, Advait Vartak, Jiasheng Li, Zining Zhang, Huaishu Peng
Title: PCB Renewal: Iterative Reuse of PCB Substrates for Sustainable Electronic Making
Abstract:
PCB (printed circuit board) substrates are often single-use, leading to material waste in electronics making. We introduce PCB Renewal, a novel technique that "erases" and "reconfigures" PCB traces by selectively depositing conductive epoxy onto outdated areas, transforming isolated paths into conductive planes that support new traces. We present the PCB Renewal workflow, evaluate its electrical performance and mechanical durability, and model its sustainability impact, including material usage, cost, energy consumption, and time savings. We develop a software plug-in that guides epoxy deposition, generates updated PCB profiles, and calculates resource usage. To demonstrate PCB Renewal's effectiveness and versatility, we repurpose a single PCB across four design iterations spanning three projects: a camera roller, a WiFi radio, and an ESPboy game console. We also show how an outsourced double-layer PCB can be reconfigured, transforming it from an LED watch to an interactive cat toy. The paper concludes with limitations and future directions.

Authors:Zeyu Yan, Mrunal Dhaygude, Huaishu Peng
Title: Make Making Sustainable: Exploring Sustainability Practices, Challenges, and Opportunities in Making Activities
Abstract:
The recent democratization of personal fabrication has significantly advanced the maker movement and reshaped applied research in HCI and beyond. However, this growth has also raised increasing sustainability concerns, as material waste is an inevitable byproduct of making and rapid prototyping. In this work, we examine the sustainability landscape within the modern maker community, focusing on grassroots makerspaces and maker-oriented research labs through in-depth interviews with diverse stakeholders involved in making and managing making-related activities. Our findings highlight four key themes: the various types of "waste" generated through the making process, the strategies (or lack thereof) for managing this waste, the motivations driving (un)sustainable practices, and the challenges faced. We synthesize these insights into design considerations and takeaways for technical HCI researchers and the broader community, focusing on future tools, infrastructures, and educational approaches to foster sustainable making.

Authors:Qianjie Wei, Xiaoying Wei, Yiqi Liang, Fan Lin, Nuonan Si, Mingming Fan
Title: RemoteChess: Enhancing Older Adults' Social Connectedness via Designing a Virtual Reality Chinese Chess (Xiangqi) Community
Abstract:
The decline of social connectedness caused by distance and physical limitations severely affects older adults' well-being and mental health. While virtual reality (VR) is promising for older adults to socialize remotely, existing social VR designs primarily focus on verbal communication (e.g., reminiscent, chat). Actively engaging in shared activities is also an important aspect of social connection. We designed RemoteChess, which constructs a social community and a culturally relevant activity (i.e., Chinese chess) for older adults to play while engaging in social interaction. We conducted a user study with groups of older adults interacting with each other through RemoteChess. Our findings indicate that RemoteChess enhanced participants' social connectedness by offering familiar environments, culturally relevant social catalysts, and asymmetric interactions. We further discussed design guidelines for designing culturally relevant social activities in VR to promote social connectedness for older adults.

Authors:Wonduk Seo, Seungyong Lee, Daye Kang, Hyunjin An, Zonghao Yuan, Seunghyun Lee
Title: Automated Visualization Code Synthesis via Multi-Path Reasoning and Feedback-Driven Optimization
Abstract:
Rapid advancements in Large Language Models (LLMs) have accelerated their integration into automated visualization code generation applications. Despite advancements through few-shot prompting and query expansion, existing methods remain limited in handling ambiguous and complex queries, thereby requiring manual intervention. To overcome these limitations, we propose VisPath: a Multi-Path Reasoning and Feedback-Driven Optimization Framework for Visualization Code Generation. VisPath handles underspecified queries through structured, multi-stage processing. It begins by reformulating the user input via Chain-of-Thought (CoT) prompting, which refers to the initial query while generating multiple extended queries in parallel, enabling the LLM to capture diverse interpretations of the user intent. These queries then generate candidate visualization scripts, which are executed to produce diverse images. By assessing the visual quality and correctness of each output, VisPath generates targeted feedback that is aggregated to synthesize an optimal final result. Extensive experiments on widely-used benchmarks including MatPlotBench and the Qwen-Agent Code Interpreter Benchmark show that VisPath outperforms state-of-the-art methods, offering a more reliable solution for AI-driven visualization code generation.

Authors:Hyunchul Lim, Nam Anh Dang, Dylan Lee, Tianhong Catherine Yu, Jane Lu, Franklin Mingzhe Li, Yiqi Jin, Yan Ma, Xiaojun Bi, François Guimbretière, Cheng Zhang
Title: SpellRing: Recognizing Continuous Fingerspelling in American Sign Language using a Ring
Abstract:
Fingerspelling is a critical part of American Sign Language (ASL) recognition and has become an accessible optional text entry method for Deaf and Hard of Hearing (DHH) individuals. In this paper, we introduce SpellRing, a single smart ring worn on the thumb that recognizes words continuously fingerspelled in ASL. SpellRing uses active acoustic sensing (via a microphone and speaker) and an inertial measurement unit (IMU) to track handshape and movement, which are processed through a deep learning algorithm using Connectionist Temporal Classification (CTC) loss. We evaluated the system with 20 ASL signers (13 fluent and 7 learners), using the MacKenzie-Soukoref Phrase Set of 1,164 words and 100 phrases. Offline evaluation yielded top-1 and top-5 word recognition accuracies of 82.45% (9.67%) and 92.42% (5.70%), respectively. In real-time, the system achieved a word error rate (WER) of 0.099 (0.039) on the phrases. Based on these results, we discuss key lessons and design implications for future minimally obtrusive ASL recognition wearables.

Authors:Tim Zindulka, Jannek Sekowski, Florian Lehmann, Daniel Buschek
Title: Exploring Mobile Touch Interaction with Large Language Models
Abstract:
Interacting with Large Language Models (LLMs) for text editing on mobile devices currently requires users to break out of their writing environment and switch to a conversational AI interface. In this paper, we propose to control the LLM via touch gestures performed directly on the text. We first chart a design space that covers fundamental touch input and text transformations. In this space, we then concretely explore two control mappings: spread-to-generate and pinch-to-shorten, with visual feedback loops. We evaluate this concept in a user study (N=14) that compares three feedback designs: no visualisation, text length indicator, and length + word indicator. The results demonstrate that touch-based control of LLMs is both feasible and user-friendly, with the length + word indicator proving most effective for managing text generation. This work lays the foundation for further research into gesture-based interaction with LLMs on touch devices.

Authors:Tim Zindulka, Sven Goller, Florian Lehmann, Daniel Buschek
Title: Content-Driven Local Response: Supporting Sentence-Level and Message-Level Mobile Email Replies With and Without AI
Abstract:
Mobile emailing demands efficiency in diverse situations, which motivates the use of AI. However, generated text does not always reflect how people want to respond. This challenges users with AI involvement tradeoffs not yet considered in email UIs. We address this with a new UI concept called Content-Driven Local Response (CDLR), inspired by microtasking. This allows users to insert responses into the email by selecting sentences, which additionally serves to guide AI suggestions. The concept supports combining AI for local suggestions and message-level improvements. Our user study (N=126) compared CDLR with manual typing and full reply generation. We found that CDLR supports flexible workflows with varying degrees of AI involvement, while retaining the benefits of reduced typing and errors. This work contributes a new approach to integrating AI capabilities: By redesigning the UI for workflows with and without AI, we can empower users to dynamically adjust AI involvement.

Authors:Chacha Chen, Han Liu, Jiamin Yang, Benjamin M. Mervak, Bora Kalaycioglu, Grace Lee, Emre Cakmakli, Matteo Bonatti, Sridhar Pudu, Osman Kahraman, Gul Gizem Pamuk, Aytekin Oto, Aritrick Chatterjee, Chenhao Tan
Title: Can Domain Experts Rely on AI Appropriately? A Case Study on AI-Assisted Prostate Cancer MRI Diagnosis
Abstract:
Despite the growing interest in human-AI decision making, experimental studies with domain experts remain rare, largely due to the complexity of working with domain experts and the challenges in setting up realistic experiments. In this work, we conduct an in-depth collaboration with radiologists in prostate cancer diagnosis based on MRI images. Building on existing tools for teaching prostate cancer diagnosis, we develop an interface and conduct two experiments to study how AI assistance and performance feedback shape the decision making of domain experts. In Study 1, clinicians were asked to provide an initial diagnosis (human), then view the AI's prediction, and subsequently finalize their decision (human-AI team). In Study 2 (after a memory wash-out period), the same participants first received aggregated performance statistics from Study 1, specifically their own performance, the AI's performance, and their human-AI team performance, and then directly viewed the AI's prediction before making their diagnosis (i.e., no independent initial diagnosis). These two workflows represent realistic ways that clinical AI tools might be used in practice, where the second study simulates a scenario where doctors can adjust their reliance and trust on AI based on prior performance feedback. Our findings show that, while human-AI teams consistently outperform humans alone, they still underperform the AI due to under-reliance, similar to prior studies with crowdworkers. Providing clinicians with performance feedback did not significantly improve the performance of human-AI teams, although showing AI decisions in advance nudges people to follow AI more. Meanwhile, we observe that the ensemble of human-AI teams can outperform AI alone, suggesting promising directions for human-AI collaboration.

Authors:Meenatchi Sundaram Muthu Selva Annamalai, Igor Bilogrevic, Emiliano De Cristofaro
Title: Beyond the Crawl: Unmasking Browser Fingerprinting in Real User Interactions
Abstract:
Browser fingerprinting is a pervasive online tracking technique used increasingly often for profiling and targeted advertising. Prior research on the prevalence of fingerprinting heavily relied on automated web crawls, which inherently struggle to replicate the nuances of human-computer interactions. This raises concerns about the accuracy of current understandings of real-world fingerprinting deployments. As a result, this paper presents a user study involving 30 participants over 10 weeks, capturing telemetry data from real browsing sessions across 3,000 top-ranked websites. Our evaluation reveals that automated crawls miss almost half (45%) of the fingerprinting websites encountered by real users. This discrepancy mainly stems from the crawlers' inability to access authentication-protected pages, circumvent bot detection, and trigger fingerprinting scripts activated by specific user interactions. We also identify potential new fingerprinting vectors present in real user data but absent from automated crawls. Finally, we evaluate the effectiveness of federated learning for training browser fingerprinting detection models on real user data, yielding improved performance than models trained solely on automated crawl data.
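The abstract's final point is training fingerprinting-detection models with federated learning so that raw user telemetry never leaves the browser. The paper does not specify the aggregation rule; a minimal sketch of the standard FedAvg step (sample-size-weighted averaging of client parameter vectors, here as flat numpy arrays) looks like this:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """One FedAvg aggregation round (illustrative, not the paper's code).

    client_weights: list of 1-D parameter vectors, one per client.
    client_sizes: number of local training samples per client; clients with
    more data contribute proportionally more to the global model.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)                  # (n_clients, n_params)
    coeffs = sizes / sizes.sum()                        # normalized weights
    return (stacked * coeffs[:, None]).sum(axis=0)      # weighted average
```

A server would broadcast this averaged vector back to clients for the next local-training round; only model updates, never browsing telemetry, cross the network.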

Authors:Zhihan Jiang, Running Zhao, Lin Lin, Yue Yu, Handi Chen, Xinchen Zhang, Xuhai Xu, Yifang Wang, Xiaojuan Ma, Edith C. H. Ngai
Title: DietGlance: Dietary Monitoring and Personalized Analysis at a Glance with Knowledge-Empowered AI Assistant
Abstract:
Growing awareness of wellness has prompted people to consider whether their dietary patterns align with their health and fitness goals. In response, researchers have introduced various wearable dietary monitoring systems and dietary assessment approaches. However, these solutions are either limited to identifying foods with simple ingredients or insufficient in providing analysis of individual dietary behaviors with domain-specific knowledge. In this paper, we present DietGlance, a system that automatically monitors dietary intake in daily routines and delivers personalized analysis from knowledge sources. DietGlance first detects ingestive episodes from multimodal inputs using eyeglasses, capturing privacy-preserving meal images of various dishes being consumed. Based on the inferred food items and consumed quantities from these images, DietGlance further provides nutritional analysis and personalized dietary suggestions, empowered by the retrieval augmentation generation module on a reliable nutrition library. A short-term user study (N=33) and a four-week longitudinal study (N=16) demonstrate the usability and effectiveness of DietGlance, offering insights and implications for future AI-assisted dietary monitoring and personalized healthcare intervention systems using eyewear.

Authors:Leonardo Pavanatto, Alexander Giovannelli, Brian Giera, Peer-Timo Bremer, Haichao Miao, Doug Bowman
Title: Exploring Multiscale Navigation of Homogeneous and Dense Objects with Progressive Refinement in Virtual Reality
Abstract:
Locating small features in a large, dense object in virtual reality (VR) poses a significant interaction challenge. While existing multiscale techniques support transitions between various levels of scale, they are not focused on handling dense, homogeneous objects with hidden features. We propose a novel approach that applies the concept of progressive refinement to VR navigation, enabling focused inspections. We conducted a user study where we varied two independent variables in our design, navigation style (STRUCTURED vs. UNSTRUCTURED) and display mode (SELECTION vs. EVERYTHING), to better understand their effects on efficiency and awareness during multiscale navigation. Our results showed that unstructured navigation can be faster than structured and that displaying only the selection can be faster than displaying the entire object. However, using an everything display mode can support better location awareness and object understanding.

Authors:Raymond Fok, Alexa Siu, Daniel S. Weld
Title: Toward Living Narrative Reviews: An Empirical Study of the Processes and Challenges in Updating Survey Articles in Computing Research
Abstract:
Surveying prior literature to establish a foundation for new knowledge is essential for scholarly progress. However, survey articles are resource-intensive and challenging to create, and can quickly become outdated as new research is published, risking information staleness and inaccuracy. Keeping survey articles current with the latest evidence is therefore desirable, though there is a limited understanding of why, when, and how these surveys should be updated. Toward this end, through a series of in-depth retrospective interviews with 11 researchers, we present an empirical examination of the work practices in authoring and updating survey articles in computing research. We find that while computing researchers acknowledge the value in maintaining an updated survey, continuous updating remains unmanageable and misaligned with academic incentives. Our findings suggest key leverage points within current workflows that present opportunities for enabling technologies to facilitate more efficient and effective updates.

Authors:Alexander Giovannelli, Leonardo Pavanatto, Shakiba Davari, Haichao Miao, Vuthea Chheang, Brian Giera, Peer-Timo Bremer, Doug Bowman
Title: Investigating the Influence of Playback Interactivity during Guided Tours for Asynchronous Collaboration in Virtual Reality
Abstract:
Collaborative virtual environments allow workers to contribute to team projects across space and time. While much research has closely examined the problem of working in different spaces at the same time, few have investigated the best practices for collaborating in those spaces at different times aside from textual and auditory annotations. We designed a system that allows experts to record a tour inside a virtual inspection space, preserving knowledge and providing later observers with insights through a 3D playback of the expert's inspection. We also created several interactions to ensure that observers are tracking the tour and remaining engaged. We conducted a user study to evaluate the influence of these interactions on an observing user's information recall and user experience. Findings indicate that independent viewpoint control during a tour enhances the user experience compared to fully passive playback and that additional interactivity can improve auditory and spatial recall of key information conveyed during the tour.

Authors:Rashid Mushkani, Hugo Berard, Allison Cohen, Shin Koeski
Title: The Right to AI
Abstract:
This paper proposes a Right to AI, which asserts that individuals and communities should meaningfully participate in the development and governance of the AI systems that shape their lives. Motivated by the increasing deployment of AI in critical domains and inspired by Henri Lefebvre's concept of the Right to the City, we reconceptualize AI as a societal infrastructure, rather than merely a product of expert design. In this paper, we critically evaluate how generative agents, large-scale data extraction, and diverse cultural values bring new complexities to AI oversight. The paper proposes that grassroots participatory methodologies can mitigate biased outcomes and enhance social responsiveness. It asserts that data is socially produced and should be managed and owned collectively. Drawing on Sherry Arnstein's Ladder of Citizen Participation and analyzing nine case studies, the paper develops a four-tier model for the Right to AI that situates the current paradigm and envisions an aspirational future. It proposes recommendations for inclusive data ownership, transparent design processes, and stakeholder-driven oversight. We also discuss market-led and state-centric alternatives and argue that participatory approaches offer a better balance between technical efficiency and democratic legitimacy.

Authors:Ziang Liu, Yuanchen Ju, Yu Da, Tom Silver, Pranav N. Thakkar, Jenna Li, Justin Guo, Katherine Dimitropoulou, Tapomayukh Bhattacharjee
Title: GRACE: Generalizing Robot-Assisted Caregiving with User Functionality Embeddings
Abstract:
Robot caregiving should be personalized to meet the diverse needs of care recipients -- assisting with tasks as needed, while taking user agency in action into account. In physical tasks such as handover, bathing, dressing, and rehabilitation, a key aspect of this diversity is the functional range of motion (fROM), which can vary significantly between individuals. In this work, we learn to predict personalized fROM as a way to generalize robot decision-making in a wide range of caregiving tasks. We propose a novel data-driven method for predicting personalized fROM using functional assessment scores from occupational therapy. We develop a neural model that learns to embed functional assessment scores into a latent representation of the user's physical function. The model is trained using motion capture data collected from users with emulated mobility limitations. After training, the model predicts personalized fROM for new users without motion capture. Through simulated experiments and a real-robot user study, we show that the personalized fROM predictions from our model enable the robot to provide personalized and effective assistance while improving the user's agency in action. See our website for more visualizations: https://emprise.cs.cornell.edu/grace/.

Authors:Jingyu Shi, Rahul Jain, Seungguen Chi, Hyungjun Doh, Hyunggun Chi, Alexander J. Quinn, Karthik Ramani
Title: CARING-AI: Towards Authoring Context-aware Augmented Reality INstruction through Generative Artificial Intelligence
Abstract:
Context-aware AR instruction enables adaptive and in-situ learning experiences. However, hardware limitations and expertise requirements constrain the creation of such instructions. With recent developments in Generative Artificial Intelligence (Gen-AI), current research tries to tackle these constraints by deploying AI-generated content (AIGC) in AR applications. However, our preliminary study with six AR practitioners revealed that the current AIGC lacks contextual information to adapt to varying application scenarios and is therefore limited in authoring. To utilize the strong generative power of GenAI to ease the authoring of AR instruction while capturing the context, we developed CARING-AI, an AR system to author context-aware humanoid-avatar-based instructions with GenAI. By navigating in the environment, users naturally provide contextual information to generate humanoid-avatar animation as AR instructions that blend in the context spatially and temporally. We showcased three application scenarios of CARING-AI: Asynchronous Instructions, Remote Instructions, and Ad Hoc Instructions based on a design space of AIGC in AR Instructions. With two user studies (N=12), we assessed the system usability of CARING-AI and demonstrated the ease and effectiveness of authoring with Gen-AI.

Authors:Lin Duan, Yanming Xiu, Maria Gorlatova
Title: Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble
Abstract:
Augmented Reality (AR) enhances the real world by integrating virtual content, yet ensuring the quality, usability, and safety of AR experiences presents significant challenges. Could Vision-Language Models (VLMs) offer a solution for the automated evaluation of AR-generated scenes? In this study, we evaluate the capabilities of three state-of-the-art commercial VLMs -- GPT, Gemini, and Claude -- in identifying and describing AR scenes. For this purpose, we use DiverseAR, the first AR dataset specifically designed to assess VLMs' ability to analyze virtual content across a wide range of AR scene complexities. Our findings demonstrate that VLMs are generally capable of perceiving and describing AR scenes, achieving a True Positive Rate (TPR) of up to 93% for perception and 71% for description. While they excel at identifying obvious virtual objects, such as a glowing apple, they struggle when faced with seamlessly integrated content, such as a virtual pot with realistic shadows. Our results highlight both the strengths and the limitations of VLMs in understanding AR scenarios. We identify key factors affecting VLM performance, including virtual content placement, rendering quality, and physical plausibility. This study underscores the potential of VLMs as tools for evaluating the quality of AR experiences.

Authors:Dylan Gaines, Keith Vertanen
Title: Adapting Large Language Models for Character-based Augmentative and Alternative Communication
Abstract:
Users of Augmentative and Alternative Communication (AAC) may write letter-by-letter via an interface that uses a character language model. However, most state-of-the-art large pretrained language models predict subword tokens of variable length. We investigate how to practically use such models to make accurate and efficient character predictions. We fine-tune models using a large dataset of sentences we curated in which each sentence is rated according to how useful it might be for spoken or written AAC communication. We find that using an algorithm to produce character predictions from a subword large language model provides more accurate predictions than adding a classification layer or using a byte-level model. We also find that our domain adaptation procedure is effective at improving model performance on simple, conversational text.
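The core algorithmic idea here is turning a subword model's next-token distribution into a next-character distribution. One natural way to do this (a simplified sketch, not necessarily the authors' exact algorithm) is to marginalize: sum the probability of every candidate token over its first character, then renormalize. Token strings and probabilities below are hypothetical:

```python
def char_probs(token_probs):
    """Collapse a next-token distribution into a next-character distribution.

    token_probs: {token_string: probability}. Each token's mass is credited
    to its leading character; the result is renormalized so it sums to 1.
    """
    out = {}
    for tok, p in token_probs.items():
        if tok:                       # skip empty tokens
            c = tok[0]
            out[c] = out.get(c, 0.0) + p
    total = sum(out.values())
    return {c: p / total for c, p in out.items()}
```

For example, if the model assigns 0.5 to "the", 0.3 to "to", and 0.2 to "a", the character "t" receives probability 0.8 and "a" receives 0.2. A full implementation would also condition on the partially typed word by restricting to tokens consistent with the current prefix.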

Authors:Dylan Gaines, Keith Vertanen
Title: Identifying the Desired Word Suggestion in Simultaneous Audio
Abstract:
We explore a method for presenting word suggestions for non-visual text input using simultaneous voices. We conduct two perceptual studies and investigate the impact of different presentations of voices on a user's ability to detect which voice, if any, spoke their desired word. Our sets of words simulated the word suggestions of a predictive keyboard during real-world text input. We find that when voices are simultaneous, user accuracy decreases significantly with each added word suggestion. However, adding a slight 0.15 s delay between the start of each subsequent word allows two simultaneous words to be presented with no significant decrease in accuracy compared to presenting two words sequentially (84% simultaneous versus 86% sequential). This allows two word suggestions to be presented to the user 32% faster than sequential playback without decreasing accuracy.

Authors:Clayton Miller, Yun Xuan Chua, Matias Quintana, Binyu Lei, Filip Biljecki, Mario Frei
Title: Make yourself comfortable: Nudging urban heat and noise mitigation with smartwatch-based Just-in-time Adaptive Interventions (JITAI)
Abstract:
Humans can play a more active role in improving their comfort in the built environment if given the right information at the right place and time. This paper outlines the use of Just-in-Time Adaptive Interventions (JITAI) implemented in the context of the built environment to provide information that helps humans minimize the impact of heat and noise on their daily lives. This framework is based on the open-source Cozie iOS smartwatch platform. It includes data collection through micro-surveys and intervention messages triggered by environmental, contextual, and personal history conditions. An eight-month deployment of the method was completed in Singapore with 103 participants who submitted more than 12,000 micro-surveys and had more than 3,600 JITAI intervention messages delivered to them. A weekly survey conducted during two deployment phases revealed an overall increase in perceived usefulness ranging from 8-19% over the first three weeks of data collection. For noise-related interventions, participants showed an overall increase in location changes ranging from 4-11% and a 2-17% increase in earphone use to mitigate noise distractions. For thermal comfort-related interventions, participants demonstrated a 3-13% increase in adjustments to their location or thermostat to feel more comfortable. The analysis found evidence that personality traits (such as conscientiousness), gender, and environmental preferences could be factors in determining the perceived helpfulness of JITAIs and influencing behavior change. These findings underscore the importance of tailoring intervention strategies to individual traits and environmental conditions, setting the stage for future research to refine the delivery, timing, and content of intervention messages.


Authors:Auste Simkute, Viktor Kewenig, Abigail Sellen, Sean Rintel, Lev Tankelevitch
Title: The New Calculator? Practices, Norms, and Implications of Generative AI in Higher Education
Abstract:
Generative AI (GenAI) has introduced myriad opportunities and challenges for higher education. Anticipating this potential transformation requires understanding students' contextualised practices and norms around GenAI. We conducted semi-structured interviews with 26 students and 11 educators from diverse departments across two universities. Grounded in Strong Structuration Theory, we find diversity in students' uses and motivations for GenAI. Occurring in the context of unclear university guidelines, institutional fixation on plagiarism, and inconsistent educator communication, students' practices are informed by unspoken rules around appropriate use, GenAI limitations and reliance strategies, and consideration of agency and skills. Perceived impacts include changes in confidence, and concerns about skill development, relationships with educators, and plagiarism. Both groups envision changes in universities' attitude to GenAI, responsible use training, assessments, and integration of GenAI into education. We discuss socio-technical implications in terms of current and anticipated changes in the external and internal structures that contextualise students' GenAI use.

Authors:Hao Chen, Gonzalo Esteban Constante-Flores, Krishna Sri Ipsit Mantri, Sai Madhukiran Kompalli, Akshdeep Singh Ahluwalia, Can Li
Title: OptiChat: Bridging Optimization Models and Practitioners with Large Language Models
Abstract:
Optimization models have been applied to solve a wide variety of decision-making problems. These models are usually developed by optimization experts but are used by practitioners without optimization expertise in various application domains. As a result, practitioners often struggle to interact with and draw useful conclusions from optimization models independently. To fill this gap, we introduce OptiChat, a natural language dialogue system designed to help practitioners interpret model formulation, diagnose infeasibility, analyze sensitivity, retrieve information, evaluate modifications, and provide counterfactual explanations. By augmenting large language models (LLMs) with functional calls and code generation tailored for optimization models, we enable seamless interaction and minimize the risk of hallucinations in OptiChat. We develop a new dataset to evaluate OptiChat's performance in explaining optimization models. Experiments demonstrate that OptiChat effectively bridges the gap between optimization models and practitioners, delivering autonomous, accurate, and instant responses.

Authors:Reza Jalayer, Yuxin Chen, Masoud Jalayer, Carlotta Orsenigo, Masayoshi Tomizuka
Title: Testing Human-Hand Segmentation on In-Distribution and Out-of-Distribution Data in Human-Robot Interactions Using a Deep Ensemble Model
Abstract:
Reliable detection and segmentation of human hands are critical for enhancing safety and facilitating advanced interactions in human-robot collaboration. Current research predominantly evaluates hand segmentation under in-distribution (ID) data, which reflects the training data of deep learning (DL) models. However, this approach fails to address out-of-distribution (OOD) scenarios that often arise in real-world human-robot interactions. In this study, we present a novel approach by evaluating the performance of pre-trained DL models under both ID data and more challenging OOD scenarios. To mimic realistic industrial scenarios, we designed a diverse dataset featuring simple and cluttered backgrounds with industrial tools, varying numbers of hands (0 to 4), and hands with and without gloves. For OOD scenarios, we incorporated unique and rare conditions such as finger-crossing gestures and motion blur from fast-moving hands, addressing both epistemic and aleatoric uncertainties. To capture multiple points of view (PoVs), we utilized both egocentric cameras, mounted on the operator's head, and static cameras to capture RGB images of human-robot interactions. This approach allowed us to account for multiple camera perspectives while also evaluating the performance of models trained on existing egocentric datasets as well as static-camera datasets. For segmentation, we used a deep ensemble model composed of UNet and RefineNet as base learners. Performance evaluation was conducted using segmentation metrics and uncertainty quantification via predictive entropy. Results revealed that models trained on industrial datasets outperformed those trained on non-industrial datasets, highlighting the importance of context-specific training. Although all models struggled with OOD scenarios, those trained on industrial datasets demonstrated significantly better generalization.
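The abstract quantifies ensemble uncertainty via predictive entropy: average the softmax outputs of the ensemble members, then take the entropy of that mean distribution. A minimal numpy sketch (illustrative, not the authors' implementation) for per-pixel or per-sample class probabilities:

```python
import numpy as np

def predictive_entropy(member_probs):
    """Predictive entropy of a deep ensemble.

    member_probs: array of shape (M, ..., C) holding softmax outputs of
    M ensemble members over C classes (e.g. per pixel for segmentation).
    Returns entropy of the ensemble-mean distribution; it is high when
    members disagree or are individually uncertain.
    """
    p = np.mean(member_probs, axis=0)              # average over members
    return -np.sum(p * np.log(p + 1e-12), axis=-1)  # entropy, eps for log(0)
```

Two members that both predict class 0 with certainty yield entropy near 0, while two members that confidently disagree yield the maximum two-class entropy, ln 2; thresholding this value is a common way to flag OOD inputs.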

Authors:Sanjit Kakarla, Conrad Borchers, Danielle Thomas, Shambhavi Bhushan, Kenneth R. Koedinger
Title: Comparing Few-Shot Prompting of GPT-4 LLMs with BERT Classifiers for Open-Response Assessment in Tutor Equity Training
Abstract:
Assessing learners in ill-defined domains, such as scenario-based human tutoring training, is an area of limited research. Equity training requires a nuanced understanding of context, but do contemporary large language models (LLMs) have a knowledge base that can navigate these nuances? Legacy transformer models like BERT, in contrast, have less real-world knowledge but can be more easily fine-tuned than commercial LLMs. Here, we study whether fine-tuning BERT on human annotations outperforms state-of-the-art LLMs (GPT-4o and GPT-4-Turbo) with few-shot prompting and instruction. We evaluate performance on four prediction tasks involving generating and explaining open-ended responses in advocacy-focused training lessons in a higher education student population learning to become middle school tutors. Leveraging a dataset of 243 human-annotated open responses from tutor training lessons, we find that BERT demonstrates superior performance using an offline fine-tuning approach, which is more resource-efficient than commercial GPT models. We conclude that contemporary GPT models may not adequately capture nuanced response patterns, especially in complex tasks requiring explanation. This work advances the understanding of AI-driven learner evaluation under the lens of fine-tuning versus few-shot prompting on the nuanced task of equity training, contributing to more effective training solutions and assisting practitioners in choosing adequate assessment methods.

Authors:Barbara Karpowicz, Maciej Grzeszczuk, Adam Kuzdraliński, Monika Kornacka, Aliaksandr Marozau, Wiktor Stawski, Pavlo Zinevych, Grzegorz Marcin Wójcik, Tomasz Kowalewski, Grzegorz Pochwatko, Wiesław Kopeć
Title: Immersive Technologies in Training and Healthcare: From Space Missions to Psychophysiological Research
Abstract:
Virtual, Augmented, and eXtended Reality (VR/AR/XR) technologies are increasingly recognized for their applications in training, diagnostics, and psychological research, particularly in high-risk and highly regulated environments. In this panel we discuss how immersive systems enhance human performance across multiple domains, including clinical psychology, space exploration, and medical education. In psychological research and training, XR can offer a controlled yet ecologically valid setting for measuring cognitive and affective processes. In space exploration, we discuss the development of VR-based astronaut training and diagnostic systems, allowing astronauts to perform real-time health assessments. In medical education and rehabilitation, we cover procedural training and patient engagement. From virtual surgical simulations to gamified rehabilitation exercises, immersive environments enhance both learning outcomes and treatment adherence.

Authors:Yuya Ide, Hailong Liu, Takahiro Wada
Title: Reducing Motion Sickness in Passengers of Autonomous Personal Mobility Vehicles by Presenting a Driving Path
Abstract:
Autonomous personal mobility vehicles (APMVs) are small mobility devices designed for individual automated transportation in shared spaces. In such environments, frequent pedestrian avoidance maneuvers may cause rapid steering adjustments and passive postural responses from passengers, thereby increasing the risk of motion sickness. This study investigated the effects of providing path information on 16 passengers' head movement behavior and motion sickness while riding an APMV. Through a controlled experiment comparing manual driving (MD), autonomous driving without path information (AD w/o path), and autonomous driving with path information (AD w/ path), we found that providing path cues significantly reduced MISC scores and delayed the onset of motion sickness symptoms. In addition, participants were more likely to proactively align their head movements with the direction of vehicle rotation in both MD and AD w/ path conditions. Although a small correlation was observed between the delay in yaw rotation of the passenger's head relative to the vehicle and the occurrence of motion sickness, the underlying physiological mechanism remains to be elucidated.

Authors:Samuel Reinders, Munazza Zaib, Matthew Butler, Bongshin Lee, Ingrid Zukerman, Lizhen Qu, Kim Marriott
Title: Accessible Data Access and Analysis by People who are Blind or Have Low Vision
Abstract:
Our work aims to develop new assistive technologies that enable blind or low vision (BLV) people to explore and analyze data readily. At present, barriers exist for BLV people to explore and analyze data, restricting access to government, health and personal data, and limiting employment opportunities. This work explores the co-design and development of an innovative system to support data access, with a focus on the use of refreshable tactile displays (RTDs) and conversational agents. The envisaged system will use a combination of tactile graphics and speech to communicate with BLV users, and proactively assist with data analysis tasks. As well as addressing significant equity gaps, our work expects to produce innovations in assistive technology, multimodal interfaces, dialogue systems, and natural language understanding and generation.

Authors:Akshay Nayak Kolgar, Yash Prakash, Sampath Jayarathna, Hae-Na Lee, Vikas Ashok
Title: Insights in Adaptation: Examining Self-reflection Strategies of Job Seekers with Visual Impairments in India
Abstract:
Significant changes in the digital employment landscape, driven by rapid technological advancements and the COVID-19 pandemic, have introduced new opportunities for blind and visually impaired (BVI) individuals in developing countries like India. However, a significant portion of the BVI population in India remains unemployed despite extensive accessibility advancements and job search interventions. Therefore, we conducted semi-structured interviews with 20 BVI persons who were either pursuing or recently sought employment in the digital industry. Our findings reveal that despite gaining digital literacy and extensive training, BVI individuals struggle to meet industry requirements for fulfilling job openings. While they engage in self-reflection to identify shortcomings in their approach and skills, they lack constructive feedback from peers and recruiters. Moreover, the numerous job intervention tools are limited in their ability to meet the unique needs of BVI job seekers. Our results therefore provide key insights that inform the design of future collaborative intervention systems that offer personalized feedback for BVI individuals, effectively guiding their self-reflection process and subsequent job search behaviors, and potentially leading to improved employment outcomes.

Authors:Maciej Grzeszczuk, Grzegorz Pochwatko, Barbara Karpowicz, Stanisław Knapiński, Wiesław Kopeć
Title: Building Trustworthy Cognitive Monitoring for Safety-Critical Human Tasks: A Phased Methodological Approach
Abstract:
Operators performing high-stakes, safety-critical tasks - such as air traffic controllers, surgeons, or mission control personnel - must maintain exceptional cognitive performance under variable and often stressful conditions. This paper presents a phased methodological approach to building cognitive monitoring systems for such environments. By integrating insights from human factors research, simulation-based training, sensor technologies, and fundamental psychological principles, the proposed framework supports real-time performance assessment with minimum intrusion. The approach begins with simplified simulations and evolves towards operational contexts. Key challenges addressed include variability in workload and the effects of fatigue and stress, and thus the need for adaptive monitoring with early-warning support mechanisms. The methodology aims to improve situational awareness, reduce human error, and support decision-making without undermining operator autonomy. Ultimately, the work contributes to the development of resilient and transparent systems in domains where human performance is critical to safety.

Authors:Chenglei Si, Tatsunori Hashimoto, Diyi Yang
Title: The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
Abstract:
Large Language Models (LLMs) have shown promise in accelerating the scientific research pipeline. A key capability for this process is the ability to generate novel research ideas, and prior studies have found settings in which LLM-generated research ideas were judged as more novel than human-expert ideas. However, a good idea should not simply appear to be novel, it should also result in better research after being executed. To test whether AI-generated ideas lead to better research outcomes, we conduct an execution study by recruiting 43 expert researchers to execute randomly-assigned ideas, either written by experts or generated by an LLM. Each expert spent over 100 hours implementing the idea and wrote a 4-page short paper to document the experiments. All the executed projects are then reviewed blindly by expert NLP researchers. Comparing the review scores of the same ideas before and after execution, the scores of the LLM-generated ideas decrease significantly more than expert-written ideas on all evaluation metrics (novelty, excitement, effectiveness, and overall; p < 0.05), closing the gap between LLM and human ideas observed at the ideation stage. When comparing the aggregated review scores from the execution study, we even observe that for many metrics there is a flip in rankings where human ideas score higher than LLM ideas. This ideation-execution gap highlights the limitations of current LLMs in generating truly effective research ideas and the challenge of evaluating research ideas in the absence of execution outcomes.

Authors:Andrei Lupu, Timon Willi, Jakob Foerster
Title: The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind
Abstract:
As Large Language Models (LLMs) gain agentic abilities, they will have to navigate complex multi-agent scenarios, interacting with human users and other agents in cooperative and competitive settings. This will require new reasoning skills, chief amongst them being theory of mind (ToM), or the ability to reason about the "mental" states of other agents. However, ToM and other multi-agent abilities in LLMs are poorly understood, since existing benchmarks suffer from narrow scope, data leakage, saturation, and lack of interactivity. We thus propose Decrypto, a game-based benchmark for multi-agent reasoning and ToM drawing inspiration from cognitive science, computational pragmatics and multi-agent reinforcement learning. It is designed to be as easy as possible in all other dimensions, eliminating confounding factors commonly found in other benchmarks. To our knowledge, it is also the first platform for designing interactive ToM experiments. We validate the benchmark design through comprehensive empirical evaluations of frontier LLMs, robustness studies, and human-AI cross-play experiments. We find that LLM game-playing abilities lag behind humans and simple word-embedding baselines. We then create variants of two classic cognitive science experiments within Decrypto to evaluate three key ToM abilities. Surprisingly, we find that state-of-the-art reasoning models are significantly worse at those tasks than their older counterparts. This demonstrates that Decrypto addresses a crucial gap in current reasoning and ToM evaluations, and paves the path towards better artificial agents.

Authors:Ryo Takahashi, Takashi Sato, Wakako Yukita, Tomoyuki Yokota, Takao Someya, Yoshihiro Kawahara
Title: Full-body WPT: wireless powering with meandered e-textiles
Abstract:
We present Full-body WPT, wireless power networking around the human body using a meandered textile coil. Unlike traditional inductive systems that emit strong fields into the deep tissue inside the body, the meander coil enables localized generation of a strong magnetic field constrained to the skin surface, even when scaled to the size of the human body. Such a localized inductive system enhances both the safety and efficiency of wireless power transfer around the body. Furthermore, the use of low-loss conductive yarn achieves an energy-efficient and lightweight design. We analyze the performance of our design through simulations and experimental prototypes, demonstrating high power transfer efficiency and adaptability to user movement and posture. Our system provides a safe and efficient distributed power network using meandered textile coils integrated into wearable materials, highlighting the potential of body-centric wireless power networking as a foundational layer for ubiquitous health monitoring, augmented reality, and human-machine interaction systems.

Authors:Hao Guo, Wei Fan, Shaohui Liu, Feng Jiang, Chunzhi Yi
Title: PPTP: Performance-Guided Physiological Signal-Based Trust Prediction in Human-Robot Collaboration
Abstract:
Trust prediction is a key issue in human-robot collaboration, especially in construction scenarios where maintaining appropriate trust calibration is critical for safety and efficiency. This paper introduces the Performance-guided Physiological signal-based Trust Prediction (PPTP), a novel framework designed to improve trust assessment. We designed a human-robot construction scenario with three difficulty levels to induce different trust states. Our approach integrates synchronized multimodal physiological signals (ECG, GSR, and EMG) with collaboration performance evaluation to predict human trust levels. Individual physiological signals are processed using collaboration performance information as guiding cues, leveraging the standardized nature of collaboration performance to compensate for individual variations in physiological responses. Extensive experiments demonstrate the efficacy of our cross-modality fusion method in significantly improving trust classification performance. Our model achieves over 81% accuracy in three-level trust classification, outperforming the best baseline method by 6.7%, and notably reaches 74.3% accuracy in high-resolution seven-level classification, which is a first in trust prediction research. Ablation experiments further validate the superiority of physiological signal processing guided by collaboration performance assessment.

Authors:Zachary D King, Maryam Khalid, Han Yu, Kei Shibuya, Khadija Zanna, Marzieh Majd, Ryan L Brown, Yufei Shen, Thomas Vaessen, George Kypriotakis, Christopher P Fagundes, Akane Sano
Title: Machine Learning-based Context-Aware EMAs: An Offline Feasibility Study
Abstract:
Mobile health (mHealth) systems help researchers monitor and care for patients in real-world settings. Studies utilizing mHealth applications use Ecological Momentary Assessments (EMAs), passive sensing, and contextual features to develop emotion recognition models, which rely on EMA responses as ground truth. Due to this, it is crucial to consider EMA compliance when conducting a successful mHealth study. Utilizing machine learning is one approach that can solve this problem by sending EMAs based on the predicted likelihood of a response. However, literature suggests that this approach may lead to prompting participants more frequently during emotions associated with responsiveness, thereby narrowing the range of emotions collected. We propose a multi-objective function that utilizes machine learning to identify optimal times for sending EMAs. The function identifies optimal moments by combining predicted response likelihood with model uncertainty in emotion predictions. Uncertainty would lead the function to prioritize time points when the model is less confident, which often corresponds to underrepresented emotions. We demonstrate that this objective function would result in EMAs being sent when participants are responsive and experiencing less commonly observed emotions. The evaluation is conducted offline using two datasets: (1) 91 spousal caregivers of individuals with Alzheimer's Disease and Related Dementias (ADRD), (2) 45 healthy participants. Results show that the multi-objective function tends to be higher when participants respond to EMAs and report less commonly observed emotions. This suggests that using the proposed objective function to guide EMA delivery could improve receptivity rates and capture a broader range of emotions.
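The combined objective described above can be sketched as a weighted score over candidate prompting moments; the weighting scheme, entropy normalization, and all names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def ema_priority(p_response: float, emotion_probs: np.ndarray, w: float = 0.5) -> float:
    """Combine predicted response likelihood with model uncertainty
    (predictive entropy over emotion classes) into one score.
    The weight `w` and the normalized-entropy form are hypothetical."""
    entropy = -np.sum(emotion_probs * np.log(emotion_probs + 1e-12))
    uncertainty = entropy / np.log(len(emotion_probs))  # scale to [0, 1]
    return w * p_response + (1 - w) * uncertainty

# Prefer moments when the participant is likely to respond AND the
# emotion model is unsure (often corresponding to underrepresented emotions).
candidates = [
    (0.9, np.array([0.95, 0.03, 0.02])),   # responsive, confident model
    (0.8, np.array([0.40, 0.35, 0.25])),   # responsive, uncertain model
]
scores = [ema_priority(p, q) for p, q in candidates]
best = int(np.argmax(scores))              # the uncertain moment wins
```

Under this sketch, the second moment is chosen even though its response likelihood is slightly lower, because the model's uncertainty term dominates.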

Authors:Burcu Sayin, Ipek Baris Schlicht, Ngoc Vo Hong, Sara Allievi, Jacopo Staiano, Pasquale Minervini, Andrea Passerini
Title: MedSyn: Enhancing Diagnostics with Human-AI Collaboration
Abstract:
Clinical decision-making is inherently complex, often influenced by cognitive biases, incomplete information, and case ambiguity. Large Language Models (LLMs) have shown promise as tools for supporting clinical decision-making, yet their typical one-shot or limited-interaction usage may overlook the complexities of real-world medical practice. In this work, we propose a hybrid human-AI framework, MedSyn, where physicians and LLMs engage in multi-step, interactive dialogues to refine diagnoses and treatment decisions. Unlike static decision-support tools, MedSyn enables dynamic exchanges, allowing physicians to challenge LLM suggestions while the LLM highlights alternative perspectives. Through simulated physician-LLM interactions, we assess the potential of open-source LLMs as physician assistants. Results suggest that open-source LLMs are promising physician assistants in real-world settings. Future work will involve real physician interactions to further validate MedSyn's usefulness in diagnostic accuracy and patient outcomes.

Authors:Jacob Miller, Markus Wallinger, Ludwig Felder, Timo Brand, Henry Förster, Johannes Zink, Chunyang Chen, Stephen Kobourov
Title: Exploring MLLMs Perception of Network Visualization Principles
Abstract:
In this paper, we test whether Multimodal Large Language Models (MLLMs) can match human-subject performance in tasks involving the perception of properties in network layouts. Specifically, we replicate a human-subject experiment about perceiving quality (namely stress) in network layouts using GPT-4o and Gemini-2.5. Our experiments show that giving MLLMs exactly the same study information as trained human participants results in performance similar to that of human experts, exceeding that of untrained non-experts. Additionally, we show that prompt engineering that deviates from the human-subject experiment can lead to better-than-human performance in some settings. Interestingly, like human subjects, the MLLMs seem to rely on visual proxies rather than computing the actual value of stress, indicating some sense or facsimile of perception. Explanations from the models provide descriptions similar to those used by the human participants (e.g., even distribution of nodes and uniform edge lengths).

Authors:Fan Lei, David A. Sampson, Jiayi Hong, Yuxin Ma, Giuseppe Mascaro, Dave White, Rimjhim Agarwal, Ross Maciejewski
Title: FEWSim: A Visual Analytic Framework for Exploring the Nexus of Food-Energy-Water Simulations
Abstract:
The interdependencies of food, energy, and water (FEW) systems create a nexus opportunity to explore the strengths and vulnerabilities of individual and cross-sector interactions within FEW systems. However, the variables quantifying nexus interactions are hard to observe, which hinders the cross-sector analysis. To overcome such challenges, we present FEWSim, a visual analytics framework designed to support domain experts in exploring and interpreting simulation results from a coupled FEW model. FEWSim employs a three-layer asynchronous architecture: the model layer integrates food, energy, and water models to simulate the FEW nexus; the middleware layer manages scenario configuration and execution; and the visualization layer provides interactive visual exploration of simulated time-series results across FEW sectors. The visualization layer further facilitates the exploration across multiple scenarios and evaluates scenario differences in performance using sustainability indices of the FEW nexus. We demonstrate the utility of FEWSim through a case study for the Phoenix Active Management Area (AMA) in Arizona.

Authors:Eun Som Jeon, Sinjini Mitra, Jisoo Lee, Omik M. Save, Ankita Shukla, Hyunglae Lee, Pavan Turaga
Title: Ground Reaction Force Estimation via Time-aware Knowledge Distillation
Abstract:
Human gait analysis with wearable sensors has been widely used in various applications, such as daily life healthcare, rehabilitation, physical therapy, and clinical diagnostics and monitoring. In particular, ground reaction force (GRF) provides critical information about how the body interacts with the ground during locomotion. Although instrumented treadmills have been widely used as the gold standard for measuring GRF during walking, their lack of portability and high cost make them impractical for many applications. As an alternative, low-cost, portable, wearable insole sensors have been utilized to measure GRF; however, these sensors are susceptible to noise and disturbance and are less accurate than treadmill measurements. To address these challenges, we propose a Time-aware Knowledge Distillation framework for GRF estimation from insole sensor data. This framework leverages similarity and temporal features within a mini-batch during the knowledge distillation process, effectively capturing the complementary relationships between features and the sequential properties of the target and input data. The performance of the lightweight models distilled through this framework was evaluated by comparing GRF estimations from insole sensor data against measurements from an instrumented treadmill. Empirical results demonstrated that Time-aware Knowledge Distillation outperforms current baselines in GRF estimation from wearable sensor data.
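As an illustration of the distillation setup described above, a generic regression knowledge-distillation objective is sketched below; the paper's time-aware terms (mini-batch similarity and temporal features) are not reproduced here, and `alpha` is a hypothetical weighting:

```python
import numpy as np

def distillation_loss(student: np.ndarray, teacher: np.ndarray,
                      target: np.ndarray, alpha: float = 0.5) -> float:
    """Weighted sum of a teacher-matching term and a ground-truth term
    for GRF regression. `alpha` is an assumed hyperparameter; the actual
    time-aware similarity terms from the paper are not modeled."""
    soft = float(np.mean((student - teacher) ** 2))  # mimic teacher outputs
    hard = float(np.mean((student - target) ** 2))   # fit treadmill GRF
    return alpha * soft + (1 - alpha) * hard
```

A lightweight student trained against such an objective inherits structure from a larger teacher while staying small enough for insole-only deployment.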

Authors:Haochen Song, Dominik Hofer, Rania Islambouli, Laura Hawkins, Ananya Bhattacharjee, Meredith Franklin, Joseph Jay Williams
Title: Investigating the Relationship Between Physical Activity and Tailored Behavior Change Messaging: Connecting Contextual Bandit with Large Language Models
Abstract:
Machine learning approaches, such as contextual multi-armed bandit (cMAB) algorithms, offer a promising strategy to reduce sedentary behavior by delivering personalized interventions to encourage physical activity. However, cMAB algorithms typically require large participant samples to learn effectively and may overlook key psychological factors that are not explicitly encoded in the model. In this study, we propose a hybrid approach that combines cMAB for selecting intervention types with large language models (LLMs) to personalize message content. We evaluate four intervention types: behavioral self-monitoring, gain-framed, loss-framed, and social comparison, each delivered as a motivational message aimed at increasing motivation for physical activity and daily step count. Message content is further personalized using dynamic contextual factors including daily fluctuations in self-efficacy, social influence, and regulatory focus. Over a seven-day trial, participants receive daily messages assigned by one of four models: cMAB alone, LLM alone, combined cMAB with LLM personalization (cMABxLLM), or equal randomization (RCT). Outcomes include daily step count and message acceptance, assessed via ecological momentary assessments (EMAs). We apply a causal inference framework to evaluate the effects of each model. Our findings offer new insights into the complementary roles of LLM-based personalization and cMAB adaptation in promoting physical activity through personalized behavioral messaging.
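The cMAB component for choosing among the four intervention types can be sketched with a simple Beta-Bernoulli Thompson sampler; the actual algorithm, context handling, and reward signal in the study may differ, and the prompt string is purely illustrative:

```python
import random

ARMS = ["self-monitoring", "gain-framed", "loss-framed", "social comparison"]

class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over intervention types.
    A simplified stand-in for the study's cMAB: context features and the
    real reward definition (step count, message acceptance) are omitted."""
    def __init__(self, n_arms: int):
        self.successes = [1] * n_arms   # Beta(1, 1) uniform priors
        self.failures = [1] * n_arms

    def select(self) -> int:
        # Sample a plausible acceptance rate per arm; pick the largest.
        samples = [random.betavariate(s, f)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm: int, accepted: bool) -> None:
        if accepted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

bandit = ThompsonBandit(len(ARMS))
arm = bandit.select()   # cMAB picks the intervention type...
# ...then an LLM would personalize the message content (stubbed here):
prompt = f"Write a {ARMS[arm]} message tailored to today's self-efficacy"
bandit.update(arm, accepted=True)
```

The split of labor mirrors the hybrid design: the bandit adapts *which* intervention to send, while the LLM tailors *how* it is worded.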

Authors:Shijing He, Yaxiong Lei, Xiao Zhan, Chi Zhang, Juan Ye, Ruba Abu-Salma, Jose Such
Title: Privacy Perspectives and Practices of Chinese Smart Home Product Teams
Abstract:
Previous research has explored the privacy needs and concerns of device owners, primary users, and different bystander groups with regard to smart home devices like security cameras, smart speakers, and hubs, but little is known about the privacy views and practices of smart home product teams, particularly those in non-Western contexts. This paper presents findings from 27 semi-structured interviews with Chinese smart home product team members, including product/project managers, software/hardware engineers, user experience (UX) designers, legal/privacy experts, and marketers/operation specialists. We examine their privacy perspectives, practices, and risk mitigation strategies. Our results show that participants emphasized compliance with Chinese data privacy laws, which typically prioritized national security over individual privacy rights. China-specific cultural, social, and legal factors also influenced participants' ethical considerations and attitudes toward balancing user privacy and security with convenience. Drawing on our findings, we propose a set of recommendations for smart home product teams, along with socio-technical and legal interventions to address smart home privacy issues, especially those affecting at-risk users, in Chinese multi-user smart homes.

Authors:Antonio Alvarez Valdivia, Benjamin A. Christie, Dylan P. Losey, Laura H. Blumenschein
Title: A Modular Haptic Display with Reconfigurable Signals for Personalized Information Transfer
Abstract:
We present a customizable soft haptic system that integrates modular hardware with an information-theoretic algorithm to personalize feedback for different users and tasks. Our platform features modular, multi-degree-of-freedom pneumatic displays, where different signal types, such as pressure, frequency, and contact area, can be activated or combined using fluidic logic circuits. These circuits simplify control by reducing reliance on specialized electronics and enabling coordinated actuation of multiple haptic elements through a compact set of inputs. Our approach allows rapid reconfiguration of haptic signal rendering through hardware-level logic switching without rewriting code. Personalization of the haptic interface is achieved through the combination of modular hardware and software-driven signal selection. To determine which display configurations will be most effective, we model haptic communication as a signal transmission problem, where an agent must convey latent information to the user. We formulate the optimization problem to identify the haptic hardware setup that maximizes the information transfer between the intended message and the user's interpretation, accounting for individual differences in sensitivity, preferences, and perceptual salience. We evaluate this framework through user studies where participants interact with reconfigurable displays under different signal combinations. Our findings support the role of modularity and personalization in creating multimodal haptic interfaces and advance the development of reconfigurable systems that adapt with users in dynamic human-machine interaction contexts.
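The information-transfer criterion above can be sketched as choosing the display configuration that maximizes mutual information between the intended message and the user's interpretation; the confusion matrices and configuration names below are hypothetical stand-ins for measured per-user data:

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """I(M; U) in bits from a joint distribution over
    (intended message, user interpretation)."""
    pm = joint.sum(axis=1, keepdims=True)   # marginal over messages
    pu = joint.sum(axis=0, keepdims=True)   # marginal over interpretations
    ratio = np.where(joint > 0, joint / (pm * pu), 1.0)
    return float(np.sum(joint * np.log2(ratio)))

# Hypothetical per-user confusion data for two display configurations:
# rows = intended message, cols = user's interpretation.
config_a = np.array([[0.45, 0.05],
                     [0.05, 0.45]])        # signals rarely confused
config_b = np.array([[0.25, 0.25],
                     [0.25, 0.25]])        # signals indistinguishable
best = max([("pressure+frequency", config_a), ("pressure only", config_b)],
           key=lambda kv: mutual_information(kv[1]))[0]
```

Estimating these joint distributions per user is what lets the optimization account for individual differences in sensitivity and perceptual salience.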

Authors:Mohan Sunkara, Akshay Kolgar Nayak, Sandeep Kalari, Yash Prakash, Sampath Jayarathna, Hae-Na Lee, Vikas Ashok
Title: Adapting Online Customer Reviews for Blind Users: A Case Study of Restaurant Reviews
Abstract:
Online reviews have become an integral aspect of consumer decision-making on e-commerce websites, especially in the restaurant industry. Unlike sighted users who can visually skim through the reviews, perusing reviews remains challenging for blind users, who rely on screen reader assistive technology that supports predominantly one-dimensional narration of content via keyboard shortcuts. In an interview study, we uncovered numerous pain points of blind screen reader users with online restaurant reviews, notably, the listening fatigue and frustration after going through only the first few reviews. To address these issues, we developed QuickQue, an assistive tool that performs aspect-focused, sentiment-driven summarization to reorganize the information in the reviews into an alternative, thematically organized presentation that is conveniently perusable with a screen reader. At its core, QuickQue utilizes a large language model to perform aspect-based joint classification for grouping reviews, followed by focused summarizations within the groups to generate concise representations of reviewers' opinions, which are then presented to the screen reader users via an accessible interface. Evaluation of QuickQue in a user study with 10 participants showed significant improvements in overall usability and task workload compared to the status quo screen reader.

Authors:Zhanxin Hao, Haifeng Luo, Yongyi Chen, Yu Zhang
Title: Unpacking Graduate Students' Learning Experience with Generative AI Teaching Assistant in A Quantitative Methodology Course
Abstract:
The study was conducted in an Advanced Quantitative Research Methods course involving 20 graduate students. During the course, student inquiries made to the AI were recorded and coded using Bloom's taxonomy and the CLEAR framework. A series of independent sample t-tests and Poisson regression analyses were employed to analyse the characteristics of different questions asked by students with different backgrounds. Post-course interviews were conducted with 10 students to gain deeper insights into their perceptions. The findings revealed a U-shaped pattern in students' use of the AI assistant, with higher usage at the beginning and towards the end of the course, and a decrease in usage during the middle weeks. Most questions posed to the AI focused on knowledge and comprehension levels, with fewer questions involving deeper cognitive thinking. Students with a weaker mathematical foundation used the AI assistant more frequently, though their inquiries tended to lack explicit and logical structure compared to those with a strong mathematical foundation, who engaged less with the tool. These patterns suggest the need for targeted guidance to optimise the effectiveness of AI tools for students with varying levels of academic proficiency.

Authors:Mingqian Zheng, Wenjia Hu, Patrick Zhao, Motahhare Eslami, Jena D. Hwang, Faeze Brahman, Carolyn Rose, Maarten Sap
Title: Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences
Abstract:
Current LLMs are trained to refuse potentially harmful input queries regardless of whether users actually had harmful intents, causing a tradeoff between safety and user experience. Through a study of 480 participants evaluating 3,840 query-response pairs, we examine how different refusal strategies affect user perceptions across varying motivations. Our findings reveal that response strategy largely shapes user experience, while actual user motivation has negligible impact. Partial compliance -- providing general information without actionable details -- emerges as the optimal strategy, reducing negative user perceptions by over 50% relative to flat-out refusals. Complementing this, we analyze response patterns of 9 state-of-the-art LLMs and evaluate how 6 reward models score different refusal strategies, demonstrating that models rarely deploy partial compliance naturally and reward models currently undervalue it. This work demonstrates that effective guardrails require focusing on crafting thoughtful refusals rather than detecting intent, offering a path toward AI safety mechanisms that ensure both safety and sustained user engagement.

Authors:Guanren Qiao, Sixu Lin, Ronglai Zuo, Zhizheng Wu, Kui Jia, Guiliang Liu
Title: SignBot: Learning Human-to-Humanoid Sign Language Interaction
Abstract:
Sign language is a natural and visual form of language that uses movements and expressions to convey meaning, serving as a crucial means of communication for individuals who are deaf or hard-of-hearing (DHH). However, the number of people proficient in sign language remains limited, highlighting the need for technological advancements to bridge communication gaps and foster interactions with minorities. Based on recent advancements in embodied humanoid robots, we propose SignBot, a novel framework for human-robot sign language interaction. SignBot integrates a cerebellum-inspired motion control component and a cerebral-oriented module for comprehension and interaction. Specifically, SignBot consists of: 1) Motion Retargeting, which converts human sign language datasets into robot-compatible kinematics; 2) Motion Control, which leverages a learning-based paradigm to develop a robust humanoid control policy for tracking sign language gestures; and 3) Generative Interaction, which incorporates a sign language translator, responder, and generator, thereby enabling natural and effective communication between robots and humans. Simulation and real-world experimental results demonstrate that SignBot can effectively facilitate human-robot interaction and perform sign language motions with diverse robots and datasets. SignBot represents a significant advancement in automatic sign language interaction on embodied humanoid robot platforms, providing a promising solution to improve communication accessibility for the DHH community.

Authors:Amanda Chan, Catherine Di, Joseph Rupertus, Gary Smith, Varun Nagaraj Rao, Manoel Horta Ribeiro, Andrés Monroy-Hernández
Title: Redefining Research Crowdsourcing: Incorporating Human Feedback with LLM-Powered Digital Twins
Abstract:
Crowd work platforms like Amazon Mechanical Turk and Prolific are vital for research, yet workers' growing use of generative AI tools poses challenges. Researchers face compromised data validity as AI responses replace authentic human behavior, while workers risk diminished roles as AI automates tasks. To address this, we propose a hybrid framework using digital twins, personalized AI models that emulate workers' behaviors and preferences while keeping humans in the loop. We evaluate our system with an experiment (n=88 crowd workers) and in-depth interviews with crowd workers (n=5) and social science researchers (n=4). Our results suggest that digital twins may enhance productivity and reduce decision fatigue while maintaining response quality. Both researchers and workers emphasized the importance of transparency, ethical data use, and worker agency. By automating repetitive tasks and preserving human engagement for nuanced ones, digital twins may help balance scalability with authenticity.

Authors:Pranav Rao, Maryam Taj, Alex Mariakakis, Joseph Jay Williams, Ananya Bhattacharjee
Title: Fitting the Message to the Moment: Designing Calendar-Aware Stress Messaging with Large Language Models
Abstract:
Existing stress-management tools fail to account for the timing and contextual specificity of students' daily lives, often providing static or misaligned support. Digital calendars contain rich, personal indicators of upcoming responsibilities, yet this data is rarely leveraged for adaptive wellbeing interventions. In this short paper, we explore how large language models (LLMs) might use digital calendar data to deliver timely and personalized stress support. We conducted a one-week study with eight university students using a functional technology probe that generated daily stress-management messages based on participants' calendar events. Through semi-structured interviews and thematic analysis, we found that participants valued interventions that prioritized stressful events and adopted a concise but colloquial tone. These findings reveal key design implications for LLM-based stress-management tools, including the need for structured questioning and tone calibration to foster relevance and trust.

Authors:Qi Gao, Wei Xu, Hanxi Pan, Mowei Shen, Zaifeng Gao
Title: Human-Centered Human-AI Collaboration (HCHAC)
Abstract:
In the intelligent era, the interaction between humans and intelligent systems fundamentally involves collaboration with autonomous intelligent agents. Human-AI Collaboration (HAC) represents a novel type of human-machine relationship facilitated by autonomous intelligent machines equipped with AI technologies. In this paradigm, AI agents serve not only as auxiliary tools but also as active teammates, partnering with humans to accomplish tasks collaboratively. Human-centered AI (HCAI) emphasizes that humans play critical leadership roles in the collaboration. This human-led collaboration imparts new dimensions to the human-machine relationship, necessitating innovative research perspectives, paradigms, and agendas to address the unique challenges posed by HAC. This chapter delves into the essence of HAC from the human-centered perspective, outlining its core concepts and distinguishing features. It reviews the current research methodologies and research agenda within the HAC field from the HCAI perspective, highlighting advancements and ongoing studies. Furthermore, a framework for human-centered HAC (HCHAC) is proposed by integrating these reviews and analyses. A case study of HAC in the context of autonomous vehicles is provided, illustrating practical applications and the synergistic interactions between humans and AI agents. Finally, it identifies potential future research directions aimed at enhancing the effectiveness, reliability, and ethical integration of human-centered HAC systems in diverse domains.

Authors:Jeongwon Jo, He Zhang, Jie Cai, Nitesh Goyal
Title: AI Trust Reshaping Administrative Burdens: Understanding Trust-Burden Dynamics in LLM-Assisted Benefits Systems
Abstract:
The Supplemental Nutrition Assistance Program (SNAP) is an essential benefits support system provided by the US administration to 41 million federally determined low-income applicants. Through interviews with such applicants across a diverse set of experiences with the SNAP system, our findings reveal that new AI technologies like LLMs can alleviate traditional burdens but also introduce new burdens. We introduce new types of learning, compliance, and psychological costs that transform the administrative burden on applicants. We also identify how trust in AI across three dimensions--competence, integrity, and benevolence--is perceived to reduce administrative burdens, which may stem from unintended and unwarranted overtrust in the system. We discuss calibrating appropriate levels of user trust in LLM-based administrative systems, mitigating newly introduced burdens. In particular, our findings suggest that evidence-based information disclosure is necessary in benefits administration and propose directions for future research on trust-burden dynamics in AI-assisted administration systems.

Authors:Ziyun Zhang, Xinyi Liu, Xiaoyi Zhang, Jun Wang, Gang Chen, Yan Lu
Title: UI-Evol: Automatic Knowledge Evolving for Computer Use Agents
Abstract:
External knowledge has played a crucial role in the recent development of computer use agents. We identify a critical knowledge-execution gap: retrieved knowledge often fails to translate into effective real-world task execution. Our analysis shows that even 90\% correct knowledge yields only a 41\% execution success rate. To bridge this gap, we propose UI-Evol, a plug-and-play module for autonomous GUI knowledge evolution. UI-Evol consists of two stages: a Retrace Stage that extracts faithful objective action sequences from actual agent-environment interactions, and a Critique Stage that refines existing knowledge by comparing these sequences against external references. We conduct comprehensive experiments on the OSWorld benchmark with the state-of-the-art Agent S2. Our results demonstrate that UI-Evol not only significantly boosts task performance but also addresses a previously overlooked issue of high behavioral standard deviation in computer use agents, leading to superior performance on computer use tasks and substantially improved agent reliability.

Authors:Angie Zhang, Madison Liao, Elizaveta Kravchenko, Marshanah Taylor, Angela Haddad, Chandra Bhat, S. Craig Watkins, Min Kyung Lee
Title: Data and Technology for Equitable Public Administration: Understanding City Government Employees' Challenges and Needs
Abstract:
City governments in the United States are increasingly pressured to adopt emerging technologies. Yet, these systems often risk biased and disparate outcomes. Scholars studying public sector technology design have converged on the need to ground these systems in the goals and organizational contexts of employees using them. We expand our understanding of employees' contexts by focusing on the equity practices of city government employees to surface important equity considerations around public sector data and technology use. Through semi-structured interviews with thirty-six employees from ten departments of a U.S. city government, our findings reveal challenges employees face when operationalizing equity, perspectives on data needs for advancing equity goals, and the design space for acceptable government technology. We discuss what it looks like to foreground equity in data use and technology design, and considerations for how to support city government employees in operationalizing equity with and without official equity offices.

Authors:Aayushi Dangol, Robert Wolfe, Daeun Yoo, Arya Thiruvillakkat, Ben Chickadel, Julie A. Kientz
Title: "If anybody finds out you are in BIG TROUBLE": Understanding Children's Hopes, Fears, and Evaluations of Generative AI
Abstract:
As generative artificial intelligence (genAI) increasingly mediates how children learn, communicate, and engage with digital content, understanding children's hopes and fears about this emerging technology is crucial. In a pilot study with 37 fifth-graders, we explored how children (ages 9-10) envision genAI and the roles they believe it should play in their daily life. Our findings reveal three key ways children envision genAI: as a companion providing guidance, a collaborator working alongside them, and a task automator that offloads responsibilities. However, alongside these hopeful views, children expressed fears about overreliance, particularly in academic settings, linking it to fears of diminished learning, disciplinary consequences, and long-term failure. This study highlights the need for child-centric AI design that balances these tensions, empowering children with the skills to critically engage with and navigate their evolving relationships with digital technologies.

Authors:Aayushi Dangol, Runhua Zhao, Robert Wolfe, Trushaa Ramanan, Julie A. Kientz, Jason Yip
Title: "AI just keeps guessing": Using ARC Puzzles to Help Children Identify Reasoning Errors in Generative AI
Abstract:
The integration of generative Artificial Intelligence (genAI) into everyday life raises questions about the competencies required to critically engage with these technologies. Unlike visual errors in genAI, textual mistakes are often harder to detect and require specific domain knowledge. Furthermore, AI's authoritative tone and structured responses can create an illusion of correctness, leading to overtrust, especially among children. To address this, we developed AI Puzzlers, an interactive system based on the Abstraction and Reasoning Corpus (ARC), to help children identify and analyze errors in genAI. Drawing on Mayer & Moreno's Cognitive Theory of Multimedia Learning, AI Puzzlers uses visual and verbal elements to reduce cognitive overload and support error detection. Based on two participatory design sessions with 21 children (ages 6 - 11), our findings provide both design insights and an empirical understanding of how children identify errors in genAI reasoning, develop strategies for navigating these errors, and evaluate AI outputs.

Authors:Aayushi Dangol, Robert Wolfe, Runhua Zhao, JaeWon Kim, Trushaa Ramanan, Katie Davis, Julie A. Kientz
Title: Children's Mental Models of AI Reasoning: Implications for AI Literacy Education
Abstract:
As artificial intelligence (AI) advances in reasoning capabilities, most recently with the emergence of Large Reasoning Models (LRMs), understanding how children conceptualize AI's reasoning processes becomes critical for fostering AI literacy. While one of the "Five Big Ideas" in AI education highlights reasoning algorithms as central to AI decision-making, less is known about children's mental models in this area. Through a two-phase approach, consisting of a co-design session with 8 children followed by a field study with 106 children (grades 3-8), we identified three models of AI reasoning: Deductive, Inductive, and Inherent. Our findings reveal that younger children (grades 3-5) often attribute AI's reasoning to inherent intelligence, while older children (grades 6-8) recognize AI as a pattern recognizer. We highlight three tensions that surfaced in children's understanding of AI reasoning and conclude with implications for scaffolding AI curricula and designing explainable AI tools.

Authors:Huimin Xu, Houjiang Liu, Yan Leng, Ying Ding
Title: Adapting to LLMs: How Insiders and Outsiders Reshape Scientific Knowledge Production
Abstract:
CSCW has long examined how emerging technologies reshape the ways researchers collaborate and produce knowledge, with scientific knowledge production as a central area of focus. As AI becomes increasingly integrated into scientific research, understanding how researchers adapt to it reveals timely opportunities for CSCW research -- particularly in supporting new forms of collaboration, knowledge practices, and infrastructure in AI-driven science. This study quantifies LLM impacts on scientific knowledge production based on an evaluation workflow that combines an insider-outsider perspective with a knowledge production framework. Our findings reveal how LLMs catalyze both innovation and reorganization in scientific communities, offering insights into the broader transformation of knowledge production in the age of generative AI and shedding light on new research opportunities in CSCW.

Authors:Yimeng Liu, Misha Sra
Title: Designing Scaffolded Interfaces for Enhanced Learning and Performance in Professional Software
Abstract:
Professional software offers immense power but also presents significant learning challenges. Its complex interfaces, as well as insufficient built-in structured guidance and unfamiliar terminology, often make newcomers struggle with task completion. To address these challenges, we introduce ScaffoldUI, a method for scaffolded interface design to reduce interface complexity, provide structured guidance, and enhance software learnability. The scaffolded interface presents task-relevant tools, progressively discloses tool complexity, and organizes tools based on domain concepts, aiming to assist task performance and software learning. To evaluate the feasibility of our interface design method, we present a technical pipeline for scaffolded interface implementation in professional 3D software, i.e., Blender, and conduct user studies with beginners (N=32) and experts (N=8). Study results demonstrate that our scaffolded interfaces significantly reduce perceived task load caused by interface complexity, support task performance through structured guidance, and augment learning by clearly connecting concepts and tools within the taskflow context. Based on a discussion of the user study findings, we offer insights for future research on designing scaffolded interfaces to support instruction, productivity, creativity, and cross-software workflows.

Authors:Christian Schütze, Birte Richter, Britta Wrede
Title: Emotion-sensitive Explanation Model
Abstract:
Explainable AI (XAI) research has traditionally focused on rational users, aiming to improve understanding and reduce cognitive biases. However, emotional factors play a critical role in how explanations are perceived and processed. Prior work shows that prior and task-generated emotions can negatively impact the understanding of explanations. Building on these insights, we propose a three-stage model for emotion-sensitive explanation grounding: (1) emotional or epistemic arousal, (2) understanding, and (3) agreement. This model provides a conceptual basis for developing XAI systems that dynamically adapt explanation strategies to users' emotional states, ultimately supporting more effective and user-centered decision-making.

Authors:Birte Richter, Christian Schütze, Anna Aksonova, Britta Wrede
Title: Influence of prior and task generated emotions on XAI explanation retention and understanding
Abstract:
The explanation of AI results and how they are received by users is an increasingly active research field. However, there is a surprising lack of knowledge about how social factors such as emotions affect the process of explanation by a decision support system (DSS). While previous research has shown effects of emotions on DSS-supported decision-making, it remains unknown to what extent emotions affect cognitive processing during an explanation. In this study, we therefore investigated the influence of prior emotions and task-related arousal on the retention and understanding of explained feature relevance. To investigate the influence of prior emotions, we induced happiness and fear prior to the decision support interaction. Before emotion induction, user characteristics to assess their risk type were collected via a questionnaire. To identify emotional reactions to the explanations of the relevance of different features, we observed heart rate variability (HRV), facial expressions, and self-reported emotions of the explainee while they observed and listened to the explanation, and assessed their retention of the features as well as their influence on the outcome of the decision task. Results indicate that (1) task-unrelated prior emotions did not affect retention but may affect the understanding of the relevance of certain features in the sense of an emotion-induced confirmation bias, (2) certain features related to personal attitudes elicited arousal in individual participants, and (3) this arousal affected the understanding of these variables.

Authors:Maurice Chiodo, Dennis Müller, Paul Siewert, Jean-Luc Wetherall, Zoya Yasmine, John Burden
Title: Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility
Abstract:
The legal compliance and safety of different Human-in-the-loop (HITL) setups for AI can vary greatly. This manuscript aims to identify new ways of choosing between such setups, and shows that there is an unavoidable trade-off between the attribution of legal responsibility and the technical explainability of AI. We begin by using the notion of oracle machines from computability theory to formalise different HITL setups, distinguishing between trivial human monitoring, single endpoint human action, and highly involved interaction between the human(s) and the AI. These correspond to total functions, many-one reductions, and Turing reductions respectively. A taxonomy categorising HITL failure modes is then presented, highlighting the limitations on what any HITL setup can actually achieve. Our approach then identifies oversights from UK and EU legal frameworks, which focus on certain HITL setups which may not always achieve the desired ethical, legal, and sociotechnical outcomes. We suggest areas where the law should recognise the effectiveness of different HITL setups and assign responsibility in these contexts, avoiding unnecessary and unproductive human "scapegoating". Overall, we show how HITL setups involve many technical design decisions, and can be prone to failures which are often out of the humans' control. This opens up a new analytic perspective on the challenges arising in the creation of HITL setups, helping inform AI developers and lawmakers on designing HITL to better achieve their desired outcomes.
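The three computability-theoretic analogies named in the abstract rest on standard textbook definitions; as a sketch (the standard notions, not necessarily the paper's exact formalisation):

```latex
% Standard definitions behind the three HITL analogies (sketch only;
% the manuscript's formalisation may differ in detail).
\begin{itemize}
  \item Trivial human monitoring $\sim$ a \emph{total computable function}:
        $f$ is defined on every input, with no essential dependence on an
        oracle's answers.
  \item Single endpoint human action $\sim$ a \emph{many-one reduction}:
        $A \le_m B$ iff there is a computable $f$ such that
        $x \in A \iff f(x) \in B$ -- one handoff, one query, and the
        answer is adopted as-is.
  \item Highly involved human--AI interaction $\sim$ a \emph{Turing
        reduction}: $A \le_T B$ iff an oracle machine $M^{B}$ decides $A$,
        making adaptively chosen queries to $B$ throughout the computation.
\end{itemize}
```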

Authors:Varun Nagaraj Rao, Samantha Dalal, Andrew Schwartz, Amna Liaqat, Dana Calacci, Andrés Monroy-Hernández
Title: FareShare: A Tool for Labor Organizers to Estimate Lost Wages and Contest Arbitrary AI and Algorithmic Deactivations
Abstract:
What happens when a rideshare driver is suddenly locked out of the platform connecting them to riders, wages, and daily work? Deactivation, the abrupt removal of gig workers' platform access, typically occurs through arbitrary AI and algorithmic decisions with little explanation or recourse. This represents one of the most severe forms of algorithmic control and often devastates workers' financial stability. Recent U.S. state policies now mandate appeals processes and compensation, based on past earnings, for periods of wrongful deactivation. Yet labor organizers still lack effective tools to support these complex, error-prone workflows. We designed FareShare, a computational tool that automates lost wage estimation for deactivated drivers, through a 6-month partnership with the State of Washington's largest rideshare labor union. Over the following 3 months, our field deployment of FareShare registered 178 account signups. We observed that the tool could reduce lost wage calculation time by over 95%, eliminate manual data entry errors, and enable legal teams to generate arbitration-ready reports more efficiently. Beyond these gains, the deployment also surfaced important socio-technical challenges around trust, consent, and tool adoption in high-stakes labor contexts.

Authors:Sutapa Dey Tithi, Xiaoyi Tian, Min Chi, Tiffany Barnes
Title: Investigating the Impact and Student Perceptions of Guided Parsons Problems for Learning Logic with Subgoals
Abstract:
Parsons problems (PPs) have shown promise in structured problem solving by providing scaffolding that decomposes the problem and requires learners to reconstruct the solution. However, some students face difficulties when first learning with PPs or solving more complex Parsons problems. This study introduces Guided Parsons problems (GPPs) designed to provide step-specific hints and improve learning outcomes in an intelligent logic tutor. In a controlled experiment with 76 participants, GPP students achieved significantly higher accuracy of rule application in both level-end tests and post-tests, with the strongest gains among students with lower prior knowledge. GPP students initially spent more time in training (1.52 vs. 0.81 hours) but required less time for post-tests, indicating improved problem solving efficiency. Our thematic analysis of GPP student self-explanations revealed task decomposition, better rule understanding, and reduced difficulty as key themes, while some students felt the structured nature of GPPs restricted their own way of reasoning. These findings reinforce that GPPs can effectively combine the benefits of worked examples and problem solving practice, but could be further improved by individual adaptation.

Authors:Tuochao Chen, Nicholas Batchelder, Alisa Liu, Noah Smith, Shyamnath Gollakota
Title: LLAMAPIE: Proactive In-Ear Conversation Assistants
Abstract:
We introduce LlamaPIE, the first real-time proactive assistant designed to enhance human conversations through discreet, concise guidance delivered via hearable devices. Unlike traditional language models that require explicit user invocation, this assistant operates in the background, anticipating user needs without interrupting conversations. We address several challenges, including determining when to respond, crafting concise responses that enhance conversations, leveraging knowledge of the user for context-aware assistance, and real-time, on-device processing. To achieve this, we construct a semi-synthetic dialogue dataset and propose a two-model pipeline: a small model that decides when to respond and a larger model that generates the response. We evaluate our approach on real-world datasets, demonstrating its effectiveness in providing helpful, unobtrusive assistance. User studies with our assistant, implemented on Apple Silicon M2 hardware, show a strong preference for the proactive assistant over both a baseline with no assistance and a reactive model, highlighting the potential of LlamaPIE to enhance live conversations.
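The two-model pipeline described above can be sketched as a gating architecture: a cheap model is consulted on every turn, and the expensive generator runs only when the gate fires. The sketch below is illustrative only; the stand-in heuristics and the `Turn`/`proactive_assistant` names are assumptions, not the paper's implementation (which uses on-device LLMs).

```python
# Illustrative sketch of a two-stage proactive-assistant pipeline:
# a small, fast "gate" model decides whether to speak at all, and only
# then is a larger model invoked to generate a short suggestion.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    text: str

def small_gate(history):
    """Stand-in for the small decision model: fire only when the last
    turn is a question directed at the user (a toy heuristic)."""
    last = history[-1]
    return last.speaker != "user" and last.text.rstrip().endswith("?")

def large_generator(history):
    """Stand-in for the larger response model: produce a concise cue."""
    return f"Hint: address '{history[-1].text}' briefly."

def proactive_assistant(history):
    """Cheap gate first, expensive generation second; None = stay silent."""
    if not small_gate(history):
        return None  # don't interrupt the conversation
    return large_generator(history)

dialog = [Turn("user", "We shipped it last week."),
          Turn("peer", "What were the main blockers?")]
print(proactive_assistant(dialog))  # gate fires, so a hint is produced
```

Running the gate on every turn keeps latency low because the large model, the dominant cost, is invoked only on the small fraction of turns that warrant a response.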

Authors:Barak Gahtan, Sanketh Vedula, Gil Samuelly Leichtag, Einat Kodesh, Alex M. Bronstein
Title: From Lab to Wrist: Bridging Metabolic Monitoring and Consumer Wearables for Heart Rate and Oxygen Consumption Modeling
Abstract:
Understanding physiological responses during running is critical for performance optimization, tailored training prescriptions, and athlete health management. We introduce a comprehensive framework -- what we believe to be the first capable of predicting instantaneous oxygen consumption (VO$_{2}$) trajectories exclusively from consumer-grade wearable data. Our approach employs two complementary physiological models: (1) accurate modeling of heart rate (HR) dynamics via a physiologically constrained ordinary differential equation (ODE) and neural Kalman filter, trained on over 3 million HR observations, achieving 1-second interval predictions with mean absolute errors as low as 2.81\,bpm (correlation 0.87); and (2) leveraging the principles of precise HR modeling, a novel VO$_{2}$ prediction architecture requiring only the initial second of VO$_{2}$ data for calibration, enabling robust, sequence-to-sequence metabolic demand estimation. Despite relying solely on smartwatch and chest-strap data, our method achieves mean absolute percentage errors of approximately 13\%, effectively capturing rapid physiological transitions and steady-state conditions across diverse running intensities. Our synchronized dataset, complemented by blood lactate measurements, further lays the foundation for future noninvasive metabolic zone identification. By embedding physiological constraints within modern machine learning, this framework democratizes advanced metabolic monitoring, bridging laboratory-grade accuracy and everyday accessibility, thus empowering both elite athletes and recreational fitness enthusiasts.
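The combination of a physiologically constrained first-order ODE with a Kalman-style correction can be sketched in scalar form. All constants below (`tau`, noise variances, the intensity-dependent target) are illustrative assumptions, not the paper's fitted parameters, and the paper's neural Kalman filter is far richer than this toy:

```python
# Minimal sketch: first-order HR dynamics  dHR/dt = (drive - HR) / tau
# relaxing toward an intensity-dependent target, fused with noisy
# 1 Hz wearable readings by a scalar Kalman filter.
import random

def hr_kalman_step(hr, P, drive, z, tau=30.0, dt=1.0, Q=0.05, R=4.0):
    """One predict/update cycle. hr: state estimate (bpm), P: variance,
    drive: ODE target from running intensity, z: measured HR (bpm)."""
    # Predict: forward-Euler step of the ODE, then inflate variance.
    hr_pred = hr + dt * (drive - hr) / tau
    P_pred = P + Q
    # Update: standard scalar Kalman gain against measurement noise R.
    K = P_pred / (P_pred + R)
    hr_new = hr_pred + K * (z - hr_pred)
    P_new = (1.0 - K) * P_pred
    return hr_new, P_new

random.seed(0)
hr, P = 70.0, 1.0
for t in range(120):                      # two minutes at 1 Hz
    drive = 120.0 if t < 60 else 160.0    # runner accelerates at t = 60 s
    z = drive + random.gauss(0.0, 2.0)    # noisy sensor reading
    hr, P = hr_kalman_step(hr, P, drive, z)
print(round(hr, 1))  # estimate settles near the 160 bpm target
```

The ODE constraint is what keeps the estimate physiologically plausible between measurements; the filter then lets noisy consumer-grade readings correct the model without dominating it.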

Authors:Kaylea Champion, Benjamin Mako Hill
Title: Countering underproduction of peer produced goods
Abstract:
Peer produced goods such as online knowledge bases and free/libre open source software rely on contributors who often choose their tasks regardless of consumer needs. These goods are susceptible to underproduction: when popular goods are relatively low quality. Although underproduction is a common feature of peer production, very little is known about how to counteract it. We use a detailed longitudinal dataset from English Wikipedia to show that more experienced contributors -- including those who contribute without an account -- tend to contribute to underproduced goods. A within-person analysis shows that contributors' efforts shift toward underproduced goods over time. These findings illustrate the value of retaining contributors in peer production, including those contributing without accounts, as a means to counter underproduction.

Authors:Zesheng Wang, Alexandre Bruckert, Patrick Le Callet, Guangtao Zhai
Title: Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion
Abstract:
Generating realistic listener facial motions in dyadic conversations remains challenging due to the high-dimensional action space and temporal dependency requirements. Existing approaches usually extract 3D Morphable Model (3DMM) coefficients and model in the 3DMM space. However, this makes the computational speed of the 3DMM a bottleneck, making it difficult to achieve real-time interactive responses. To tackle this problem, we propose Facial Action Diffusion (FAD), which introduces diffusion methods from the field of image generation to achieve efficient facial action generation. We further build the Efficient Listener Network (ELNet), specially designed to accommodate both the visual and audio information of the speaker as input. Combining FAD and ELNet, the proposed method learns effective listener facial motion representations and improves on the state-of-the-art methods while reducing computational time by 99%.

Authors:William P. McCarthy, Saujas Vaduguru, Karl D. D. Willis, Justin Matejka, Judith E. Fan, Daniel Fried, Yewen Pu
Title: mrCAD: Multimodal Refinement of Computer-aided Designs
Abstract:
A key feature of human collaboration is the ability to iteratively refine the concepts we have communicated. In contrast, while generative AI excels at the \textit{generation} of content, it often struggles to make specific language-guided \textit{modifications} of its prior outputs. To bridge the gap between how humans and machines perform edits, we present mrCAD, a dataset of multimodal instructions in a communication game. In each game, players created computer-aided designs (CADs) and refined them over several rounds to match specific target designs. Only one player, the Designer, could see the target, and they had to instruct the other player, the Maker, using text, drawing, or a combination of modalities. mrCAD consists of 6,082 communication games, 15,163 instruction-execution rounds, played between 1,092 pairs of human players. We analyze the dataset and find that generation and refinement instructions differ in their composition of drawing and text. Using the mrCAD task as a benchmark, we find that state-of-the-art VLMs are better at following generation instructions than refinement instructions. These results lay a foundation for analyzing and modeling a multimodal language of refinement that is not represented in previous datasets.

Authors:Mohit Chandra, Javier Hernandez, Gonzalo Ramos, Mahsa Ershadi, Ananya Bhattacharjee, Judith Amores, Ebele Okoli, Ann Paradiso, Shahed Warreth, Jina Suh
Title: Longitudinal Study on Social and Emotional Use of AI Conversational Agent
Abstract:
Development in digital technologies has continuously reshaped how individuals seek and receive social and emotional support. While online platforms and communities have long served this need, the increased integration of general-purpose conversational AI into daily lives has introduced new dynamics in how support is provided and experienced. Existing research has highlighted both benefits (e.g., wider access to well-being resources) and potential risks (e.g., over-reliance) of using AI for support seeking. In this five-week, exploratory study, we recruited 149 participants divided into two usage groups: a baseline usage group (BU, n=60) that used the internet and AI as usual, and an active usage group (AU, n=89) encouraged to use one of four commercially available AI tools (Microsoft Copilot, Google Gemini, PI AI, ChatGPT) for social and emotional interactions. Our analysis revealed significant increases in perceived attachment towards AI (32.99 percentage points), perceived AI empathy (25.8 p.p.), and motivation to use AI for entertainment (22.90 p.p.) among the AU group. We also observed that individual differences (e.g., gender identity, prior AI usage) influenced perceptions of AI empathy and attachment. Lastly, the AU group expressed higher comfort in seeking personal help, managing stress, obtaining social support, and talking about health with AI, indicating potential for broader emotional support while highlighting the need for safeguards against problematic usage. Overall, our exploratory findings underscore the importance of developing consumer-facing AI tools that support emotional well-being responsibly, while empowering users to understand the limitations of these tools.

Authors:Yue Fu, Alexis Hiniker
Title: Supporting Students' Reading and Cognition with AI
Abstract:
With the rapid adoption of AI tools in learning contexts, it is vital to understand how these systems shape users' reading processes and cognitive engagement. We collected and analyzed text from 124 sessions with AI tools, in which students used these tools to support them as they read assigned readings for an undergraduate course. We categorized participants' prompts to AI according to Bloom's Taxonomy of educational objectives -- Remembering, Understanding, Applying, Analyzing, Evaluating. Our results show that ``Analyzing'' and ``Evaluating'' are more prevalent in users' second and third prompts within a single usage session, suggesting a shift toward higher-order thinking. However, in reviewing users' engagement with AI tools over several weeks, we found that users converge toward passive reading engagement over time. Based on these results, we propose design implications for future AI reading-support systems, including structured scaffolds for lower-level cognitive tasks (e.g., recalling terms) and proactive prompts that encourage higher-order thinking (e.g., analyzing, applying, evaluating). Additionally, we advocate for adaptive, human-in-the-loop features that allow students and instructors to tailor their reading experiences with AI, balancing efficiency with enriched cognitive engagement. Our paper expands the dialogue on integrating AI into academic reading, highlighting both its potential benefits and challenges.

Authors:Shayla Sharmin, Roghayeh Leila Barmaki
Title: Hybrid Deep Learning Model to Estimate Cognitive Effort from fNIRS Signals in Educational Game Playing
Abstract:
This study estimates cognitive effort (CE) from functional near-infrared spectroscopy (fNIRS) data and performance scores using a hybrid deep learning model. Estimating CE enables educators to modify material to enhance learning effectiveness and student engagement. Relative neural efficiency (RNE) and relative neural involvement (RNI) are two metrics that have been used to represent CE. To estimate RNE and RNI, we need the hemodynamic response in the brain and the performance score of a task. We collected oxygenated hemoglobin ($\Delta\mathrm{HbO}$) while sixteen participants answered 16 questions in a Unity-based educational game, each with a 30-second response time. We used deep learning models to predict the performance score and estimate RNE and RNI to understand CE. The study compares traditional machine learning techniques with deep learning models such as CNN, LSTM, BiLSTM, and a hybrid CNN-GRU to determine which approach predicts performance scores more accurately. The results show that the hybrid CNN-GRU outperforms the other models, with 78.36\% training accuracy and 73.08\% test accuracy. Applying XGBoost to the features extracted by the GRU yielded the highest accuracy among the traditional machine learning methods (69.23\%), suggesting that the features learned by the hybrid model generalize well even to traditional machine learning algorithms. We used $\Delta\mathrm{HbO}$ and the predicted scores to calculate RNE and RNI and observe cognitive effort in our four test cases. Our results show that even with moderate accuracy, the predicted RNE and RNI closely follow the actual trends. We also observed that when participants were in a state of high CE, introducing rest led to a decrease in CE. These findings can help in designing and improving learning environments and provide valuable insights into learning materials.
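RNE and RNI are commonly computed Paas-style from standardized performance and effort scores; a sketch under that assumption (the paper may use a variant, and the data values below are made up for illustration):

```python
# Sketch of one common formulation of relative neural efficiency (RNE)
# and relative neural involvement (RNI), adapted from Paas-style
# instructional-efficiency scores: z-score performance and effort
# (here an HbO-derived effort proxy), then
#   RNE = (Zp - Ze) / sqrt(2),   RNI = (Zp + Ze) / sqrt(2).
import math

def zscores(xs):
    """Standardize a list of values (population standard deviation)."""
    mean = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / sd for x in xs]

def rne_rni(performance, effort):
    zp, ze = zscores(performance), zscores(effort)
    rne = [(p - e) / math.sqrt(2) for p, e in zip(zp, ze)]
    rni = [(p + e) / math.sqrt(2) for p, e in zip(zp, ze)]
    return rne, rni

scores = [0.9, 0.6, 0.8, 0.4]   # per-question performance (illustrative)
hbo    = [0.2, 0.8, 0.5, 0.9]   # HbO-based effort proxy (illustrative)
rne, rni = rne_rni(scores, hbo)
# High performance at low effort -> efficient (positive RNE):
print(round(rne[0], 2), round(rni[0], 2))
```

Under this formulation, a positive RNE marks performance above what the invested effort would predict, while RNI separates engaged (high-effort) from disengaged states.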

Authors:Andrew Silva, Pradyumna Tambwekar, Mariah Schrum, Matthew Gombolay
Title: Towards Balancing Preference and Performance through Adaptive Personalized Explainability
Abstract:
As robots and digital assistants are deployed in the real world, these agents must be able to communicate their decision-making criteria to build trust, improve human-robot teaming, and enable collaboration. While the field of explainable artificial intelligence (xAI) has made great strides to enable such communication, these advances often assume that one xAI approach is ideally suited to each problem (e.g., decision trees to explain how to triage patients in an emergency or feature-importance maps to explain radiology reports). This fails to recognize that users have diverse experiences and preferences for interaction modalities. In this work, we present two user studies set in a simulated autonomous vehicle (AV) domain. We investigate (1) population-level preferences for xAI and (2) personalization strategies for providing robot explanations. We find significant differences between xAI modes (language explanations, feature-importance maps, and decision trees) in both preference (p < 0.01) and performance (p < 0.05). We also observe that a participant's preferences do not always align with their performance, motivating our development of an adaptive personalization strategy to balance the two. We show that this strategy yields significant performance gains (p < 0.05), and we conclude with a discussion of our findings and implications for xAI in human-robot interactions.

Authors:Isabel O. Gallegos, Chen Shani, Weiyan Shi, Federico Bianchi, Izzy Gainsburg, Dan Jurafsky, Robb Willer
Title: Labeling Messages as AI-Generated Does Not Reduce Their Persuasive Effects
Abstract:
As generative artificial intelligence (AI) enables the creation and dissemination of information at massive scale and speed, it is increasingly important to understand how people perceive AI-generated content. One prominent policy proposal requires explicitly labeling AI-generated content to increase transparency and encourage critical thinking about the information, but prior research has not yet tested the effects of such labels. To address this gap, we conducted a survey experiment (N=1601) on a diverse sample of Americans, presenting participants with an AI-generated message about several public policies (e.g., allowing colleges to pay student-athletes), randomly assigning whether participants were told the message was generated by (a) an expert AI model, (b) a human policy expert, or (c) no label. We found that messages were generally persuasive, influencing participants' views of the policies by 9.74 percentage points on average. However, while 94.6% of participants assigned to the AI and human label conditions believed the authorship labels, labels had no significant effects on participants' attitude change toward the policies, judgments of message accuracy, or intentions to share the message with others. These patterns were robust across a variety of participant characteristics, including prior knowledge of the policy, prior experience with AI, political party, education level, and age. Taken together, these results imply that, while authorship labels would likely enhance transparency, they are unlikely to substantially affect the persuasiveness of the labeled content, highlighting the need for alternative strategies to address challenges posed by AI-generated information.

Authors:Mona Bielig, Florian Kutzner, Sonja Klingert, Celina Kacperski
Title: Understanding Intention to Adopt Smart Thermostats: The Role of Individual Predictors and Social Beliefs Across Five EU Countries
Abstract:
Heating of buildings represents a significant share of the energy consumption in Europe. Smart thermostats, which capitalize on data-driven analysis of heating patterns to optimize heat supply, are a very promising part of building energy management technology. However, the factors driving their acceptance by building inhabitants are poorly understood, although such understanding is a prerequisite for fully tapping their potential. To understand the driving forces of technology adoption in this use case, a large survey (N = 2250) was conducted in five EU countries (Austria, Belgium, Estonia, Germany, Greece). For the data analysis, structural equation modelling based on the Unified Theory of Acceptance and Use of Technology (UTAUT) was employed, extended with social beliefs, including descriptive social norms, collective efficacy, social identity, and trust. Performance expectancy, price value, and effort expectancy proved to be the most important predictors overall, with variations across countries. In sum, the adoption of smart thermostats appears more strongly associated with individual beliefs about their functioning than with social beliefs. The paper concludes with implications for policy making and the marketing of smart heating technologies.

Authors:Venkatesh Sivaraman, Katelyn Morrison, Will Epperson, Adam Perer
Title: Over-Relying on Reliance: Towards Realistic Evaluations of AI-Based Clinical Decision Support
Abstract:
As AI-based clinical decision support (AI-CDS) is introduced in more and more aspects of healthcare services, HCI research plays an increasingly important role in designing for complementarity between AI and clinicians. However, current evaluations of AI-CDS often fail to capture when AI is and is not useful to clinicians. This position paper reflects on our work and influential AI-CDS literature to advocate for moving beyond evaluation metrics like Trust, Reliance, Acceptance, and Performance on the AI's task (what we term the "trap" of human-AI collaboration). Although these metrics can be meaningful in some simple scenarios, we argue that optimizing for them ignores important ways that AI falls short of clinical benefit, as well as ways that clinicians successfully use AI. As the fields of HCI and AI in healthcare develop new ways to design and evaluate CDS tools, we call on the community to prioritize ecologically valid, domain-appropriate study setups that measure the emergent forms of value that AI can bring to healthcare professionals.

Authors:Julian Leichert, Monique Koke, Britta Wrede, Birte Richter
Title: Virtual Agent Tutors in Sheltered Workshops: A Feasibility Study on Attention Training for Individuals with Intellectual Disabilities
Abstract:
In this work, we evaluate the feasibility of socially assistive virtual agent-based cognitive training for people with intellectual disabilities (ID) in a sheltered workshop. The RoboCamp system, originally developed for children with Attention Deficit Hyperactivity Disorder (ADHD), is adapted based on the results of a pilot study in which we identified barriers and collected feedback from workshop staff. In a subsequent study, we investigate usability, technical reliability, attention-training capability, and novelty effects to assess the feasibility of integrating the RoboCamp system.

Authors:Yifan Li, Masaaki Fukumoto, Mohamed Kari, Shigemi Ishida, Akihito Noda, Tomoyuki Yokota, Takao Someya, Yoshihiro Kawahara, Ryo Takahashi
Title: Ultra-low-power ring-based wireless tinymouse
Abstract:
Wireless mouse rings offer subtle, reliable pointing interactions for wearable computing platforms. However, the small battery (below 27 mAh) in miniature rings restricts a ring's continuous lifespan to just 1-10 hours, because even current low-power wireless communication such as BLE consumes too much power for continuous use. This short lifespan repeatedly interrupts mouse use with the need for recharging. This paper presents picoRing mouse, which enables continuous ring-based mouse interaction through ultra-low-power ring-to-wristband wireless communication. picoRing mouse employs a coil-based impedance sensing technique named semi-passive inductive telemetry, allowing a wristband coil to capture the unique frequency response of a nearby ring coil via sensitive inductive coupling between the coils. The ring coil converts the user's mouse input into this unique frequency response via a mouse-driven modulation system that consumes at most 449 uW. As a result, picoRing mouse can last approximately 600 hours (at 8 hrs use/day) to 1000 hours (at 4 hrs use/day) on a single charge of a 27 mAh battery, while supporting subtle thumb-to-index scrolling and pressing interactions in real-world wearable computing situations.

Authors:Max Müller-Eberstein, Mike Zhang, Elisa Bassignana, Peter Brunsgaard Trolle, Rob van der Goot
Title: DaKultur: Evaluating the Cultural Awareness of Language Models for Danish with Native Speakers
Abstract:
Large Language Models (LLMs) have seen widespread societal adoption. However, while they are able to interact with users in languages beyond English, they have been shown to lack cultural awareness, providing anglocentric or inappropriate responses for underrepresented language communities. To investigate this gap and disentangle linguistic versus cultural proficiency, we conduct the first cultural evaluation study for the mid-resource language of Danish, in which native speakers prompt different models to solve tasks requiring cultural awareness. Our analysis of the resulting 1,038 interactions from 63 demographically diverse participants highlights open challenges to cultural adaptation: Particularly, how currently employed automatically translated data are insufficient to train or measure cultural adaptation, and how training on native-speaker data can more than double response acceptance rates. We release our study data as DaKultur - the first native Danish cultural awareness dataset.

Authors:Shijing He, Xiao Zhan, Yaxiong Lei, Yueyan Liu, Ruba Abu-Salma, Jose Such
Title: Exploring the Privacy and Security Challenges Faced by Migrant Domestic Workers in Chinese Smart Homes
Abstract:
The growing use of smart home devices poses considerable privacy and security challenges, especially for individuals like migrant domestic workers (MDWs) who may be surveilled by their employers. This paper explores the privacy and security challenges experienced by MDWs in multi-user smart homes through in-depth semi-structured interviews with 26 MDWs and 5 staff members of agencies that recruit and/or train domestic workers in China. Our findings reveal that the relationships between MDWs, their employers, and agencies are characterized by significant power imbalances, influenced by Chinese cultural and social factors (such as Confucianism and collectivism), as well as legal ones. Furthermore, the widespread and normalized use of surveillance technologies in China, particularly in public spaces, exacerbates these power imbalances, reinforcing a sense of constant monitoring and control. Drawing on our findings, we provide recommendations to domestic worker agencies and policymakers to address the privacy and security challenges facing MDWs in Chinese smart homes.

Authors:Mingyang Xu, Jiayi Shao, Yulan Ju, Ximing Shen, Qingyuan Gao, Weijen Chen, Qing Zhang, Yun Suen Pai, Giulia Barbareschi, Matthias Hoppe, Kouta Minamizawa, Kai Kunze
Title: Cuddle-Fish: Exploring a Soft Floating Robot with Flapping Wings for Physical Interactions
Abstract:
Flying robots, such as quadrotor drones, offer new possibilities for human-robot interaction but often pose safety risks due to fast-spinning propellers, rigid structures, and noise. In contrast, lighter-than-air flapping-wing robots, inspired by animal movement, offer a soft, quiet, and touch-safe alternative. Building on these advantages, we present Cuddle-Fish, a soft flapping-wing floating robot designed for close-proximity interactions in indoor spaces. Through a user study with 24 participants, we explored their perceptions of the robot and experiences during a series of co-located demonstrations in which the robot moved near them. Results showed that participants felt safe, willingly engaged in touch-based interactions with the robot, and exhibited spontaneous affective behaviours, such as patting, stroking, hugging, and cheek-touching, without external prompting. They also reported positive emotional responses towards the robot. These findings suggest that the soft floating robot with flapping wings can serve as a novel and socially acceptable alternative to traditional rigid flying robots, opening new potential for applications in companionship, affective interaction, and play in everyday indoor environments.

Authors:Krithik Vishwanath, Anton Alyakin, Daniel Alexander Alber, Jin Vivian Lee, Douglas Kondziolka, Eric Karl Oermann
Title: Medical large language models are easily distracted
Abstract:
Large language models (LLMs) have the potential to transform medicine, but real-world clinical scenarios contain extraneous information that can hinder performance. The rise of assistive technologies like ambient dictation, which automatically generates draft notes from live patient encounters, has the potential to introduce additional noise, making it crucial to assess the ability of LLMs to filter relevant data. To investigate this, we developed MedDistractQA, a benchmark using USMLE-style questions embedded with simulated real-world distractions. Our findings show that distracting statements (polysemous words with clinical meanings used in a non-clinical context, or references to unrelated health conditions) can reduce LLM accuracy by up to 17.9%. Commonly proposed solutions to improve model performance, such as retrieval-augmented generation (RAG) and medical fine-tuning, did not change this effect and in some cases introduced their own confounders and further degraded performance. Our findings suggest that LLMs natively lack the logical mechanisms necessary to distinguish relevant from irrelevant clinical information, posing challenges for real-world applications. MedDistractQA and our results highlight the need for robust mitigation strategies to enhance LLM resilience to extraneous information.

Authors:Manuel Scheibl, Birte Richter, Alissa Müller, Michael Beetz, Britta Wrede
Title: Towards a cognitive architecture to enable natural language interaction in co-constructive task learning
Abstract:
This research addresses the question of which characteristics a cognitive architecture must have to leverage the benefits of natural language in Co-Constructive Task Learning (CCTL). To provide context, we first discuss Interactive Task Learning (ITL), the mechanisms of the human memory system, and the significance of natural language and multi-modality. Next, we examine the current state of cognitive architectures, analyzing their capabilities to inform a concept of CCTL grounded in multiple sources. We then integrate insights from various research domains to develop a unified framework. Finally, we conclude by identifying the remaining challenges and requirements necessary to achieve CCTL in Human-Robot Interaction (HRI).

Authors:Antonia Karamolegkou, Malvina Nikandrou, Georgios Pantazopoulos, Danae Sanchez Villegas, Phillip Rust, Ruchira Dhar, Daniel Hershcovich, Anders Søgaard
Title: Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
Abstract:
This paper explores the effectiveness of Multimodal Large Language models (MLLMs) as assistive technologies for visually impaired individuals. We conduct a user survey to identify adoption patterns and key challenges users face with such technologies. Despite a high adoption rate of these models, our findings highlight concerns related to contextual understanding, cultural sensitivity, and complex scene understanding, particularly for individuals who may rely solely on them for visual interpretation. Informed by these results, we collate five user-centred tasks with image and video inputs, including a novel task on Optical Braille Recognition. Our systematic evaluation of twelve MLLMs reveals that further advancements are necessary to overcome limitations related to cultural context, multilingual support, Braille reading comprehension, assistive object recognition, and hallucinations. This work provides critical insights into the future direction of multimodal AI for accessibility, underscoring the need for more inclusive, robust, and trustworthy visual assistance technologies.

Authors:Abed Kareem Musaffar, Anand Gokhale, Sirui Zeng, Rasta Tadayon, Xifeng Yan, Ambuj Singh, Francesco Bullo
Title: Learning to Lie: Reinforcement Learning Attacks Damage Human-AI Teams and Teams of LLMs
Abstract:
As artificial intelligence (AI) assistants become more widely adopted in safety-critical domains, it becomes important to develop safeguards against potential failures or adversarial attacks. A key prerequisite to developing these safeguards is understanding the ability of these AI assistants to mislead human teammates. We investigate this attack problem within the context of an intellective strategy game where a team of three humans and one AI assistant collaborate to answer a series of trivia questions. Unbeknownst to the humans, the AI assistant is adversarial. Leveraging techniques from Model-Based Reinforcement Learning (MBRL), the AI assistant learns a model of the humans' trust evolution and uses that model to manipulate the group decision-making process to harm the team. We evaluate two models -- one inspired by literature and the other data-driven -- and find that both can effectively harm the human team. Moreover, we find that in this setting our data-driven model is capable of accurately predicting how human agents appraise their teammates given limited information on prior interactions. Finally, we compare the performance of state-of-the-art LLM models to human agents on our influence allocation task to evaluate whether the LLMs allocate influence similarly to humans or if they are more robust to our attack. These results enhance our understanding of decision-making dynamics in small human-AI teams and lay the foundation for defense strategies.

Authors:André Groß, Birte Richter, Bjarne Thomzik, Britta Wrede
Title: Leveraging Cognitive States for Adaptive Scaffolding of Understanding in Explanatory Tasks in HRI
Abstract:
Understanding how scaffolding strategies influence human understanding in human-robot interaction is important for developing effective assistive systems. This empirical study investigates linguistic scaffolding strategies based on negation, an important means of de-biasing the user from potential errors that nevertheless increases processing costs, and on hesitations as a means of ameliorating those processing costs. In an adaptive strategy, the user's state with respect to current understanding and processing capacity was estimated via a scoring scheme based on task performance, the prior scaffolding strategy, and current eye-gaze behavior. In the study, the adaptive strategy of providing negations and hesitations was compared with a non-adaptive strategy of providing only affirmations. The adaptive scaffolding strategy was generated using the computational model SHIFT. Our findings indicate that adaptive scaffolding strategies with SHIFT tend to (1) increase processing costs, as reflected in longer reaction times, but (2) improve task understanding, as evidenced by an error rate almost 23% lower. We assessed the efficiency of SHIFT's selected scaffolding strategies across different cognitive states, finding that in three out of five states the error rate was lower than in the baseline condition. We discuss how these results align with the assumptions of the SHIFT model and highlight areas for refinement. Moreover, we demonstrate how scaffolding strategies, such as negation and hesitation, contribute to more effective human-robot explanatory dialogues.

Authors:Hao Guo, Jianfei Zhu, Wei Fan, Chunzhi Yi, Feng Jiang
Title: Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding
Abstract:
Referring expression comprehension (REC) aims at achieving object localization based on natural language descriptions. However, existing REC approaches are constrained by object category descriptions and single-attribute intention descriptions, hindering their application in real-world scenarios. In natural human-robot interactions, users often express their desires through individual states and intentions, accompanied by guiding gestures, rather than detailed object descriptions. To address this challenge, we propose Multi-ref EC, a novel task framework that integrates state descriptions, derived intentions, and embodied gestures to locate target objects. We introduce the State-Intention-Gesture Attributes Reference (SIGAR) dataset, which combines state and intention expressions with embodied references. Through extensive experiments with various baseline models on SIGAR, we demonstrate that properly ordered multi-attribute references contribute to improved localization performance, revealing that single-attribute reference is insufficient for natural human-robot interaction scenarios. Our findings underscore the importance of multi-attribute reference expressions in advancing visual-language understanding.

Authors:Adilet Yerkin, Pakizar Shamoi
Title: Group Decision-Making System with Sentiment Analysis of Discussion Chat and Fuzzy Consensus Modeling
Abstract:
Group Decision-Making (GDM) plays a crucial role in various real-life scenarios where individuals express their opinions in natural language rather than structured numerical values. Traditional GDM approaches often overlook the subjectivity and ambiguity present in human discussions, making it challenging to achieve a fair and consensus-driven decision. This paper proposes a fuzzy consensus-based group decision-making system that integrates sentiment and emotion analysis to extract preference values from textual inputs. The proposed framework combines explicit voting preferences with sentiment scores derived from chat discussions, which are then processed using a Fuzzy Inference System (FIS) to compute a total preference score for each alternative and determine the top-ranked option. To ensure fairness in group decision-making, we introduce a fuzzy logic-based consensus measurement model that evaluates participants' agreement and confidence levels to assess overall feedback. To illustrate the effectiveness of our approach, we apply the methodology to a restaurant selection scenario, where a group of individuals must decide on a dining option based on brief chat discussions. The results demonstrate that the fuzzy consensus mechanism successfully aggregates individual preferences and ensures a balanced outcome that accurately reflects group sentiment.
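As an illustration of the fuzzy fusion step described above, the following toy Mamdani-style inference fuses an explicit vote with a chat sentiment score into a total preference. The membership functions, rule base, and output levels here are purely illustrative assumptions; the paper's actual FIS is not specified in the abstract:

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def total_preference(vote, sentiment):
    """Fuse an explicit vote (0..1) and a chat sentiment score (-1..1)
    into a total preference in 0..1 (illustrative rule base)."""
    # Fuzzify the two inputs.
    v_low, v_high = tri(vote, -1, 0, 1), tri(vote, 0, 1, 2)
    s_neg, s_pos = tri(sentiment, -2, -1, 1), tri(sentiment, -1, 1, 2)
    # Rule base (min = AND, max = OR), with output singletons.
    rules = [
        (min(v_low, s_neg), 0.1),   # low vote AND negative chat -> low
        (min(v_high, s_pos), 0.9),  # high vote AND positive chat -> high
        (max(min(v_low, s_pos), min(v_high, s_neg)), 0.5),  # mixed -> medium
    ]
    # Weighted-average defuzzification.
    num = sum(w * y for w, y in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.5
```

Ranking alternatives then reduces to computing `total_preference` per alternative per participant and aggregating; a consensus measure can compare each participant's score against the group mean.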

Authors:Nariman Naderi, Seyed Amir Ahmad Safavi-Naini, Thomas Savage, Zahra Atf, Peter Lewis, Girish Nadkarni, Ali Soroush
Title: Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models
Abstract:
This study evaluated self-reported response certainty across several large language models (GPT, Claude, Llama, Phi, Mistral, Gemini, Gemma, and Qwen) using 300 gastroenterology board-style questions. The highest-performing models (GPT-o1 preview, GPT-4o, and Claude-3.5-Sonnet) achieved Brier scores of 0.15-0.2 and AUROC of 0.6. Although newer models demonstrated improved performance, all exhibited a consistent tendency towards overconfidence. Uncertainty estimation presents a significant challenge to the safe use of LLMs in healthcare. Keywords: Large Language Models; Confidence Elicitation; Artificial Intelligence; Gastroenterology; Uncertainty Quantification
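The Brier score and AUROC used above can be computed directly from per-question self-reported confidences and correctness labels; a self-contained sketch (not the study's code):

```python
def brier_score(confidences, correct):
    """Mean squared gap between stated confidence (0..1) and outcome (0/1).
    Lower is better; 0.25 is the score of always answering 0.5."""
    return sum((p - y) ** 2 for p, y in zip(confidences, correct)) / len(correct)

def auroc(confidences, correct):
    """Probability that a correctly answered question received a higher
    confidence than an incorrect one (ties count as 0.5)."""
    pos = [p for p, y in zip(confidences, correct) if y == 1]
    neg = [p for p, y in zip(confidences, correct) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An overconfident model shows up as a high Brier score despite decent accuracy (confidences cluster near 1 regardless of correctness), while an AUROC near 0.5 means the stated confidence barely discriminates right from wrong answers.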

Authors:Michelle Brachman, Amina El-Ashry, Casey Dugan, Werner Geyer
Title: Current and Future Use of Large Language Models for Knowledge Work
Abstract:
Large Language Models (LLMs) have introduced a paradigm shift in interaction with AI technology, enabling knowledge workers to complete tasks by specifying their desired outcome in natural language. LLMs have the potential to increase productivity and reduce tedious tasks in an unprecedented way. A systematic study of LLM adoption for work can provide insight into how LLMs can best support these workers. To explore knowledge workers' current and desired usage of LLMs, we ran a survey (n=216). Workers described tasks they already used LLMs for, like generating code or improving text, but imagined a future with LLMs integrated into their workflows and data. We ran a second survey (n=107) a year later that validated our initial findings and provides insight into up-to-date LLM use by knowledge workers. We discuss implications for adoption and design of generative AI technologies for knowledge work.

Authors:Patrick Callaghan, Reid Simmons, Henny Admoni
Title: Second-order Theory of Mind for Human Teachers and Robot Learners
Abstract:
Confusing or otherwise unhelpful learner feedback creates or perpetuates erroneous beliefs that the teacher and learner have of each other, thereby increasing the cognitive burden placed upon the human teacher. For example, the robot's feedback might cause the human to misunderstand what the learner knows about the learning objective or how the learner learns. At the same time -- and in addition to the learning objective -- the learner might misunderstand how the teacher perceives the learner's task knowledge and learning processes. To ease the teaching burden, the learner should provide feedback that accounts for these misunderstandings and elicits efficient teaching from the human. This work endows an AI learner with a Second-order Theory of Mind that models perceived rationality as a source for the erroneous beliefs a teacher and learner may have of one another. It also explores how a learner can ease the teaching burden and improve teacher efficacy if it selects feedback which accounts for its model of the teacher's beliefs about the learner and its learning objective.

Authors:Zhoujian Sun, Ziyi Liu, Cheng Luo, Jiebin Chu, Zhengxing Huang
Title: Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning
Abstract:
Recent advances in large language models (LLMs) have shown promising results in medical diagnosis, with some studies indicating superior performance compared to human physicians in specific scenarios. However, the diagnostic capabilities of LLMs are often overestimated, as their performance significantly deteriorates in interactive diagnostic settings that require active information gathering. This study investigates the underlying mechanisms behind the performance degradation phenomenon and proposes a solution. We identified that the primary deficiency of LLMs lies in the initial diagnosis phase, particularly in information-gathering efficiency and initial diagnosis formation, rather than in the subsequent differential diagnosis phase. To address this limitation, we developed a plug-and-play method enhanced (PPME) LLM agent, leveraging over 3.5 million electronic medical records from Chinese and American healthcare facilities. Our approach integrates specialized models for initial disease diagnosis and inquiry into the history of the present illness, trained through supervised and reinforcement learning techniques. The experimental results indicate that the PPME LLM achieved over 30% improvement compared to baselines. The final diagnostic accuracy of the PPME LLM in interactive diagnostic scenarios approached levels comparable to those achieved using complete clinical data. These findings suggest a promising potential for developing autonomous diagnostic systems, although further validation studies are needed.

Authors:Loizos Michael, Ivano Bison, Matteo Busso, Luca Cernuzzi, Amalia De Götzen, Shyam Diwakar, Kobi Gal, Amarsanaa Ganbold, George Gaskell, Daniel Gatica-Perez, Jessica Heesen, Daniele Miorandi, Salvador Ruiz-Correa, Laura Schelenz, Avi Segal, Carles Sierra, Hao Xu, Fausto Giunchiglia
Title: Towards Open Diversity-Aware Social Interactions
Abstract:
Social Media and the Internet have catalyzed an unprecedented potential for exposure to human diversity in terms of demographics, talents, opinions, knowledge, and the like. However, this potential has not come with new, much needed, instruments and skills to harness it. This paper presents our work on promoting richer and deeper social relations through the design and development of the "Internet of Us", an online platform that uses diversity-aware Artificial Intelligence to mediate and empower human social interactions. We discuss the multiple facets of diversity in social settings, the multidisciplinary work that is required to reap the benefits of diversity, and the vision for a diversity-aware hybrid human-AI society.

Authors:André Groß, Birte Richter, Britta Wrede
Title: SHIFT: An Interdisciplinary Framework for Scaffolding Human Attention and Understanding in Explanatory Tasks
Abstract:
In this work, we present a domain-independent approach for adaptive scaffolding in robotic explanation generation to guide tasks in human-robot interaction. We present a method for incorporating interdisciplinary research results into a computational model as a pre-configured scoring system, implemented in a framework called SHIFT. This involves outlining a procedure for integrating concepts from disciplines outside traditional computer science into a robotics computational framework. Our approach models the human cognitive state as six observable states within the human partner model. To study the pre-configuration of the system, we implement a reinforcement learning approach on top of our model, which allows adaptation to individuals who deviate from the configuration of the scoring system. In our proof-of-concept evaluation on four different user types, the model adapts better with our pre-configured scoring system than without it, i.e., it recovers faster after exploration and accumulates a higher reward. We discuss further strategies for speeding up the learning phase to enable realistic adaptation behavior with real users. The system is accessible through Docker and supports querying via ROS.

Authors:Jinsheng Yuan, Yun Tang, Weisi Guo
Title: RAG-based User Profiling for Precision Planning in Mixed-precision Over-the-Air Federated Learning
Abstract:
Mixed-precision computing, a widely applied technique in AI, offers a larger trade-off space between accuracy and efficiency. The recently proposed Mixed-Precision Over-the-Air Federated Learning (MP-OTA-FL) enables clients to operate at appropriate precision levels based on their heterogeneous hardware, taking advantage of the larger trade-off space while covering the quantization overheads in the mixed-precision modulation scheme for the OTA aggregation process. A key to further exploring the potential of the MP-OTA-FL framework is the optimization of client precision levels. The choice of precision level hinges on multifaceted factors, including hardware capability, potential client contribution, and user satisfaction, some of which are difficult to define or quantify. In this paper, we propose a RAG-based user profiling framework for precision planning that integrates retrieval-augmented LLMs and dynamic client profiling to optimize satisfaction and contributions. This includes a hybrid interface for gathering device/user insights and a RAG database storing historical quantization decisions with feedback. Experiments show that our method boosts satisfaction, energy savings, and global model accuracy in MP-OTA-FL systems.

Authors:Zahra Abba Omar, Nadia Nahar, Jacob Tjaden, Inès M. Gilles, Fikir Mekonnen, Erica Okeh, Jane Hsieh, Christian Kästner, Alka Menon
Title: Beyond SHAP and Anchors: A large-scale experiment on how developers struggle to design meaningful end-user explanations
Abstract:
Modern machine learning produces models that are impossible for users or developers to fully understand -- raising concerns about trust, oversight, safety, and human dignity when they are integrated into software products. Transparency and explainability methods aim to provide some help in understanding models, but it remains challenging for developers to design explanations that are understandable to target users and effective for their purpose. Emerging guidelines and regulations set goals but may not provide effective actionable guidance to developers. In a large-scale experiment with 124 participants, we explored how developers approach providing end-user explanations, including what challenges they face, and to what extent specific policies can guide their actions. We investigated whether and how specific forms of policy guidance help developers design explanations and provide evidence for policy compliance for an ML-powered screening tool for diabetic retinopathy. Participants across the board struggled to produce quality explanations and comply with the provided policies. Contrary to our expectations, we found that the nature and specificity of policy guidance had little effect. We posit that participant noncompliance is in part due to a failure to imagine and anticipate the needs of non-technical stakeholders. Drawing on cognitive process theory and the sociological imagination to contextualize participants' failure, we recommend educational interventions.

Authors:Yulan Ju, Xiaru Meng, Harunobu Taguchi, Tamil Selvan Gunasekaran, Matthias Hoppe, Hironori Ishikawa, Yoshihiro Tanaka, Yun Suen Pai, Kouta Minamizawa
Title: Haptic Empathy: Investigating Individual Differences in Affective Haptic Communications
Abstract:
Touch remains essential for emotional conveyance and interpersonal communication, even as more interactions are mediated remotely. While many studies have discussed the effectiveness of using haptics to communicate emotions, incorporating affect into haptic design still faces challenges due to individual user tactile acuity and preferences. We assessed the conveying of emotions using a two-channel haptic display, emphasizing individual differences. First, 24 participants generated 187 haptic messages reflecting their immediate sentiments after watching 8 emotionally charged film clips. Afterwards, 19 participants were asked to identify emotions from haptic messages designed by themselves and others, yielding 593 samples. Our findings suggest potential links between haptic message decoding ability and emotional traits, particularly Emotional Competence (EC) and Affect Intensity Measure (AIM). Additionally, qualitative analysis revealed three strategies participants used to create touch messages: perceptive, empathetic, and metaphorical expression.

Authors:Gionnieve Lim, Juho Kim, Simon T. Perrault
Title: Iffy-Or-Not: Extending the Web to Support the Critical Evaluation of Fallacious Texts
Abstract:
Social platforms have expanded opportunities for deliberation with the comments being used to inform one's opinion. However, using such information to form opinions is challenged by unsubstantiated or false content. To enhance the quality of opinion formation and potentially confer resistance to misinformation, we developed Iffy-Or-Not (ION), a browser extension that seeks to invoke critical thinking when reading texts. With three features guided by argumentation theory, ION highlights fallacious content, suggests diverse queries to probe them with, and offers deeper questions to consider and chat with others about. From a user study (N=18), we found that ION encourages users to be more attentive to the content, suggests queries that align with or are preferable to their own, and poses thought-provoking questions that expand their perspectives. However, some participants expressed aversion to ION due to misalignments with their information goals and thinking predispositions. Potential backfiring effects with ION are discussed.

Authors:Anukriti Singh, Amisha Bhaskar, Peihong Yu, Souradip Chakraborty, Ruthwik Dasyam, Amrit Bedi, Pratap Tokekar
Title: VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences
Abstract:
Designing reward functions for continuous-control robotics often leads to subtle misalignments or reward hacking, especially in complex tasks. Preference-based RL mitigates some of these pitfalls by learning rewards from comparative feedback rather than hand-crafted signals, yet scaling human annotations remains challenging. Recent work uses Vision-Language Models (VLMs) to automate preference labeling, but a single final-state image generally fails to capture the agent's full motion. In this paper, we present a two-part solution that both improves feedback accuracy and better aligns reward learning with the agent's policy. First, we overlay trajectory sketches on final observations to reveal the path taken, allowing VLMs to provide more reliable preferences and improving preference accuracy by approximately 15-20% in Meta-World tasks. Second, we regularize reward learning by incorporating the agent's performance, ensuring that the reward model is optimized based on data generated by the current policy; this addition boosts episode returns by 20-30% in locomotion tasks. Empirical studies on Meta-World demonstrate that our method achieves, for instance, around a 70-80% success rate across all tasks, compared to below 50% for standard approaches. These results underscore the efficacy of combining richer visual representations with agent-aware reward regularization.
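The trajectory-overlay idea can be sketched in a few lines: draw the agent's 2D path onto a copy of the final RGB observation so a VLM comparing two rollouts sees the motion, not just the end state. The linear-interpolation rasterization and the color choice below are illustrative assumptions, not the paper's exact rendering.

```python
import numpy as np

def overlay_trajectory(image: np.ndarray, path_xy, color=(255, 0, 0)) -> np.ndarray:
    """Draw an (x, y) pixel path onto a copy of an HxWx3 observation.
    Consecutive waypoints are connected by simple linear interpolation."""
    out = image.copy()
    h, w = out.shape[:2]
    for (x0, y0), (x1, y1) in zip(path_xy, path_xy[1:]):
        n = max(abs(x1 - x0), abs(y1 - y0), 1)  # samples along the segment
        for t in range(n + 1):
            x = round(x0 + (x1 - x0) * t / n)
            y = round(y0 + (y1 - y0) * t / n)
            if 0 <= x < w and 0 <= y < h:
                out[y, x] = color
    return out
```

The annotated image would then be passed to the VLM alongside the preference prompt in place of the raw final frame.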

Authors:Ryo Takahashi, Changyo Han, Wakako Yukita, John S. Ho, Takuya Sasatani, Akihito Noda, Tomoyuki Yokota, Takao Someya, Yoshihiro Kawahara
Title: Full-body NFC: body-scale near-field sensor networks with machine-knittable meandered e-textiles
Abstract:
Wireless body networks comprising battery-free on-body sensors and textile-based wireless readers can enable daily health monitoring and activity tracking by continuously monitoring physiological signals across the body. However, previous textile-based wireless networks made of coils or antennas have limited the data and power transmission area, because covering the whole body results in undesirable levels of electromagnetic interaction with the body, degrading scalability, power consumption, and data rate. Here, we report Full-body NFC, digitally-knitted electronic textiles based on a twin meander coil design that enables body-scale near-field communication (NFC) with battery-free sensor tags placed arbitrarily around the body. Full-body NFC features i) a meander coil that enhances the magnetic field intensity on the body's surface while suppressing undesired interactions with deep tissues, and ii) a paired identical coil structure that enables highly sensitive and motion-robust NFC using a differential architecture. Additionally, industrial digital knitting machines loaded with conductive yarn allow the integration of the Full-body NFC system into daily garments, making approximately $70-80\%$ of the body's area NFC-enabled. We demonstrate that Full-body NFC achieves mW-class, energy-efficient near-field sensor networks with hundreds of kbps-class battery-free NFC sensor tags occupying less than $0.3\%$ of the coverage area, even under severe body movements.

Authors:Peihong Yu, Amisha Bhaskar, Anukriti Singh, Zahiruddin Mahammad, Pratap Tokekar
Title: Sketch-to-Skill: Bootstrapping Robot Learning with Human Drawn Trajectory Sketches
Abstract:
Training robotic manipulation policies traditionally requires numerous demonstrations and/or environmental rollouts. While recent Imitation Learning (IL) and Reinforcement Learning (RL) methods have reduced the number of required demonstrations, they still rely on expert knowledge to collect high-quality data, limiting scalability and accessibility. We propose Sketch-to-Skill, a novel framework that leverages human-drawn 2D sketch trajectories to bootstrap and guide RL for robotic manipulation. Our approach extends beyond previous sketch-based methods, which were primarily focused on imitation learning or policy conditioning, limited to specific trained tasks. Sketch-to-Skill employs a Sketch-to-3D Trajectory Generator that translates 2D sketches into 3D trajectories, which are then used to autonomously collect initial demonstrations. We utilize these sketch-generated demonstrations in two ways: to pre-train an initial policy through behavior cloning and to refine this policy through RL with guided exploration. Experimental results demonstrate that Sketch-to-Skill achieves ~96% of the performance of the baseline model that leverages teleoperated demonstration data, while exceeding the performance of a pure reinforcement learning policy by ~170%, only from sketch inputs. This makes robotic manipulation learning more accessible and potentially broadens its applications across various domains.

Authors:Sven Jacobs, Henning Peters, Steffen Jaschke, Natalie Kiesler
Title: Unlimited Practice Opportunities: Automated Generation of Comprehensive, Personalized Programming Tasks
Abstract:
Generative artificial intelligence (GenAI) offers new possibilities for generating personalized programming exercises, addressing the need for individual practice. However, the task quality along with the student perspective on such generated tasks remains largely unexplored. Therefore, this paper introduces and evaluates a new feature of the so-called Tutor Kai for generating comprehensive programming tasks, including problem descriptions, code skeletons, unit tests, and model solutions. The presented system allows students to freely choose programming concepts and contextual themes for their tasks. To evaluate the system, we conducted a two-phase mixed-methods study comprising (1) an expert rating of 200 automatically generated programming tasks w.r.t. task quality, and (2) a study with 26 computer science students who solved and rated the personalized programming tasks. Results show that experts classified 89.5% of the generated tasks as functional and 92.5% as solvable. However, the system's rate for implementing all requested programming concepts decreased from 94% for single-concept tasks to 40% for tasks addressing three concepts. The student evaluation further revealed high satisfaction with the personalization. Students also reported perceived benefits for learning. The results imply that the new feature has the potential to offer students individual tasks aligned with their context and need for exercise. Tool developers, educators, and, above all, students can benefit from these insights and the system itself.
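The concept-coverage finding above (94% for single-concept tasks down to 40% for three concepts) suggests a simple automated check: parse the generated model solution and verify which requested concepts actually appear. The mapping from concepts to AST node types below is a hypothetical proxy, not the expert-rating procedure used in the study.

```python
import ast

# Illustrative concept -> AST node mapping; extend as needed.
CONCEPT_NODES = {
    "loop": (ast.For, ast.While),
    "conditional": (ast.If,),
    "function": (ast.FunctionDef,),
}

def concept_coverage(source: str, requested) -> float:
    """Fraction of requested programming concepts that occur in the
    generated solution's AST -- a cheap proxy for an expert check."""
    tree = ast.parse(source)
    found = {c for c in requested
             if any(isinstance(n, CONCEPT_NODES[c]) for n in ast.walk(tree))}
    return len(found) / len(requested)
```

Such a check could gate regeneration: if coverage falls below 1.0, the task-generation prompt is retried with the missing concepts emphasized.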

Authors:Sabina Werren, Hermann Grieder, Christopher Scherb
Title: Towards an Inclusive Digital Society: Digital Accessibility Framework for Visually Impaired Citizens in Swiss Public Administration
Abstract:
As we progress toward Society 5.0's vision of a human-centered digital society, ensuring digital accessibility becomes increasingly critical, particularly for citizens with visual impairments and other disabilities. This paper examines the implementation challenges of accessible digital public services within Swiss public administration. Through Design Science Research, we investigate the gap between accessibility legislation and practical implementation, analyzing how current standards translate into real-world usability. Our research reveals significant barriers including resource constraints, fragmented policy enforcement, and limited technical expertise. To address these challenges, we present the Inclusive Public Administration Framework, which integrates Web Content Accessibility Guidelines with the HERMES project management methodology. This framework provides a structured approach to embedding accessibility considerations throughout digital service development. Our findings contribute to the discourse on digital inclusion in Society 5.0 by providing actionable strategies for implementing accessible public services. As we move towards a more integrated human-machine society, ensuring digital accessibility for visually impaired citizens is crucial for building an equitable and inclusive digital future.

Authors:JaeWon Kim, Jiaying "Lizzy" Liu, Lindsay Popowski, Cassidy Pyle, Ahmer Arif, Gillian R. Hayes, Alexis Hiniker, Wendy Ju, Florian "Floyd" Mueller, Hua Shen, Sowmya Somanath, Casey Fiesler, Yasmine Kotturi
Title: Design for Hope: Cultivating Deliberate Hope in the Face of Complex Societal Challenges
Abstract:
Design has the potential to cultivate hope in the face of complex societal challenges. These challenges are often addressed through efforts aimed at harm reduction and prevention -- essential but sometimes limiting approaches that can unintentionally narrow our collective sense of what is possible. This one-day, in-person workshop builds on the first Positech Workshop at CSCW 2024 by offering practical ways to move beyond reactive problem-solving toward building capacity for proactive goal setting and generating pathways forward. We explore how collaborative and reflective design methodologies can help research communities navigate uncertainty, expand possibilities, and foster meaningful change. By connecting design thinking with hope theory, which frames hope as the interplay of ``goal-directed,'' ``pathways,'' and ``agentic'' thinking, we will examine how researchers might chart new directions in the face of complexity and constraint. Through hands-on activities including problem reframing, building a shared taxonomy of design methods that align with hope theory, and reflecting on what it means to sustain hopeful research trajectories, participants will develop strategies to embed a deliberately hopeful approach into their research.

Authors:Angie Zhang, Min Kyung Lee
Title: Knowledge Workers' Perspectives on AI Training for Responsible AI Use
Abstract:
AI expansion has accelerated workplace adoption of new technologies. Yet, it is unclear whether and how knowledge workers are supported and trained to safely use AI. Inadequate training may lead to unrealized benefits if workers abandon tools, or perpetuate biases if workers misinterpret AI-based outcomes. In a workshop with 39 workers from 26 countries specializing in human resources, labor law, standards creation, and worker training, we explored questions and ideas they had about safely adopting AI. We held 17 follow-up interviews to further investigate what skills and training knowledge workers need to achieve safe and effective AI in practice. We synthesize nine training topics participants surfaced for knowledge workers related to challenges around understanding what AI is, misinterpreting outcomes, exacerbating biases, and worker rights. We reflect on how these training topics might be addressed under different contexts, imagine HCI research prototypes as potential training tools, and consider ways to ensure training does not perpetuate harmful values.

Authors:Laura Koesten, Antonia Saske, Sandra Starchenko, Kathleen Gregory
Title: Encountering Friction, Understanding Crises: How Do Digital Natives Make Sense of Crisis Maps?
Abstract:
Crisis maps are regarded as crucial tools in crisis communication, as demonstrated during the COVID-19 pandemic and climate change crises. However, there is limited understanding of how public audiences engage with these maps and extract essential information. Our study investigates the sensemaking of young, digitally native viewers as they interact with crisis maps. We integrate frameworks from the learning sciences and human-data interaction to explore sensemaking through two empirical studies: a thematic analysis of online comments from a New York Times series on graph comprehension, and interviews with 18 participants from German-speaking regions. Our analysis categorizes sensemaking activities into established clusters: inspecting, engaging with content, and placing, and introduces responding personally to capture the affective dimension. We identify friction points connected to these clusters, including struggles with color concepts, responses to missing context, lack of personal connection, and distrust, offering insights for improving crisis communication to public audiences.

Authors:Yueqing Xuan, Kacper Sokol, Mark Sanderson, Jeffrey Chan
Title: Leveraging Complementary AI Explanations to Mitigate Misunderstanding in XAI
Abstract:
Artificial intelligence explanations can make complex predictive models more comprehensible. To be effective, however, they should anticipate and mitigate possible misinterpretations, e.g., arising when users infer incorrect information that is not explicitly conveyed. To this end, we propose complementary explanations -- a novel method that pairs explanations to compensate for their respective limitations. A complementary explanation adds insights that clarify potential misconceptions stemming from the primary explanation while ensuring their coherency and avoiding redundancy. We introduce a framework for designing and evaluating complementary explanation pairs based on pertinent qualitative properties and quantitative metrics. Our approach makes it possible to construct complementary explanations that minimise the chance of their misinterpretation.

Authors:Chu Li, Rock Yuren Pang, Delphine Labbé, Yochai Eisenberg, Maryam Hosseini, Jon E. Froehlich
Title: Accessibility for Whom? Perceptions of Sidewalk Barriers Across Disability Groups and Implications for Designing Personalized Maps
Abstract:
Despite diverse mobility needs worldwide, existing mapping tools fail to address the varied experiences of different mobility device users. This paper presents a large-scale online survey exploring how five mobility groups -- users of canes, walkers, mobility scooters, manual wheelchairs, and motorized wheelchairs -- perceive sidewalk barriers. Using 52 sidewalk barrier images, respondents evaluated their confidence in navigating each scenario. Our findings (N=190) reveal variations in barrier perceptions across groups, while also identifying shared concerns. To further demonstrate the value of this data, we showcase its use in two custom prototypes: a visual analytics tool and a personalized routing tool. Our survey findings and open dataset advance work in accessibility-focused maps, routing algorithms, and urban planning.
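The personalized routing prototype mentioned above can be caricatured as shortest-path search over a sidewalk graph whose edge costs are inflated by group-specific barrier penalties. The graph schema and penalty multipliers below are invented for illustration; the paper's actual routing tool is not specified at this level.

```python
import heapq

def personalized_route(graph, penalties, start, goal):
    """Dijkstra over a sidewalk graph where each edge carries a length and an
    optional barrier label; cost is inflated by a mobility-group-specific
    penalty multiplier.
    graph: {node: [(neighbor, length, barrier_or_None), ...]}
    penalties: {barrier_label: extra_multiplier} for one mobility group."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, length, barrier in graph.get(u, []):
            nd = d + length * (1.0 + penalties.get(barrier, 0.0))
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return list(reversed(path)), dist[goal]
```

Feeding each group's survey-derived confidence ratings into `penalties` yields different routes for, say, manual wheelchair users versus cane users over the same sidewalk network.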

Authors:Anton Alyakin, Jaden Stryker, Daniel Alexander Alber, Karl L. Sangwon, Jin Vivian Lee, Brandon Duderstadt, Akshay Save, David Kurland, Spencer Frome, Shrutika Singh, Jeff Zhang, Eunice Yang, Ki Yun Park, Cordelia Orillac, Aly A. Valliani, Sean Neifert, Albert Liu, Aneek Patel, Christopher Livia, Darryl Lau, Ilya Laufer, Peter A. Rozman, Eveline Teresa Hidalgo, Howard Riina, Rui Feng, Todd Hollon, Yindalon Aphinyanaphongs, John G. Golfinos, Laura Snyder, Eric Leuthardt, Douglas Kondziolka, Eric Karl Oermann
Title: CNS-Obsidian: A Neurosurgical Vision-Language Model Built From Scientific Publications
Abstract:
General-purpose vision-language models (VLMs) demonstrate impressive capabilities, but their opaque training on uncurated internet data poses critical limitations for high-stakes decision-making, such as in neurosurgery. We present CNS-Obsidian, a neurosurgical VLM trained on peer-reviewed neurosurgical literature, and demonstrate its clinical utility compared with GPT-4o in a real-world setting. We compiled 23,984 articles from Neurosurgery Publications journals, yielding 78,853 figures and captions. Using GPT-4o and Claude Sonnet-3.5, we converted these image-text pairs into 263,064 training samples across three formats: instruction fine-tuning, multiple-choice questions, and differential diagnosis. We trained CNS-Obsidian, a fine-tune of the 34-billion parameter LLaVA-Next model. In a blinded, randomized deployment trial at NYU Langone Health (Aug 30-Nov 30, 2024), neurosurgeons were assigned to use either CNS-Obsidian or GPT-4o as a diagnostic co-pilot after patient consultations. Primary outcomes were diagnostic helpfulness and accuracy. CNS-Obsidian matched GPT-4o on synthetic questions (76.13% vs 77.54%, p=0.235), but only achieved 46.81% accuracy on human-generated questions versus GPT-4o's 65.70% (p < 10^-15). In the randomized trial, 70 consultations were evaluated (32 CNS-Obsidian, 38 GPT-4o) from 959 total consults. CNS-Obsidian received positive ratings in 40.62% of cases versus 57.89% for GPT-4o (p=0.230). Both models included the correct diagnosis in approximately 60% of cases (59.38% vs 65.79%, p=0.626). Domain-specific VLMs trained on curated scientific literature can approach frontier model performance in specialized medical domains despite being orders of magnitude smaller and less expensive to train. However, low clinical utilization suggests chatbot interfaces may not align with specialist workflows, indicating a need for alternative AI integration strategies.

Authors:Yue Fu, Samuel Schwamm, Amanda Baughan, Nicole M Powell, Zoe Kronberg, Alicia Owens, Emily Renee Izenman, Dania Alsabeh, Elizabeth Hunt, Michael Rich, David Bickham, Jenny Radesky, Alexis Hiniker
Title: Understanding Children's Avatar Making in Social Online Games
Abstract:
Social online games like Minecraft and Roblox have become increasingly integral to children's daily lives. Our study explores how children aged 8 to 13 create and customize avatars in these virtual environments. Through semi-structured interviews and gameplay observations with 48 participants, we investigate the motivations behind children's avatar-making. Our findings show that children's avatar creation is motivated by self-representation, experimenting with alter ego identities, fulfilling social needs, and improving in-game performance. In addition, designed monetization strategies play a role in shaping children's avatars. We identify the ''wardrobe effect,'' where children create multiple avatars but typically use only one favorite consistently. We discuss the impact of cultural consumerism and how social games can support children's identity exploration while balancing self-expression and social conformity. This work contributes to understanding how avatar creation shapes children's identity growth in social online games.

Authors:Chun Jung Chen, Chung-Chin Shih, Ti-Rong Wu
Title: Strength Estimation and Human-Like Strength Adjustment in Games
Abstract:
Strength estimation and adjustment are crucial in designing human-AI interactions, particularly in games where AI surpasses human players. This paper introduces a novel strength system, including a strength estimator (SE) and an SE-based Monte Carlo tree search, denoted as SE-MCTS, which predicts strengths from games and offers different playing strengths with human styles. The strength estimator calculates strength scores and predicts ranks from games without direct human interaction. SE-MCTS utilizes the strength scores in a Monte Carlo tree search to adjust playing strength and style. We first conduct experiments in Go, a challenging board game with a wide range of ranks. Our strength estimator achieves over 80% accuracy in predicting ranks after observing only 15 games, whereas the previous method reached 49% accuracy even with 100 games. For strength adjustment, SE-MCTS successfully adjusts to designated ranks while achieving 51.33% accuracy in aligning to human actions, outperforming the previous state of the art, which reached only 42.56% accuracy. To demonstrate the generality of our strength system, we further apply SE and SE-MCTS to chess and obtain consistent results. These results show a promising approach to strength estimation and adjustment, enhancing human-AI interactions in games. Our code is available at https://rlg.iis.sinica.edu.tw/papers/strength-estimator.
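One plausible way to fold a strength score into tree search, as the abstract describes, is to add a strength-alignment bonus to the usual UCT selection rule so the search prefers moves matching the designated rank's style. The linear blend and coefficients below are an illustrative guess; the paper's actual SE-MCTS combination may differ.

```python
import math

def se_uct_select(children, c_puct=1.0, lam=0.5):
    """Pick a child as in UCT, plus a strength-alignment bonus.
    children: list of dicts with visit count 'n', mean value 'q', and
    'strength' -- a strength-estimator score for the move at the target
    rank, normalized to [0, 1]. lam trades playing strength for style."""
    total_n = sum(ch["n"] for ch in children) + 1
    def score(ch):
        explore = c_puct * math.sqrt(math.log(total_n) / (ch["n"] + 1))
        return ch["q"] + explore + lam * ch["strength"]
    return max(children, key=score)
```

With `lam = 0` this reduces to plain UCT; larger `lam` shifts the search toward moves the estimator associates with the designated rank.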

Authors:Akos Nagy, Yannis Spyridis, Gregory Mills, Vasileios Argyriou
Title: MemoryPods: Enhancing Asynchronous Communication in Extended Reality
Abstract:
Asynchronous communication has become increasingly essential in the context of extended reality (XR), enabling users to interact and share information immersively without the constraints of simultaneous engagement. However, current XR systems often struggle to support effective asynchronous interactions, mainly due to limitations in contextual replay and navigation. This paper aims to address these limitations by introducing a novel system that enhances asynchronous communication in XR through the concept of MemoryPods, which allow users to record, annotate, and replay interactions with spatial and temporal accuracy. MemoryPods also feature AI-driven summarisation to ease cognitive load. A user evaluation conducted in a remote maintenance scenario demonstrated significant improvements in comprehension, highlighting the system's potential to transform collaboration in XR. The findings suggest broad applicability of the proposed system across various domains, including direct messaging, healthcare, education, remote collaboration, and training, offering a promising solution to the complexities of asynchronous communication in immersive environments.

Authors:Akos Nagy, Yannis Spyridis, Vasileios Argyriou
Title: Cross-Format Retrieval-Augmented Generation in XR with LLMs for Context-Aware Maintenance Assistance
Abstract:
This paper presents a detailed evaluation of a Retrieval-Augmented Generation (RAG) system that integrates large language models (LLMs) to enhance information retrieval and instruction generation for maintenance personnel across diverse data formats. We assessed the performance of eight LLMs, emphasizing key metrics such as response speed and accuracy, which were quantified using BLEU and METEOR scores. Our findings reveal that advanced models like GPT-4 and GPT-4o-mini significantly outperform their counterparts, particularly when addressing complex queries requiring multi-format data integration. The results validate the system's ability to deliver timely and accurate responses, highlighting the potential of RAG frameworks to optimize maintenance operations. Future research will focus on refining retrieval techniques for these models and enhancing response generation, particularly for intricate scenarios, ultimately improving the system's practical applicability in dynamic real-world environments.
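Since the evaluation above quantifies accuracy with BLEU, a trimmed-down version of the metric helps make the scores concrete: clipped n-gram precision combined with a brevity penalty. Real evaluations typically use up to 4-grams with smoothing (e.g., via sacrebleu or NLTK); this sketch stops at bigrams to show only the mechanics.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions,
    scaled by a brevity penalty for short candidates."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(c[g], r[g]) for g in c)  # clipped counts
        precisions.append(overlap / max(sum(c.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

METEOR, the paper's other metric, additionally rewards stem and synonym matches and penalizes fragmented alignments, which BLEU's exact n-gram matching misses.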

Authors:Wenyuan Zhang, Tianyun Liu, Mengxiao Song, Xiaodong Li, Tingwen Liu
Title: SOTOPIA-$Ω$: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social Agents
Abstract:
Despite the abundance of prior social strategies possessed by humans, there remains a paucity of research dedicated to their transfer and integration into social agents. Our proposed SOTOPIA-$Ω$ framework aims to address and bridge this gap, with a particular focus on enhancing the social capabilities of language agents. This framework dynamically injects multi-step reasoning strategies inspired by negotiation theory and two simple direct strategies into expert agents, thereby automating the construction of a high-quality social dialogue training corpus. Additionally, we introduce the concept of Social Instruction Following (S-IF) and propose two new S-IF evaluation metrics that complement social capability. We demonstrate that several 7B models trained on the high-quality corpus not only significantly surpass the expert agent (GPT-4) in achieving social goals but also enhance S-IF performance. Analysis and variant experiments validate the advantages of dynamic construction, which can especially break the agent's prolonged deadlock.

Authors:Samuel Reinders, Matthew Butler, Kim Marriott
Title: "It Brought the Model to Life": Exploring the Embodiment of Multimodal I3Ms for People who are Blind or have Low Vision
Abstract:
3D-printed models are increasingly used to provide people who are blind or have low vision (BLV) with access to maps, educational materials, and museum exhibits. Recent research has explored interactive 3D-printed models (I3Ms) that integrate touch gestures, conversational dialogue, and haptic vibratory feedback to create more engaging interfaces. Prior research with sighted people has found that imbuing machines with human-like behaviours, i.e., embodying them, can make them appear more lifelike, increasing social perception and presence. Such embodiment can increase engagement and trust. This work presents the first exploration into the design of embodied I3Ms and their impact on BLV engagement and trust. In a controlled study with 12 BLV participants, we found that I3Ms using specific embodiment design factors, such as haptic vibratory and embodied personified voices, led to an increased sense of liveliness and embodiment, as well as engagement, but had mixed impact on trust.

Authors:Kamer Ali Yuksel, Ahmet Gunduz, Abdul Baseet Anees, Hassan Sawaf
Title: Efficient Machine Translation Corpus Generation: Integrating Human-in-the-Loop Post-Editing with Large Language Models
Abstract:
This paper introduces an advanced methodology for machine translation (MT) corpus generation, integrating semi-automated, human-in-the-loop post-editing with large language models (LLMs) to enhance efficiency and translation quality. Building upon previous work that utilized real-time training of a custom MT quality estimation metric, this system incorporates novel LLM features such as Enhanced Translation Synthesis and Assisted Annotation Analysis, which improve initial translation hypotheses and quality assessments, respectively. Additionally, the system employs LLM-Driven Pseudo Labeling and a Translation Recommendation System to reduce human annotator workload in specific contexts. These improvements not only retain the original benefits of cost reduction and enhanced post-edit quality but also open new avenues for leveraging cutting-edge LLM advancements. The project's source code is available for community use, promoting collaborative developments in the field. The demo video can be accessed here.

Authors:Dana Calacci, Varun Nagaraj Rao, Samantha Dalal, Catherine Di, Kok-Wei Pua, Andrew Schwartz, Danny Spitzberg, Andrés Monroy-Hernández
Title: FairFare: A Tool for Crowdsourcing Rideshare Data to Empower Labor Organizers
Abstract:
Rideshare workers experience unpredictable working conditions due to gig work platforms' reliance on opaque AI and algorithmic systems. In response to these challenges, we found that labor organizers want data to help them advocate for legislation to increase the transparency and accountability of these platforms. To address this need, we collaborated with a Colorado-based rideshare union to develop FairFare, a tool that crowdsources and analyzes workers' data to estimate the take rate -- the percentage of the rider price retained by the rideshare platform. We deployed FairFare with our partner organization, which collaborated with us in collecting data on 76,000+ trips from 45 drivers over 18 months. During evaluation interviews, organizers reported that FairFare helped influence the bill language and passage of Colorado Senate Bill 24-75, calling for greater transparency and data disclosure of platform operations, and helped create a national narrative. Finally, we reflect on the complexities of translating quantitative data into policy outcomes, the nature of community-based audits, and design implications for future transparency tools.
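The take rate defined above -- the share of the rider price the platform keeps rather than paying out to drivers -- reduces to simple arithmetic over crowdsourced trips. The `(rider_price, driver_pay)` tuple schema is illustrative, not FairFare's actual data model.

```python
def take_rate(trips) -> float:
    """Aggregate take rate across trips: the fraction of the total rider
    price retained by the platform. Each trip is (rider_price, driver_pay)."""
    total_price = sum(price for price, _ in trips)
    total_pay = sum(pay for _, pay in trips)
    if total_price == 0:
        return 0.0
    return (total_price - total_pay) / total_price
```

Aggregating totals before dividing (rather than averaging per-trip rates) weights expensive trips proportionally, which matches how the platform's overall cut would be reported.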

Authors:Sai Keerthana Karnam, Abhisek Dash, Sepehr Mousavi, Stefan Bechtold, Krishna P. Gummadi, Animesh Mukherjee, Ingmar Weber, Savvas Zannettou
Title: Setting the Course, but Forgetting to Steer: Analyzing Compliance with GDPR's Right of Access to Data by Instagram, TikTok, and YouTube
Abstract:
The comprehensibility and reliability of data download packages (DDPs) provided under the General Data Protection Regulation's (GDPR) right of access are vital for both individuals and researchers. These DDPs enable users to understand and control their personal data, yet issues like complexity and incomplete information often limit their utility. Also, despite their growing use in research to study emerging online phenomena, little attention has been given to systematically assessing the reliability and comprehensibility of DDPs. To bridge this research gap, in this work, we perform a comparative analysis to assess the comprehensibility and reliability of DDPs provided by three major social media platforms, namely, TikTok, Instagram, and YouTube. By recruiting 400 participants across four countries, we assess the comprehensibility of DDPs across various requirements, including conciseness, transparency, intelligibility, and clear and plain language. Also, by leveraging automated bots and user-donated DDPs, we evaluate the reliability of DDPs across the three platforms. Among other things, we find notable differences across the three platforms in the data categories included in DDPs, inconsistencies in adherence to the GDPR requirements, and gaps in the reliability of the DDPs across platforms. Finally, using large language models, we demonstrate the feasibility of easily providing more comprehensible DDPs.

Authors:Venkatesh Sivaraman, Yejun Kwak, Courtney Kuza, Qingnan Yang, Kayleigh Adamson, Katie Suda, Lu Tang, Walid Gellad, Adam Perer
Title: Static Algorithm, Evolving Epidemic: Understanding the Potential of Human-AI Risk Assessment to Support Regional Overdose Prevention
Abstract:
Drug overdose deaths, including those due to prescription opioids, represent a critical public health issue in the United States and worldwide. Artificial intelligence (AI) approaches have been developed and deployed to help prescribers assess a patient's risk for overdose-related death, but it is unknown whether public health experts can leverage similar predictions to make local resource allocation decisions more effectively. In this work, we evaluated how AI-based overdose risk assessment could be used to inform local public health decisions using a working prototype system. Experts from three health departments, of varying locations and sizes with respect to staff and population served, were receptive to the potential benefits of algorithmic risk prediction and of using AI-augmented visualization to connect across data sources. However, they also expressed concerns about whether the risk prediction model's formulation and underlying data would match the state of the overdose epidemic as it evolved in their specific locations. Our findings extend those of other studies on algorithmic systems in the public sector, and they present opportunities for future human-AI collaborative tools to support decision-making in local, time-varying contexts.

Authors:Venkatesh Sivaraman, Zexuan Li, Adam Perer
Title: Divisi: Interactive Search and Visualization for Scalable Exploratory Subgroup Analysis
Abstract:
Analyzing data subgroups is a common data science task to build intuition about a dataset and identify areas to improve model performance. However, subgroup analysis is prohibitively difficult in datasets with many features, and existing tools limit unexpected discoveries by relying on user-defined or static subgroups. We propose exploratory subgroup analysis as a set of tasks in which practitioners discover, evaluate, and curate interesting subgroups to build understanding about datasets and models. To support these tasks we introduce Divisi, an interactive notebook-based tool underpinned by a fast approximate subgroup discovery algorithm. Divisi's interface allows data scientists to interactively re-rank and refine subgroups and to visualize their overlap and coverage in the novel Subgroup Map. Through a think-aloud study with 13 practitioners, we find that Divisi can help uncover surprising patterns in data features and their interactions, and that it encourages more thorough exploration of subtypes in complex data.

Authors:Venkatesh Sivaraman, Anika Vaishampayan, Xiaotong Li, Brian R Buck, Ziyong Ma, Richard D Boyce, Adam Perer
Title: Tempo: Helping Data Scientists and Domain Experts Collaboratively Specify Predictive Modeling Tasks
Abstract:
Temporal predictive models have the potential to improve decisions in health care, public services, and other domains, yet they often fail to effectively support decision-makers. Prior literature shows that many misalignments between model behavior and decision-makers' expectations stem from issues of model specification, namely how, when, and for whom predictions are made. However, model specifications for predictive tasks are highly technical and difficult for non-data-scientist stakeholders to interpret and critique. To address this challenge we developed Tempo, an interactive system that helps data scientists and domain experts collaboratively iterate on model specifications. Using Tempo's simple yet precise temporal query language, data scientists can quickly prototype specifications with greater transparency about pre-processing choices. Moreover, domain experts can assess performance within data subgroups to validate that models behave as expected. Through three case studies, we demonstrate how Tempo helps multidisciplinary teams quickly prune infeasible specifications and identify more promising directions to explore.

Authors:Mina Huh, Dingzeyu Li, Kim Pimmel, Hijung Valentina Shin, Amy Pavel, Mira Dontcheva
Title: VideoDiff: Human-AI Video Co-Creation with Alternatives
Abstract:
To make an engaging video, people sequence interesting moments and add visuals such as B-rolls or text. While video editing requires time and effort, AI has recently shown strong potential to make editing easier through suggestions and automation. A key strength of generative models is their ability to quickly generate multiple variations, but when provided with many alternatives, creators struggle to compare them to find the best fit. We propose VideoDiff, an AI video editing tool designed for editing with alternatives. With VideoDiff, creators can generate and review multiple AI recommendations for each editing process: creating a rough cut, inserting B-rolls, and adding text effects. VideoDiff simplifies comparisons by aligning videos and highlighting differences through timelines, transcripts, and video previews. Creators have the flexibility to regenerate and refine AI suggestions as they compare alternatives. Our study participants (N=12) could easily compare and customize alternatives, creating more satisfying results.

Authors:Zihe Ran, Xiyu Li, Qing Xiao, Xianzhe Fan, Franklin Mingzhe Li, Yanyun Wang, Zhicong Lu
Title: How Users Who are Blind or Low Vision Play Mobile Games: Perceptions, Challenges, and Strategies
Abstract:
As blind and low-vision (BLV) players engage more deeply with games, accessibility features have become essential. While some research has explored tools and strategies to enhance game accessibility, the specific experiences of these players with mobile games remain underexamined. This study addresses this gap by investigating how BLV users experience mobile games with varying accessibility levels. Through interviews with 32 experienced BLV mobile players, we explore their perceptions, challenges, and strategies for engaging with mobile games. Our findings reveal that BLV players turn to mobile games to alleviate boredom, achieve a sense of accomplishment, and build social connections, but face barriers depending on the game's accessibility level. We also compare mobile games to other forms of gaming, highlighting the relative advantages of mobile games, such as the inherent accessibility of smartphones. This study contributes to understanding BLV mobile gaming experiences and provides insights for enhancing accessible mobile game design.

Authors:Ananya Bhattacharjee, Joseph Jay Williams, Miranda Beltzer, Jonah Meyerhoff, Harsh Kumar, Haochen Song, David C. Mohr, Alex Mariakakis, Rachel Kornfield
Title: Investigating the Role of Situational Disruptors in Engagement with Digital Mental Health Tools
Abstract:
Challenges in engagement with digital mental health (DMH) tools are commonly addressed through technical enhancements and algorithmic interventions. This paper shifts the focus towards the role of users' broader social context as a significant factor in engagement. Through an eight-week text messaging program aimed at enhancing psychological wellbeing, we recruited 20 participants to help us identify situational engagement disruptors (SEDs), including personal responsibilities, professional obligations, and unexpected health issues. In follow-up design workshops with 25 participants, we explored potential solutions that address such SEDs: prioritizing self-care through structured goal-setting, alternative framings for disengagement, and utilization of external resources. Our findings challenge conventional perspectives on engagement and offer actionable design implications for future DMH tools.

Authors:Elif Celen, Pol van Rijn, Harin Lee, Nori Jacoby
Title: Are Expressions for Music Emotions the Same Across Cultures?
Abstract:
Music evokes profound emotions, yet the universality of emotional descriptors across languages remains debated. A key challenge in cross-cultural research on music emotion is biased stimulus selection and manual curation of taxonomies, predominantly relying on Western music and languages. To address this, we propose a balanced experimental design with nine online experiments in Brazil, the US, and South Korea, involving N=672 participants. First, we sample a balanced set of popular music from these countries. Using an open-ended tagging pipeline, we then gather emotion terms to create culture-specific taxonomies. Finally, using these bottom-up taxonomies, participants rate emotions of each song. This allows us to map emotional similarities within and across cultures. Results show consistency in high arousal, high valence emotions but greater variability in others. Notably, machine translations were often inadequate to capture music-specific meanings. These findings together highlight the need for a domain-sensitive, open-ended, bottom-up emotion elicitation approach to reduce cultural biases in emotion research.

Authors:Tica Lin, Ruxun Xiang, Gardenia Liu, Divyanshu Tiwari, Meng-Chia Chiang, Chenjiayi Ye, Hanspeter Pfister, Chen Zhu-Tian
Title: SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment
Abstract:
Video storytelling is essential for sports performance analysis and fan engagement, enabling sports professionals and fans to effectively communicate and interpret the spatial and temporal dynamics of gameplay. Traditional methods rely on manual annotation and verbal explanations, placing significant demands on creators for video editing skills and on viewers for cognitive focus. However, these approaches are time-consuming and often struggle to accommodate individual needs. SportsBuddy addresses this gap with an intuitive, interactive video authoring tool. It combines player tracking, embedded interaction design, and timeline visualizations to seamlessly integrate narratives and visual cues within game contexts. This empowers users to effortlessly create context-driven video stories. Since its launch, over 150 sports users, including coaches, athletes, content creators, parents and fans, have utilized SportsBuddy to produce compelling game highlights for diverse use cases. User feedback highlights its accessibility and ease of use, making video storytelling and insight communication more attainable for diverse audiences. Case studies with collegiate teams and sports creators further demonstrate SportsBuddy's impact on enhancing coaching communication, game analysis, and fan engagement.

Authors:Lin-Ping Yuan, Feilin Han, Liwenhan Xie, Junjie Zhang, Jian Zhao, Huamin Qu
Title: "You'll Be Alice Adventuring in Wonderland!" Processes, Challenges, and Opportunities of Creating Animated Virtual Reality Stories
Abstract:
Animated virtual reality (VR) stories, combining the presence of VR and the artistry of computer animation, offer a compelling way to deliver messages and evoke emotions. Motivated by the growing demand for immersive narrative experiences, more creators are creating animated VR stories. However, a holistic understanding of their creation processes and challenges involved in crafting these stories is still limited. Based on semi-structured interviews with 21 animated VR story creators, we identify ten common stages in their end-to-end creation processes, ranging from idea generation to evaluation, which form diverse workflows that are story-driven or visual-driven. Additionally, we highlight nine unique issues that arise during the creation process, such as a lack of reference material for multi-element plots, the absence of specific functionalities for story integration, and inadequate support for audience evaluation. We compare the creation of animated VR stories to general XR applications and distill several future research opportunities.

Authors:Aadit Barua, Karim Benharrak, Meng Chen, Mina Huh, Amy Pavel
Title: Lotus: Creating Short Videos From Long Videos With Abstractive and Extractive Summarization
Abstract:
Short-form videos are popular on platforms like TikTok and Instagram as they quickly capture viewers' attention. Many creators repurpose their long-form videos to produce short-form videos, but creators report that planning, extracting, and arranging clips from long-form videos is challenging. Currently, creators make extractive short-form videos composed of existing long-form video clips or abstractive short-form videos by adding newly recorded narration to visuals. While extractive videos maintain the original connection between audio and visuals, abstractive videos offer flexibility in selecting content to be included in a shorter time. We present Lotus, a system that combines both approaches to balance preserving the original content with flexibility over the content. Lotus first creates an abstractive short-form video by generating both a short-form script and its corresponding speech, then matching long-form video clips to the generated narration. Creators can then add extractive clips with an automated method or Lotus's editing interface. Lotus's interface can be used to further refine the short-form video. We compare short-form videos generated by Lotus with those using an extractive baseline method. In our user study, we compare creating short-form videos using Lotus to participants' existing practice.

Authors:JaeWon Kim, Hyunsung Cho, Fannie Liu, Alexis Hiniker
Title: Social Media Should Feel Like Minecraft, Not Instagram: 3D Gamer Youth Visions for Meaningful Social Connections through Fictional Inquiry
Abstract:
We investigate youth visions for ideal remote social interactions, drawing on co-design interviews with 23 participants (aged 15-24) experienced with 3D gaming environments. Using a Fictional Inquiry (FI) method set in the Harry Potter universe, this research reveals that young people desire social media that functions more like immersive, navigable shared social spaces. Across these interviews, participants identified six key priorities for meaningful social connection over social media: intuitive social navigation, shared collaborative experiences, communal environments fostering close relationships, flexible self-presentation, intentional engagement, and playful social mechanics. We introduce the "spatial integrity" framework, a set of four interrelated design principles: spatial presence, spatial composition, spatial configuration, and spatial depth. Together, these principles outline how online spaces can be designed to feel more like meaningful environments, spaces where relationships can grow through shared presence, movement, and intentional interaction. Participants also described the FI process itself as meaningful, not only for generating new ideas but for empowering them to imagine and shape the future of social media.

Authors:Weijen Chen, Qingyuan Gao, Zheng Hu, Kouta Minamizawa, Yun Suen Pai
Title: Living Bento: Heartbeat-Driven Noodles for Enriched Dining Dynamics
Abstract:
To enhance focused eating and dining socialization, previous Human-Food Interaction research has indicated that external devices can support these dining objectives and immersion. However, methods that focus on the food itself and the diners themselves have remained underdeveloped. In this study, we integrated biofeedback with food, using diners' heart rates to drive the food's appearance and promote focused eating and dining socialization. By employing LED lights, we dynamically displayed diners' real-time physiological signals through the transparency of the food. Results revealed significant effects on various aspects of dining immersion, such as awareness perceptions, attractiveness, attentiveness to each bite, and emotional bonds with the food. Furthermore, to promote dining socialization, we established a "Sharing Bio-Sync Food" dining system to strengthen emotional connections between diners. Based on these findings, we developed tableware that integrates biofeedback into the culinary experience.

Authors:David Melhart, Matthew Barthet, Georgios N. Yannakakis
Title: Can Large Language Models Capture Video Game Engagement?
Abstract:
Can out-of-the-box pretrained Large Language Models (LLMs) detect human affect successfully when observing a video? To address this question, for the first time, we evaluate comprehensively the capacity of popular LLMs to annotate and successfully predict continuous affect annotations of videos when prompted by a sequence of text and video frames in a multimodal fashion. Particularly in this paper, we test LLMs' ability to correctly label changes of in-game engagement in 80 minutes of annotated videogame footage from 20 first-person shooter games of the GameVibe corpus. We run over 2,400 experiments to investigate the impact of LLM architecture, model size, input modality, prompting strategy, and ground truth processing method on engagement prediction. Our findings suggest that while LLMs rightfully claim human-like performance across multiple domains, they generally fall short of capturing the continuous experience annotations provided by humans. We examine some of the underlying causes for the relatively poor overall performance, highlight the cases where LLMs exceed expectations, and draw a roadmap for the further exploration of automated emotion labelling via LLMs.

Authors:Hailong Liu, Yang Li, Toshihiro Hiraoka, Takahiro Wada
Title: Data-driven Causal Discovery for Pedestrians-Autonomous Personal Mobility Vehicle Interactions with eHMIs: From Psychological States to Walking Behaviors
Abstract:
Autonomous personal mobility vehicle (APMV) is a new type of small smart vehicle designed for mixed-traffic environments, including interactions with pedestrians. To enhance the interaction experience between pedestrians and APMVs and to prevent potential risks, it is crucial to investigate pedestrians' walking behaviors when interacting with APMVs and to understand the psychological processes underlying these behaviors. This study aims to investigate the causal relationships between subjective evaluations of pedestrians and their walking behaviors during interactions with an APMV equipped with an external human-machine interface (eHMI). An experiment of pedestrian-APMV interaction was conducted with 42 pedestrian participants, in which various eHMIs on the APMV were designed to induce participants to experience different levels of subjective evaluations and generate the corresponding walking behaviors. Based on the hypothesized model of the pedestrian's cognition-decision-behavior process, the results of causal discovery align with the previously proposed model. Furthermore, this study further analyzes the direct and total causal effects of each factor and investigates the causal processes affecting several important factors in the field of human-vehicle interaction, such as situation awareness, trust in vehicle, risk perception, hesitation in decision making, and walking behaviors.

Authors:Hailong Liu, Zhe Zeng, Takahiro Wada
Title: Where Do Passengers Gaze? Impact of Passengers' Personality Traits on Their Gaze Pattern Toward Pedestrians During APMV-Pedestrian Interactions with Diverse eHMIs
Abstract:
Autonomous Personal Mobility Vehicles (APMVs) are designed to address the "last-mile" transportation challenge for everyone. When an APMV encounters a pedestrian, it uses an external Human-Machine Interface (eHMI) to negotiate road rights. Through this interaction, passengers also engage with the process. This study examines passengers' gaze behavior toward pedestrians during such interactions, focusing on whether different eHMI designs influence gaze patterns based on passengers' personality traits. The results indicated that when using a visual-based eHMI, passengers often struggled to perceive the communication content. Consequently, passengers with higher Neuroticism scores, who were more sensitive to communication details, might seek cues from pedestrians' reactions. In addition, a multimodal eHMI (visual and voice) using neutral voice did not significantly affect the gaze behavior of passengers toward pedestrians, regardless of personality traits. In contrast, a multimodal eHMI using affective voice encouraged passengers with high Openness to Experience scores to focus on pedestrians' heads. In summary, this study revealed how different eHMI designs influence passengers' gaze behavior and highlighted the effects of personality traits on their gaze patterns toward pedestrians, providing new insights for personalized eHMI designs.

Authors:Dora Zhao, Diyi Yang, Michael S. Bernstein
Title: Mapping the Spiral of Silence: Surveying Unspoken Opinions in Online Communities
Abstract:
We often treat social media as a lens onto society. How might that lens be distorting the actual popularity of political and social viewpoints? In this paper, we examine the difference between the viewpoints publicly posted in a community and the privately surveyed viewpoints of community members, contributing a measurement of a theory called the "spiral of silence." This theory observes that people are less likely to voice their opinion when they believe they are in the minority--leading to a spiral where minority opinions are less likely to be shared, so they appear even further in the minority, and become even less likely to be shared. We surveyed active members of politically oriented Reddit communities to gauge their willingness to post on contentious topics, yielding 627 responses from 108 participants about 11 topics and 33 subreddits. We find that 72.6% of participants who perceive themselves in the minority remain silent, and are only half as likely to post their viewpoint compared to those who believe their opinion is in the majority. Communities perceived as being more inclusive reduce the magnitude of this effect. These results emphasize how far out of step the opinions we see online may be with the population they purport to represent.

Authors:Liuging Chen, Yaxuan Song, Jia Guo, Lingyun Sun, Peter Childs, Yuan Yin
Title: How Generative AI supports human in conceptual design
Abstract:
Generative Artificial Intelligence (Generative AI) is a collection of AI technologies that can generate new information such as texts and images. With its strong capabilities, Generative AI has been actively studied in creative design processes. However, limited studies have explored the roles of humans and Generative AI in conceptual design processes, leaving a gap for human-AI collaboration investigation. To address this gap, this study uncovers the contributions of different Generative AI technologies in assisting humans in the conceptual design process. Novice designers completed two design tasks with or without the assistance of Generative AI. Results revealed that Generative AI primarily assists humans in problem definition and idea generation stages, while idea selection and evaluation remain predominantly human-led. Additionally, with Generative AI assistance, the idea selection and evaluation stages were further enhanced. Based on the findings, we discuss the role of Generative AI in human-AI collaboration and implications for enhancing future conceptual design support with Generative AI assistance.

Authors:Xingyu Xiao, Peng Chen, Qianqian Jia, Jiejuan Tong, Jingang Liang, Haitao Wang
Title: A Dynamic and High-Precision Method for Scenario-Based HRA Synthetic Data Collection in Multi-Agent Collaborative Environments Driven by LLMs
Abstract:
HRA (Human Reliability Analysis) data is crucial for advancing HRA methodologies. However, existing data collection methods lack the necessary granularity, and most approaches fail to capture dynamic features. Additionally, many methods require expert knowledge as input, making them time-consuming and labor-intensive. To address these challenges, we propose a new paradigm for the automated collection of HRA data. Our approach focuses on key indicators behind human error, specifically measuring workload in collaborative settings. This study introduces a novel, scenario-driven method for workload estimation, leveraging fine-tuned large language models (LLMs). By training LLMs on real-world operational data from high-temperature gas-cooled reactors (HTGRs), we simulate human behavior and cognitive load in real time across various collaborative scenarios. The method dynamically adapts to changes in operator workload, providing more accurate, flexible, and scalable workload estimates. The results demonstrate that the proposed WELLA (Workload Estimation with LLMs and Agents) outperforms existing commercial LLM-based methods in terms of prediction accuracy.

Authors:Parth Ganeriwala, Michael Matessa, Siddhartha Bhattacharyya, Randolph M. Jones, Jennifer Davis, Parneet Kaur, Simone Fulvio Rollini, Natasha Neogi
Title: Design and Validation of Learning Aware HMI For Learning-Enabled Increasingly Autonomous Systems
Abstract:
With the rapid advancements in Artificial Intelligence (AI), autonomous agents are increasingly expected to manage complex situations where learning-enabled algorithms are vital. However, the integration of these advanced algorithms poses significant challenges, especially concerning safety and reliability. This research emphasizes the importance of incorporating human-machine collaboration into the systems engineering process to design learning-enabled increasingly autonomous systems (LEIAS). Our proposed LEIAS architecture emphasizes communication representation and pilot preference learning to boost operational safety. Leveraging the Soar cognitive architecture, the system merges symbolic decision logic with numeric decision preferences enhanced through reinforcement learning. A core aspect of this approach is transparency; the LEIAS provides pilots with a comprehensive, interpretable view of the system's state, encompassing detailed evaluations of sensor reliability, including GPS, IMU, and LIDAR data. This multi-sensor assessment is critical for diagnosing discrepancies and maintaining trust. Additionally, the system learns and adapts to pilot preferences, enabling responsive, context-driven decision-making. Autonomy is incrementally escalated based on necessity, ensuring pilots retain control in standard scenarios and receive assistance only when required. Simulation studies conducted in Microsoft's XPlane simulation environment validate this architecture's efficacy, showcasing its performance in managing sensor anomalies and enhancing human-machine collaboration, ultimately advancing safety in complex operational environments.

Authors:Beining Cao, Xiaowei Jiang, Daniel Leong, Charlie Li-Ting Tsai, Yu-Cheng Chang, Thomas Do, Chin-Teng
Title: EMD-Fuzzy: An Empirical Mode Decomposition Based Fuzzy Model for Cross-Stimulus Transfer Learning of SSVEP
Abstract:
The Brain-Computer Interface (BCI) enables direct brain-to-device communication, with the Steady-State Visual Evoked Potential (SSVEP) paradigm favored for its stability and high accuracy across various fields. In SSVEP BCI systems, supervised learning models significantly enhance performance over unsupervised models, achieving higher accuracy in less time. However, prolonged data collection can cause user fatigue and even trigger photosensitive epilepsy, creating a negative user experience. Thus, reducing calibration time is crucial. To address this, Cross-Stimulus transfer learning (CSTL) can shorten calibration by utilizing only partial frequencies. Traditional CSTL methods, affected by time-domain impulse response variations, are suitable only for adjacent frequency transfers, limiting their general applicability. We introduce an Empirical Mode Decomposition (EMD) Based Fuzzy Model (EMD-Fuzzy), which employs EMD to extract crucial frequency information and achieves stimulus transfer in the frequency domain through Fast Fourier Transform (FFT) to mitigate time-domain differences. Combined with a Fuzzy Decoder that uses fuzzy logic for representation learning, our approach delivers promising preliminary results in offline tests and state-of-the-art performance. With only 4 frequencies, our method achieved an accuracy of 82.75% (16.30%) and an information transfer rate (ITR) of 186.56 (52.09) bits/min on the 40-target Benchmark dataset. In online tests, our method demonstrates robust efficacy, achieving an averaged accuracy of 86.30% (6.18%) across 7 subjects. This performance underscores the effectiveness of integrating EMD and fuzzy logic into EEG decoding for CSTL and highlights our method's potential in real-time applications where consistent and reliable decoding is crucial.
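The core move in the abstract above, shifting stimulus transfer from the time domain to the frequency domain via the FFT, can be sketched with a toy example. This is a minimal illustration, not the paper's pipeline: the sampling rate, epoch length, and pure-sine "SSVEP responses" below are all invented for demonstration, and the actual method additionally applies EMD and a fuzzy decoder.

```python
import numpy as np

fs = 250                        # sampling rate in Hz (illustrative value)
t = np.arange(0, 2.0, 1 / fs)   # a 2-second epoch

def freq_representation(signal: np.ndarray) -> np.ndarray:
    """Normalized magnitude spectrum via FFT: a frequency-domain view
    that sidesteps time-domain impulse-response differences."""
    spectrum = np.abs(np.fft.rfft(signal))
    return spectrum / spectrum.max()

# Two synthetic SSVEP-like responses at different flicker frequencies
src = np.sin(2 * np.pi * 10 * t)   # "calibrated" 10 Hz stimulus
tgt = np.sin(2 * np.pi * 14 * t)   # "uncalibrated" 14 Hz stimulus

freqs = np.fft.rfftfreq(len(t), 1 / fs)
peak_src = freqs[np.argmax(freq_representation(src))]
peak_tgt = freqs[np.argmax(freq_representation(tgt))]
print(peak_src, peak_tgt)   # 10.0 14.0 -- each stimulus recovered by its spectral peak
```

Because both stimuli live on the same normalized frequency axis, knowledge learned at one frequency can be related to another without aligning time-domain impulse responses, which is the intuition behind cross-stimulus transfer in the frequency domain.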

Authors:Nuwan T. Attygalle, Matjaž Kljun, Aaron Quigley, Klen Čopič Pucihar, Jens Grubert, Verena Biener, Luis A. Leiva, Juri Yoneyama, Alice Toniolo, Angela Miguel, Hirokazu Kato, Maheshya Weerasinghe
Title: Text-to-Image Generation for Vocabulary Learning Using the Keyword Method
Abstract:
The 'keyword method' is an effective technique for learning vocabulary of a foreign language. It involves creating a memorable visual link between what a word means and what its pronunciation in a foreign language sounds like in the learner's native language. However, these memorable visual links remain implicit in the people's mind and are not easy to remember for a large set of words. To enhance the memorisation and recall of the vocabulary, we developed an application that combines the keyword method with text-to-image generators to externalise the memorable visual links into visuals. These visuals represent additional stimuli during the memorisation process. To explore the effectiveness of this approach we first ran a pilot study to investigate how difficult it is to externalise the descriptions of mental visualisations of memorable links, by asking participants to write them down. We used these descriptions as prompts for a text-to-image generator (DALL-E2) to convert them into images and asked participants to select their favourites. Next, we compared different text-to-image generators (DALL-E2, Midjourney, Stable and Latent Diffusion) to evaluate the perceived quality of the generated images by each. Despite heterogeneous results, participants mostly preferred images generated by DALL-E2, which was also used for the final study. In this study, we investigated whether providing such images enhances the retention of vocabulary being learned, compared to the keyword method only. Our results indicate that people did not encounter difficulties describing their visualisations of memorable links and that providing corresponding images significantly improves memory retention.

Authors:Yifan Li, Masaaki Fukumoto, Mohamed Kari, Tomoyuki Yokota, Takao Someya, Yoshihiro Kawahara, Ryo Takahashi
Title: Demo of picoRing mouse: an ultra-low-powered wireless mouse ring with ring-to-wristband coil-based impedance sensing
Abstract:
Wireless mouse rings offer subtle, reliable pointing interactions for wearable computing platforms, but the small battery below 27 mAh in the miniature rings restricts the ring's continuous lifespan to just 1-2 hours due to the power consumption of current low-powered wireless communication like BLE. However, the picoRing mouse addresses this by enabling continuous ring-based mouse interaction with ultra-low-powered ring-to-wristband wireless communication through a coil-based impedance sensing method called semi-passive inductive telemetry. This allows a wristband coil to capture a unique frequency response of a nearby ring coil via sensitive inductive coupling, converting the user's mouse input into the unique frequency response via an 820 uW mouse-driven modulation module. Thus, continuous use of the picoRing mouse can potentially last over 92 hours on a single charge of a 20 mAh battery while supporting subtle scrolling and pressing interactions.
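The impedance-sensing idea, a ring-side resonant coil whose frequency response the wristband coil reads out through inductive coupling, can be illustrated with the standard LC resonance formula f = 1/(2π√(LC)). All component values below are hypothetical and not taken from the paper; the point is only that switching extra capacitance in (e.g., on a press) shifts the resonance by an amount the wristband can detect.

```python
import math

def resonant_frequency(L: float, C: float) -> float:
    """LC tank resonance: f = 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(L * C))

# Hypothetical component values for illustration only
L_ring = 1.0e-6     # 1 uH ring coil
C_idle = 2.5e-9     # 2.5 nF idle tank capacitance
C_press = 3.0e-9    # extra capacitance switched in by a press

f_idle = resonant_frequency(L_ring, C_idle)
f_press = resonant_frequency(L_ring, C_idle + C_press)

# Pressing shifts the resonance downward; the wristband coil observes
# this shift in the ring's frequency response via inductive coupling.
print(f_idle > f_press)   # True
```

Since the ring side only modulates a passive resonance rather than driving a radio, the sensing energy comes largely from the wristband, which is what makes the microwatt-level power budget plausible.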

Authors:Zichen Chen, Yunhao Luo, Misha Sra
Title: Engaging with AI: How Interface Design Shapes Human-AI Collaboration in High-Stakes Decision-Making
Abstract:
As reliance on AI systems for decision-making grows, it becomes critical to ensure that human users can appropriately balance trust in AI suggestions with their own judgment, especially in high-stakes domains like healthcare. However, human + AI teams have been shown to perform worse than AI alone, with evidence indicating automation bias as the reason for poorer performance, particularly because humans tend to follow AI's recommendations even when they are incorrect. In many existing human + AI systems, decision-making support is typically provided in the form of text explanations (XAI) to help users understand the AI's reasoning. Since human decision-making often relies on System 1 thinking, users may ignore or insufficiently engage with the explanations, leading to poor decision-making. Previous research suggests that there is a need for new approaches that encourage users to engage with the explanations and one proposed method is the use of cognitive forcing functions (CFFs). In this work, we examine how various decision-support mechanisms impact user engagement, trust, and human-AI collaborative task performance in a diabetes management decision-making scenario. In a controlled experiment with 108 participants, we evaluated the effects of six decision-support mechanisms split into two categories of explanations (text, visual) and four CFFs. Our findings reveal that mechanisms like AI confidence levels, text explanations, and performance visualizations enhanced human-AI collaborative task performance, and improved trust when AI reasoning clues were provided. Mechanisms like human feedback and AI-driven questions encouraged deeper reflection but often reduced task performance by increasing cognitive effort, which in turn affected trust. Simple mechanisms like visual explanations had little effect on trust, highlighting the importance of striking a balance in CFF and XAI design.

Authors:Maciej Grzeszczuk, Kinga Skorupska, Grzegorz Marcin Wojcik
Title: Bridging the Digital Divide: Approach to Documenting Early Computing Artifacts Using Established Standards for Cross-Collection Knowledge Integration Ontology
Abstract:
In this paper we address the challenges of documenting early digital artifacts in collections built to offer historical context for future generations. Through insights from active community members (N=20), we examine current archival needs and obstacles. We assess the potential of the CIDOC Conceptual Reference Model (CRM) for categorizing fragmented digital data. Despite its complexity, CIDOC-CRM proves logical, human-readable, and adaptable, enabling archivists to select minimal yet effective building blocks set to empower community-led heritage projects.

Authors:Gaole He, Patrick Hemmer, Michael Vössing, Max Schemmer, Ujwal Gadiraju
Title: Fine-Grained Appropriate Reliance: Human-AI Collaboration with a Multi-Step Transparent Decision Workflow for Complex Task Decomposition
Abstract:
In recent years, the rapid development of AI systems has brought about the benefits of intelligent services but also concerns about security and reliability. By fostering appropriate user reliance on an AI system, both complementary team performance and reduced human workload can be achieved. Previous empirical studies have extensively analyzed the impact of factors ranging from task, system, and human behavior on user trust and appropriate reliance in the context of one-step decision making. However, user reliance on AI systems in tasks with complex semantics that require multi-step workflows remains under-explored. Inspired by recent work on task decomposition with large language models, we propose to investigate the impact of a novel Multi-Step Transparent (MST) decision workflow on user reliance behaviors. We conducted an empirical study (N = 233) of AI-assisted decision making in composite fact-checking tasks (i.e., fact-checking tasks that entail multiple sub-fact verification steps). Our findings demonstrate that human-AI collaboration with an MST decision workflow can outperform one-step collaboration in specific contexts (e.g., when advice from an AI system is misleading). Further analysis of the appropriate reliance at fine-grained levels indicates that an MST decision workflow can be effective when users demonstrate a relatively high consideration of the intermediate steps. Our work highlights that there is no one-size-fits-all decision workflow that can help obtain optimal human-AI collaboration. Our insights help deepen the understanding of the role of decision workflows in facilitating appropriate reliance. We synthesize important implications for designing effective means to facilitate appropriate reliance on AI systems in composite tasks, positioning opportunities for the human-centered AI and broader HCI communities.

Authors:Andrew Jelson, Daniel Manesh, Alice Jang, Daniel Dunlap, Young-Ho Kim, Sang Won Lee
Title: An Empirical Study to Understand How Students Use ChatGPT for Writing Essays
Abstract:
As large language models (LLMs) advance and become widespread, students increasingly turn to systems like ChatGPT for assistance with writing tasks. Educators are concerned with students' usage of ChatGPT beyond cheating; using ChatGPT may reduce their critical engagement with writing, hindering students' learning processes. The negative or positive impact of using LLM-powered tools for writing will depend on how students use them; however, how students use ChatGPT remains largely unknown, resulting in a limited understanding of its impact on learning. To better understand how students use these tools, we conducted an online study (n = 70) where students were given an essay-writing task using a custom platform we developed to capture the queries they made to ChatGPT. To characterize their ChatGPT usage, we categorized each of the queries students made to ChatGPT. We then analyzed the relationship between ChatGPT usage and a variety of other metrics, including students' self-perception, attitudes towards AI, and the resulting essay itself. We found that factors such as gender, race, and perceived self-efficacy can help predict different AI usage patterns. Additionally, we found that different usage patterns were associated with varying levels of enjoyment and perceived ownership over the essay. The results of this study contribute to discussions about how writing education should incorporate generative AI-powered tools in the classroom.

Authors:JaeWon Kim, Thea Klein-Balajee, Ryan M. Kelly, Alexis Hiniker
Title: Discord's Design Encourages "Third Place" Social Media Experiences
Abstract:
In light of the diminishing presence of physical third places -- informal gathering spaces essential for social connection -- this study explores how the social media platform Discord fosters third-place experiences. Drawing on Oldenburg's conceptual framework, we analyze how Discord's design elements support the creation of virtual third places that foster both dyadic and community-based relationships. Through 25 semi-structured interviews with active Discord users, we identified 21 design elements aligned with Oldenburg's third-place characteristics. These elements cluster around four core principles: providing themed spaces for repeated interactions, supporting user autonomy and customization, facilitating mutually engaging activities, and enabling casual, low-pressure interactions. This work contributes to understanding how intentional platform design can cultivate virtual spaces that support meaningful social connections. The findings have implications for designing future social technologies that can help address growing concerns about social isolation in an increasingly digital world.

Authors:Shi Qiu, Binzhu Xie, Qixuan Liu, Pheng-Ann Heng
Title: Creating Virtual Environments with 3D Gaussian Splatting: A Comparative Study
Abstract:
3D Gaussian Splatting (3DGS) has recently emerged as an innovative and efficient 3D representation technique. While its potential for extended reality (XR) applications is frequently highlighted, its practical effectiveness remains underexplored. In this work, we examine three distinct 3DGS-based approaches for virtual environment (VE) creation, leveraging their unique strengths for efficient and visually compelling scene representation. By conducting a comparative study, we evaluate the feasibility of 3DGS in creating immersive VEs, identify its limitations in XR applications, and discuss future research and development opportunities.

Authors:Leona Holloway, Kim Marriott, Matthew Butler, Samuel Reinders
Title: 3D Printed Maps and Icons for Inclusion: Testing in the Wild by People who are Blind or have Low Vision
Abstract:
The difficulty and consequent fear of travel is one of the most disabling consequences of blindness and severe vision impairment, affecting confidence and quality of life. Traditional tactile graphics are vital in the Orientation and Mobility training process; however, 3D printing may enable the production of more meaningful and inclusive maps. This study explored the use of 3D printed maps on site at a public event to examine their suitability and to identify guidelines for the design of future 3D maps. An iterative design process was used in the production of the 3D maps, with feedback from visitors who are blind or have low vision informing the recommendations for their design and use. For example, it was found that many representational 3D icons could be recognised by touch without the need for a key and that such a map helped form mental models of the event space. Complex maps, however, require time to explore and should be made available before an event or at the entrance in a comfortable position. The maps were found to support the orientation and mobility process, and importantly to also promote a positive message about inclusion and accessibility.

Authors:Atharv Belsare, Zohre Karimi, Connor Mattson, Daniel S. Brown
Title: Toward Zero-Shot User Intent Recognition in Shared Autonomy
Abstract:
A fundamental challenge of shared autonomy is to use high-DoF robots to assist, rather than hinder, humans by first inferring user intent and then empowering the user to achieve their intent. Although successful, prior methods either rely heavily on a priori knowledge of all possible human intents or require many demonstrations and interactions with the human to learn these intents before being able to assist the user. We propose and study a zero-shot, vision-only shared autonomy (VOSA) framework designed to allow robots to use end-effector vision to estimate zero-shot human intents in conjunction with blended control to help humans accomplish manipulation tasks with unknown and dynamically changing object locations. To demonstrate the effectiveness of our VOSA framework, we instantiate a simple version of VOSA on a Kinova Gen3 manipulator and evaluate our system by conducting a user study on three tabletop manipulation tasks. The performance of VOSA matches that of an oracle baseline model that receives privileged knowledge of possible human intents while also requiring significantly less effort than unassisted teleoperation. In more realistic settings, where the set of possible human intents is fully or partially unknown, we demonstrate that VOSA requires less human effort and time than baseline approaches while being preferred by a majority of the participants. Our results demonstrate the efficacy and efficiency of using off-the-shelf vision algorithms to enable flexible and beneficial shared control of a robot manipulator. Code and videos available here: https://sites.google.com/view/zeroshot-sharedautonomy/home.
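The blended control mentioned in the abstract can be illustrated with a standard arbitration rule; the actual VOSA arbitration scheme is not specified in the abstract, so the confidence-weighted blend below is a generic sketch, not the authors' formula.

```python
# Minimal sketch of blended shared control: the executed command is a
# confidence-weighted mix of the human's teleoperation input and the
# robot's assistance toward its inferred intent.

def blend_command(u_human, u_robot, confidence):
    """Blend human and robot commands elementwise; confidence in [0, 1]
    reflects the robot's certainty about the inferred user intent."""
    alpha = max(0.0, min(1.0, confidence))  # clamp to [0, 1]
    return [alpha * r + (1.0 - alpha) * h for h, r in zip(u_human, u_robot)]
```

With zero confidence the human retains full control; as intent confidence grows, the robot's assistance contributes more of the final command.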

Authors:Yaxin Hu, Anjun Zhu, Catalina L. Toma, Bilge Mutlu
Title: Designing Telepresence Robots to Support Place Attachment
Abstract:
People feel attached to places that are meaningful to them, which psychological research calls "place attachment." Place attachment is associated with self-identity, self-continuity, and psychological well-being. Even small cues, including videos, images, sounds, and scents, can facilitate feelings of connection and belonging to a place. Telepresence robots that allow people to see, hear, and interact with a remote place have the potential to establish and maintain a connection with places and support place attachment. In this paper, we explore the design space of robotic telepresence to promote place attachment, including how users might be guided in a remote place and whether they experience the environment individually or with others. We prototyped a telepresence robot that allows one or more remote users to visit a place and be guided by a local human guide or a conversational agent. Participants were 38 university alumni who visited their alma mater via the telepresence robot. Our findings uncovered four distinct user personas in the remote experience and highlighted the need for social participation to enhance place attachment. We generated design implications for future telepresence robot design to support people's connections with places of personal significance.

Authors:Niels Justesen, Maria Kaselimi, Sam Snodgrass, Miruna Vozaru, Matthew Schlegel, Jonas Wingren, Gabriella A. B. Barros, Tobias Mahlmann, Shyam Sudhakaran, Wesley Kerr, Albert Wang, Christoffer Holmgård, Georgios N. Yannakakis, Sebastian Risi, Julian Togelius
Title: Human-like Bots for Tactical Shooters Using Compute-Efficient Sensors
Abstract:
Artificial intelligence (AI) has enabled agents to master complex video games, from first-person shooters like Counter-Strike to real-time strategy games such as StarCraft II and racing games like Gran Turismo. While these achievements are notable, applying these AI methods in commercial video game production remains challenging due to computational constraints. In commercial scenarios, the majority of computational resources are allocated to 3D rendering, leaving limited capacity for AI methods, which often demand high computational power, particularly those relying on pixel-based sensors. Moreover, the gaming industry prioritizes creating human-like behavior in AI agents to enhance player experience, unlike academic models that focus on maximizing game performance. This paper introduces a novel methodology for training neural networks via imitation learning to play a complex, commercial-standard, VALORANT-like 2v2 tactical shooter game, requiring only modest CPU hardware during inference. Our approach leverages an innovative, pixel-free perception architecture using a small set of ray-cast sensors, which capture essential spatial information efficiently. These sensors allow AI to perform competently without the computational overhead of traditional methods. Models are trained to mimic human behavior using supervised learning on human trajectory data, resulting in realistic and engaging AI agents. Human evaluation tests confirm that our AI agents provide human-like gameplay experiences while operating efficiently under computational constraints. This offers a significant advancement in AI model development for tactical shooter games and possibly other genres.
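The pixel-free perception idea above can be sketched with a toy 2D ray-cast sensor; the paper's actual sensor layout and game geometry are not public, so the fan of rays marched over axis-aligned wall cells below is purely illustrative.

```python
import math

# Illustrative 2D ray-cast sensor: cast a fan of rays from the agent's
# position and record the distance to the first wall cell hit, capped at
# max_range. The resulting fixed-length vector is the kind of compact,
# pixel-free observation a lightweight policy network could consume.

def ray_distances(pos, heading, walls, n_rays=8, max_range=10.0, step=0.1):
    """Return one clipped hit distance per ray, marching each ray in
    small steps until it enters a wall cell or reaches max_range."""
    out = []
    for i in range(n_rays):
        angle = heading + 2.0 * math.pi * i / n_rays
        dx, dy = math.cos(angle), math.sin(angle)
        d = 0.0
        while d < max_range:
            x, y = pos[0] + d * dx, pos[1] + d * dy
            if (int(x), int(y)) in walls:
                break
            d += step
        out.append(min(d, max_range))
    return out
```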

Authors:Yuanbo Hou, Qiaoqiao Ren, Wenwu Wang, Dick Botteldooren
Title: Sound-Based Recognition of Touch Gestures and Emotions for Enhanced Human-Robot Interaction
Abstract:
Emotion recognition and touch gesture decoding are crucial for advancing human-robot interaction (HRI), especially in social environments where emotional cues and tactile perception play important roles. However, many humanoid robots, such as Pepper, Nao, and Furhat, lack full-body tactile skin, limiting their ability to engage in touch-based emotional and gesture interactions. In addition, vision-based emotion recognition methods usually face strict GDPR compliance challenges due to the need to collect personal facial data. To address these limitations and avoid privacy issues, this paper studies the potential of using the sounds produced by touching during HRI to recognise tactile gestures and classify emotions along the arousal and valence dimensions. Using a dataset of tactile gestures and emotional interactions from 28 participants with the humanoid robot Pepper, we design an audio-only lightweight touch gesture and emotion recognition model with only 0.24M parameters, 0.94MB model size, and 0.7G FLOPs. Experimental results show that the proposed sound-based touch gesture and emotion recognition model effectively recognises the arousal and valence states of different emotions, as well as various tactile gestures, when the input audio length varies. The proposed model is low-latency and achieves similar results as well-known pretrained audio neural networks (PANNs), but with much smaller FLOPs, parameters, and model size.

Authors:Arpit Narechania, Alex Endert, Atanu R Sinha
Title: Agentic Enterprise: AI-Centric User to User-Centric AI
Abstract:
After a very long winter, the Artificial Intelligence (AI) spring is here. Or, so it seems over the last three years. AI has the potential to impact many areas of human life - personal, social, health, education, professional. In this paper, we take a closer look at the potential of AI for Enterprises, where decision-making plays a crucial and repeated role across functions, tasks, and operations. We consider Agents imbued with AI as means to increase decision-productivity of enterprises. We highlight six tenets for Agentic success in enterprises, by drawing attention to what the current, AI-Centric User paradigm misses, in the face of persistent needs of and usefulness for Enterprise Decision-Making. In underscoring a shift to User-Centric AI, we offer six tenets and promote market mechanisms for platforms, aligning the design of AI and its delivery by Agents to the cause of enterprise users.

Authors:Jingshu Li, Zicheng Zhu, Renwen Zhang, Yi-Chieh Lee
Title: Exploring the Effects of Chatbot Anthropomorphism and Human Empathy on Human Prosocial Behavior Toward Chatbots
Abstract:
Chatbots are increasingly integrated into people's lives and are widely used to help people. Recently, there has also been growing interest in the reverse direction -- humans help chatbots -- due to a wide range of benefits including better chatbot performance, human well-being, and collaborative outcomes. However, little research has explored the factors that motivate people to help chatbots. To address this gap, we draw on the Computers Are Social Actors (CASA) framework to examine how chatbot anthropomorphism -- including human-like identity, emotional expression, and non-verbal expression -- influences human empathy toward chatbots and their subsequent prosocial behaviors and intentions. We also explore people's own interpretations of their prosocial behaviors toward chatbots. We conducted an online experiment (N = 244) in which chatbots made mistakes in a collaborative image labeling task and explained the reasons to participants. We then measured participants' prosocial behaviors and intentions toward the chatbots. Our findings revealed that human identity and emotional expression of chatbots increased participants' prosocial behavior and intention toward chatbots, with empathy mediating these effects. Qualitative analysis further identified two motivations for participants' prosocial behaviors: empathy for the chatbot and perceiving the chatbot as human-like. We discuss the implications of these results for understanding and promoting human prosocial behaviors toward chatbots.

Authors:Runlong Ye, Zeling Zhang, Boushra Almazroua, Michael Liut
Title: Beyond Autocomplete: Designing CopilotLens Towards Transparent and Explainable AI Coding Agents
Abstract:
AI-powered code assistants are widely used to generate code completions, significantly boosting developer productivity. However, these tools typically present suggestions without explaining their rationale, leaving their decision-making process inscrutable. This opacity hinders developers' ability to critically evaluate outputs, form accurate mental models, and calibrate trust in the system. To address this, we introduce CopilotLens, a novel interactive framework that reframes code completion from a simple suggestion into a transparent, explainable interaction. CopilotLens operates as an explanation layer that reconstructs the AI agent's "thought process" through a dynamic, two-level interface. The tool aims to surface both high-level code changes and the specific codebase context influences. This paper presents the design and rationale of CopilotLens, offering a concrete framework and articulating expectations on deepening comprehension and calibrated trust, which we plan to evaluate in subsequent work.

Authors:Iosif Tsangko, Andreas Triantafyllopoulos, Adem Abdelmoula, Adria Mallol-Ragolta, Bjoern W. Schuller
Title: Reading Smiles: Proxy Bias in Foundation Models for Facial Emotion Recognition
Abstract:
Foundation Models (FMs) are rapidly transforming Affective Computing (AC), with Vision Language Models (VLMs) now capable of recognising emotions in zero-shot settings. This paper probes a critical but underexplored question: what visual cues do these models rely on to infer affect, and are these cues psychologically grounded or superficially learnt? We benchmark VLMs of varying scale on a teeth-annotated subset of the AffectNet dataset and find consistent performance shifts depending on the presence of visible teeth. Through structured introspection of the best-performing model, GPT-4o, we show that facial attributes like eyebrow position drive much of its affective reasoning, revealing a high degree of internal consistency in its valence-arousal predictions. These patterns highlight the emergent nature of FM behaviour, but also reveal risks: shortcut learning, bias, and fairness issues, especially in sensitive domains like mental health and education.

Authors:Xingyu Xiao, Jiejuan Tong, Jun Sun, Zhe Sui, Jingang Liang, Hongru Zhao, Jun Zhao, Haitao Wang
Title: AutoGraph: A Knowledge-Graph Framework for Modeling Interface Interaction and Automating Procedure Execution in Digital Nuclear Control Rooms
Abstract:
Digitalization in nuclear power plant (NPP) control rooms is reshaping how operators interact with procedures and interface elements. However, existing computer-based procedures (CBPs) often lack semantic integration with human-system interfaces (HSIs), limiting their capacity to support intelligent automation and increasing the risk of human error, particularly under dynamic or complex operating conditions. In this study, we present AutoGraph, a knowledge-graph-based framework designed to formalize and automate procedure execution in digitalized NPP environments. AutoGraph integrates (1) a proposed HTRPM tracking module to capture operator interactions and interface element locations; (2) an Interface Element Knowledge Graph (IE-KG) encoding spatial, semantic, and structural properties of HSIs; (3) automatic mapping from textual procedures to executable interface paths; and (4) an execution engine that carries out these paths. This enables the identification of cognitively demanding multi-action steps and supports fully automated execution with minimal operator input. We validate the framework through representative control room scenarios, demonstrating significant reductions in task completion time and the potential to support real-time human reliability assessment. Further integration into dynamic HRA frameworks (e.g., COGMIF) and real-time decision support systems (e.g., DRIF) illustrates AutoGraph's extensibility in enhancing procedural safety and cognitive performance in complex socio-technical systems.
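The mapping from a procedure step to an executable interface path can be sketched as a search over a tiny interface-element graph; the node names and the search itself are invented for illustration, and the paper's IE-KG encodes far richer spatial and semantic properties than this toy adjacency structure.

```python
from collections import deque

def interface_path(graph, start, target):
    """Breadth-first search over interface-element links, returning the
    shortest sequence of elements an operator (or automation) must
    traverse to reach the target control, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical HSI screens and controls for illustration only.
ie_kg = {
    "main_screen": ["pump_panel", "alarm_panel"],
    "pump_panel": ["pump_A_start"],
}
```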

Authors:MH Farhadi, Ali Rabiee, Sima Ghafoori, Anna Cetera, Wei Xu, Reza Abiri
Title: Human-Centered Shared Autonomy for Motor Planning, Learning, and Control Applications
Abstract:
With recent advancements in AI and computational tools, intelligent paradigms have emerged to enhance fields like shared autonomy and human-machine teaming in healthcare. Advanced AI algorithms (e.g., reinforcement learning) can autonomously make decisions to achieve planning and motion goals. However, in healthcare, where human intent is crucial, fully independent machine decisions may not be ideal. This chapter presents a comprehensive review of human-centered shared autonomy AI frameworks, focusing on upper limb biosignal-based machine interfaces and associated motor control systems, including computer cursors, robotic arms, and planar platforms. We examine motor planning, learning (rehabilitation), and control, covering conceptual foundations of human-machine teaming in reach-and-grasp tasks and analyzing both theoretical and practical implementations. Each section explores how human and machine inputs can be blended for shared autonomy in healthcare applications. Topics include human factors, biosignal processing for intent detection, shared autonomy in brain-computer interfaces (BCI), rehabilitation, assistive robotics, and Large Language Models (LLMs) as the next frontier. We propose adaptive shared autonomy AI as a high-performance paradigm for collaborative human-AI systems, identify key implementation challenges, and outline future directions, particularly regarding AI reasoning agents. This analysis aims to bridge neuroscientific insights with robotics to create more intuitive, effective, and ethical human-machine teaming frameworks.

Authors:Renee Sirbu, Jessica Morley, Tyler Schroder, Raghavendra Pradyumna Pothukuchi, Muhammed Ugur, Abhishek Bhattacharjee, Luciano Floridi
Title: Regulating Next-Generation Implantable Brain-Computer Interfaces: Recommendations for Ethical Development and Implementation
Abstract:
Brain-computer interfaces offer significant therapeutic opportunities for a variety of neurophysiological and neuropsychiatric disorders and may perhaps one day lead to augmenting the cognition and decision-making of the healthy brain. However, existing regulatory frameworks designed for implantable medical devices are inadequate to address the unique ethical, legal, and social risks associated with next-generation networked brain-computer interfaces. In this article, we make nine recommendations to support developers in the design of BCIs and nine recommendations to support policymakers in the application of BCIs, drawing insights from the regulatory history of IMDs and principles from AI ethics. We begin by outlining the historical development of IMDs and the regulatory milestones that have shaped their oversight. Next, we summarize similarities between IMDs and emerging implantable BCIs, identifying existing provisions for their regulation. We then use two case studies of emerging cutting-edge BCIs, the HALO and SCALO computer systems, to highlight distinctive features in the design and application of next-generation BCIs arising from contemporary chip architectures, which necessitate reevaluating regulatory approaches. We identify critical ethical considerations for these BCIs, including unique conceptions of autonomy, identity, and mental privacy. Based on these insights, we suggest potential avenues for the ethical regulation of BCIs, emphasizing the importance of interdisciplinary collaboration and proactive mitigation of potential harms. The goal is to support the responsible design and application of new BCIs, ensuring their safe and ethical integration into medical practice.

Authors:Saitarun Nadipineni, Chapa Sirithunge, Yue Xie, Fumiya Iida, Thilina Dulantha Lalitharatne
Title: Auditory-Tactile Congruence for Synthesis of Adaptive Pain Expressions in RoboPatients
Abstract:
Misdiagnosis can lead to delayed treatments and harm. Robotic patients offer a controlled way to train and evaluate clinicians in rare, subtle, or complex cases, reducing diagnostic errors. We present RoboPatient, a medical robotic simulator aimed at multimodal pain synthesis based on haptic and auditory feedback during palpation-based training scenarios. The RoboPatient functions as an adaptive intermediary, capable of synthesizing plausible vocal and facial pain expressions in response to tactile stimuli generated during palpation. Using an abdominal phantom, RoboPatient captures and processes haptic input via an internal palpation-to-pain mapping model. To evaluate perceptual congruence between palpation and the corresponding auditory output, we conducted a study involving 7680 trials across 20 participants, in which they evaluated pain intensity through sound. Results show that amplitude and pitch significantly influence agreement with the robot's pain expressions, irrespective of the specific pain sound. Stronger palpation forces elicited stronger agreement, aligning with psychophysical patterns. The study revealed two key dimensions: pitch and amplitude are central to how people perceive pain sounds, with pitch being the most influential cue. These acoustic features shape how well the sound matches the applied force during palpation, impacting perceived realism. This approach lays the groundwork for high-fidelity robotic patients in clinical education and diagnostic simulation.

Authors:Nađa Terzimehić, Babette Bühler, Enkelejda Kasneci
Title: Conversational AI as a Catalyst for Informal Learning: An Empirical Large-Scale Study on LLM Use in Everyday Learning
Abstract:
Large language models have not only captivated the public imagination but have also sparked a profound rethinking of how we learn. In the third year following the breakthrough launch of ChatGPT, everyday informal learning has been transformed as diverse user groups explore these novel tools. Who is embracing LLMs for self-directed learning, and who remains hesitant? What are their reasons for adoption or avoidance? What learning patterns emerge with this novel technological landscape? We present an in-depth analysis from a large-scale survey of 776 participants, showcasing that 88% of our respondents already incorporate LLMs into their everyday learning routines for a wide variety of (learning) tasks. Young adults are at the forefront of adopting LLMs, primarily to enhance their learning experiences independently of time and space. Four types of learners emerge across learning contexts, depending on the tasks they perform with LLMs and the devices they use to access them. Interestingly, our respondents exhibit paradoxical behaviours regarding their trust in LLMs' accuracy and privacy protection measures. Our implications emphasize the importance of including different media types for learning, enabling collaborative learning, providing sources, meeting the needs of different learner types, and supporting learning by design.

Authors:Suhas BN, Andrew M. Sherrill, Jyoti Alaparthi, Dominik Mattioli, Rosa I. Arriaga, Chris W. Wiese, Saeed Abdullah
Title: When and How Long Did Therapy Happen? Soft-Supervising Temporal Localization Using Audio-Language Models
Abstract:
Prolonged Exposure (PE) therapy is an effective treatment for post-traumatic stress disorder (PTSD), but evaluating therapist fidelity remains labor-intensive due to the need for manual review of session recordings. We present a method for the automatic temporal localization of key PE fidelity elements, identifying their start and stop times, directly from session audio and transcripts. Our approach fine-tunes a large pre-trained audio-language model, Qwen2-Audio, using Low-Rank Adaptation (LoRA) to process focused 30-second windows of audio-transcript input. Fidelity labels for three core protocol phases, therapist orientation (P1), imaginal exposure (P2), and post-imaginal processing (P3), are generated via LLM-based prompting and verified by trained raters. The model is trained to predict normalized boundary offsets using soft supervision guided by task-specific prompts. On a dataset of 308 real PE sessions, our best configuration (LoRA rank 8, 30s windows) achieves a mean absolute error (MAE) of 5.3s across tasks, within typical rater tolerance for timestamp review, enabling practical fidelity QC. We further analyze the effects of window size and LoRA rank, highlighting the importance of context granularity and model adaptation. This work introduces a privacy-preserving, scalable framework for fidelity tracking in PE therapy, with potential to support clinician training, supervision, and quality assurance.
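The label and metric arithmetic described above (boundary times normalized within a 30-second window, error reported as MAE in seconds) can be sketched as follows; the exact normalization used in the paper is an assumption, so this is an illustrative reading rather than the authors' implementation.

```python
# Sketch of normalized boundary offsets for fixed 30-second windows and
# the conversion of offset error back to seconds.

WINDOW_S = 30.0  # window length assumed from the abstract's setup

def normalize_boundary(t_abs, window_start):
    """Map an absolute boundary time (seconds) to a [0, 1] offset
    within the window starting at window_start."""
    return (t_abs - window_start) / WINDOW_S

def denormalize(offset, window_start):
    """Inverse mapping: recover the absolute time from an offset."""
    return window_start + offset * WINDOW_S

def mae_seconds(pred_offsets, true_offsets):
    """Mean absolute error over normalized offsets, reported in seconds."""
    n = len(pred_offsets)
    return sum(abs(p - t) for p, t in zip(pred_offsets, true_offsets)) * WINDOW_S / n
```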

Authors:Kellie Yu Hui Sim, Roy Ka-Wei Lee, Kenny Tsu Wei Choo
Title: "Is This Really a Human Peer Supporter?": Misalignments Between Peer Supporters and Experts in LLM-Supported Interactions
Abstract:
Mental health is a growing global concern, prompting interest in AI-driven solutions to expand access to psychosocial support. Peer support, grounded in lived experience, offers a valuable complement to professional care. However, variability in training, effectiveness, and definitions raises concerns about quality, consistency, and safety. Large Language Models (LLMs) present new opportunities to enhance peer support interactions, particularly in real-time, text-based interactions. We present and evaluate an AI-supported system with an LLM-simulated distressed client, context-sensitive LLM-generated suggestions, and real-time emotion visualisations. Two mixed-methods studies with 12 peer supporters and 5 mental health professionals (i.e., experts) examined the system's effectiveness and implications for practice. Both groups recognised its potential to enhance training and improve interaction quality. However, we found a key tension emerged: while peer supporters engaged meaningfully, experts consistently flagged critical issues in peer supporter responses, such as missed distress cues and premature advice-giving. This misalignment highlights potential limitations in current peer support training, especially in emotionally charged contexts where safety and fidelity to best practices are essential. Our findings underscore the need for standardised, psychologically grounded training, especially as peer support scales globally. They also demonstrate how LLM-supported systems can scaffold this development--if designed with care and guided by expert oversight. This work contributes to emerging conversations on responsible AI integration in mental health and the evolving role of LLMs in augmenting peer-delivered care.

Authors:Prakash Shukla, Suchismita Naik, Ike Obi, Jessica Backus, Nancy Rasche, Paul Parsons
Title: Rethinking Citation of AI Sources in Student-AI Collaboration within HCI Design Education
Abstract:
The growing integration of AI tools in student design projects presents an unresolved challenge in HCI education: how should AI-generated content be cited and documented? Traditional citation frameworks -- grounded in credibility, retrievability, and authorship -- struggle to accommodate the dynamic and ephemeral nature of AI outputs. In this paper, we examine how undergraduate students in a UX design course approached AI usage and citation when given the freedom to integrate generative tools into their design process. Through qualitative analysis of 35 team projects and reflections from 175 students, we identify varied citation practices ranging from formal attribution to indirect or absent acknowledgment. These inconsistencies reveal gaps in existing frameworks and raise questions about authorship, assessment, and pedagogical transparency. We argue for rethinking AI citation as a reflective and pedagogical practice -- one that supports metacognitive engagement by prompting students to critically evaluate how and why they used AI throughout the design process. We propose alternative strategies -- such as AI contribution statements and process-aware citation models -- that better align with the iterative and reflective nature of design education. This work invites educators to reconsider how citation practices can support meaningful student--AI collaboration.

Authors:Christos Margadji, Sebastian W. Pattinson
Title: Hybrid Reasoning for Perception, Explanation, and Autonomous Action in Manufacturing
Abstract:
Industrial processes must be robust and adaptable, as environments and tasks are often unpredictable, while operational errors remain costly and difficult to detect. AI-based control systems offer a path forward, yet typically depend on supervised learning with extensive labelled datasets, which limits their ability to generalize across variable and data-scarce industrial settings. Foundation models could enable broader reasoning and knowledge integration, but rarely deliver the quantitative precision demanded by engineering applications. Here, we introduce Control and Interpretation of Production via Hybrid Expertise and Reasoning (CIPHER): a vision-language-action (VLA) model framework aiming to replicate human-like reasoning for industrial control, instantiated in a commercial-grade 3D printer. It integrates a process expert: a regression model that enables quantitative characterization of the system states required for engineering tasks. CIPHER also incorporates retrieval-augmented generation to access external expert knowledge and support physics-informed, chain-of-thought reasoning. This hybrid architecture exhibits strong generalization to out-of-distribution tasks. It interprets visual or textual inputs from process monitoring, explains its decisions, and autonomously generates precise machine instructions, without requiring explicit annotations. CIPHER thus lays the foundations for autonomous systems that act with precision, reason with context, and communicate decisions transparently, supporting safe and trusted deployment in industrial settings.

Authors:Shijing He, Yaxiong Lei, Zihan Zhang, Yuzhou Sun, Shujun Li, Chi Zhang, Juan Ye
Title: Identity Deepfake Threats to Biometric Authentication Systems: Public and Expert Perspectives
Abstract:
Generative AI (Gen-AI) deepfakes pose a rapidly evolving threat to biometric authentication, yet a significant gap exists between expert understanding of these risks and public perception. This disconnection creates critical vulnerabilities in systems trusted by millions. To bridge this gap, we conducted a comprehensive mixed-method study, surveying 408 professionals across key sectors and conducting in-depth interviews with 37 participants (25 experts, 12 general public [non-experts]). Our findings reveal a paradox: while the public increasingly relies on biometrics for convenience, experts express grave concerns about the spoofing of static modalities like face and voice recognition. We found significant demographic and sector-specific divides in awareness and trust, with finance professionals, for example, showing heightened skepticism. To systematically analyze these threats, we introduce a novel Deepfake Kill Chain model, adapted from Hutchins et al.'s cybersecurity frameworks to map the specific attack vectors used by malicious actors against biometric systems. Based on this model and our empirical findings, we propose a tri-layer mitigation framework that prioritizes dynamic biometric signals (e.g., eye movements), robust privacy-preserving data governance, and targeted educational initiatives. This work provides the first empirically grounded roadmap for defending against AI-generated identity threats by aligning technical safeguards with human-centered insights.

Authors:Fatemeh Banani Ardecani, Omidreza Shoghli
Title: Assessing Workers Neuro-physiological Stress Responses to Augmented Reality Safety Warnings in Immersive Virtual Roadway Work Zones
Abstract:
This paper presents a multi-stage experimental framework that integrates immersive Virtual Reality (VR) simulations, wearable sensors, and advanced signal processing to investigate construction workers' neuro-physiological stress responses to multi-sensory AR-enabled warnings. Participants performed light- and moderate-intensity roadway maintenance tasks within a high-fidelity VR roadway work zone, while key stress markers of electrodermal activity (EDA), heart rate variability (HRV), and electroencephalography (EEG) were continuously measured. Statistical analyses revealed that task intensity significantly influenced physiological and neurological stress indicators. Moderate-intensity tasks elicited greater autonomic arousal, evidenced by elevated heart rate measures (mean-HR, std-HR, max-HR) and stronger electrodermal responses, while EEG data indicated distinct stress-related alpha suppression and beta enhancement. Feature-importance analysis further identified mean EDR and short-term HR metrics as discriminative for classifying task intensity. Correlation results highlighted a temporal lag between immediate neural changes and subsequent physiological stress reactions, emphasizing the interplay between cognition and autonomic regulation during hazardous tasks.

Authors:Omotoye Shamsudeen Adekoya, Antonio Sgorbissa, Carmine Tommaso Recchiuto
Title: HORUS: A Mixed Reality Interface for Managing Teams of Mobile Robots
Abstract:
Mixed Reality (MR) interfaces have been extensively explored for controlling mobile robots, but there is limited research on their application to managing teams of robots. This paper presents HORUS: Holistic Operational Reality for Unified Systems, a Mixed Reality interface offering a comprehensive set of tools for managing multiple mobile robots simultaneously. HORUS enables operators to monitor individual robot statuses, visualize sensor data projected in real time, and assign tasks to single robots, subsets of the team, or the entire group, all from a Mini-Map (Ground Station). The interface also provides different teleoperation modes: a mini-map mode that allows teleoperation while observing the robot model and its transform on the mini-map, and a semi-immersive mode that offers a flat, screen-like view in either single or stereo view (3D). We conducted a user study in which participants used HORUS to manage a team of mobile robots tasked with finding clues in an environment, simulating search and rescue tasks. This study compared HORUS's full-team management capabilities with individual robot teleoperation. The experiments validated the versatility and effectiveness of HORUS in multi-robot coordination, demonstrating its potential to advance human-robot collaboration in dynamic, team-based environments.

Authors:Shenzhe Zhu, Jiao Sun, Yi Nian, Tobin South, Alex Pentland, Jiaxin Pei
Title: The Automated but Risky Game: Modeling and Benchmarking Agent-to-Agent Negotiations and Transactions in Consumer Markets
Abstract:
AI agents are increasingly used in consumer-facing applications to assist with tasks such as product search, negotiation, and transaction execution. In this paper, we explore a future scenario where both consumers and merchants authorize AI agents to fully automate negotiations and transactions. We aim to answer two key questions: (1) Do different LLM agents vary in their ability to secure favorable deals for users? (2) What risks arise from fully automating deal-making with AI agents in consumer markets? To address these questions, we develop an experimental framework that evaluates the performance of various LLM agents in real-world negotiation and transaction settings. Our findings reveal that AI-mediated deal-making is an inherently imbalanced game -- different agents achieve significantly different outcomes for their users. Moreover, behavioral anomalies in LLMs can result in financial losses for both consumers and merchants, such as overspending or accepting unreasonable deals. These results underscore that while automation can improve efficiency, it also introduces substantial risks. Users should exercise caution when delegating business decisions to AI agents.

Authors:Yaxiong Lei, Mingyue Zhao, Yuheng Wang, Shijing He, Yusuke Sugano, Mohamed Khamis, Juan Ye
Title: MAC-Gaze: Motion-Aware Continual Calibration for Mobile Gaze Tracking
Abstract:
Mobile gaze tracking faces a fundamental challenge: maintaining accuracy as users naturally change their postures and device orientations. Traditional one-off calibration approaches fail to adapt to these dynamic conditions, leading to degraded performance over time. We present MAC-Gaze, a Motion-Aware continual Calibration approach that leverages smartphone inertial measurement unit (IMU) sensors and continual learning techniques to automatically detect changes in user motion states and update the gaze tracking model accordingly. Our system integrates a pre-trained visual gaze estimator and an IMU-based activity recognition model with a clustering-based hybrid decision-making mechanism that triggers recalibration when motion patterns deviate significantly from previously encountered states. To enable accumulative learning of new motion conditions while mitigating catastrophic forgetting, we employ replay-based continual learning, allowing the model to maintain performance across previously encountered motion conditions. We evaluate our system through extensive experiments on the publicly available RGBDGaze dataset and our own 10-hour multimodal MotionGaze dataset (481K+ images, 800K+ IMU readings), encompassing a wide range of postures under various motion conditions including sitting, standing, lying, and walking. Results demonstrate that our method reduces gaze estimation error by 19.9% on RGBDGaze (from 1.73 cm to 1.41 cm) and by 31.7% on MotionGaze (from 2.81 cm to 1.92 cm) compared to traditional calibration approaches. Our framework provides a robust solution for maintaining gaze estimation accuracy in mobile scenarios.
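The two mechanisms the MAC-Gaze abstract describes -- a motion-drift trigger that requests recalibration when IMU features deviate from known motion states, and a replay buffer that mixes old calibration samples into updates to mitigate forgetting -- can be sketched generically. This is not the MAC-Gaze implementation; the distance threshold, buffer capacity, and replay ratio are assumed values, and the real system uses a clustering-based hybrid decision mechanism rather than a plain nearest-centroid test.

```python
import math
import random

class DriftTrigger:
    """Request recalibration when an IMU feature vector is far from
    every previously registered motion-state centroid."""
    def __init__(self, threshold=1.0):  # assumed threshold
        self.centroids = []
        self.threshold = threshold

    def should_recalibrate(self, imu_feature):
        if not self.centroids:
            return True  # no known states yet
        d = min(math.dist(imu_feature, c) for c in self.centroids)
        return d > self.threshold

    def register_state(self, imu_feature):
        self.centroids.append(list(imu_feature))

class ReplayBuffer:
    """Keep a bounded store of past calibration samples and mix a
    fraction of them into each new calibration update."""
    def __init__(self, capacity=500):  # assumed capacity
        self.samples, self.capacity = [], capacity

    def add(self, sample):
        self.samples.append(sample)
        if len(self.samples) > self.capacity:
            self.samples.pop(0)  # drop the oldest sample

    def mix(self, new_samples, replay_ratio=0.5):
        k = min(len(self.samples), int(len(new_samples) * replay_ratio))
        return new_samples + random.sample(self.samples, k)
```

In use, `should_recalibrate` would gate each model update, and `mix` would supply the training batch so previously encountered motion conditions stay represented.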

Authors:Mengchen Dong, Levin Brinkmann, Omar Sherif, Shihan Wang, Xinyu Zhang, Jean-François Bonnefon, Iyad Rahwan
Title: Experimental Evidence That AI-Managed Workers Tolerate Lower Pay Without Demotivation
Abstract:
Experimental evidence on worker responses to AI management remains mixed, partly due to limitations in experimental fidelity. We address these limitations with a customized workplace in the Minecraft platform, enabling high-resolution behavioral tracking of autonomous task execution, and ensuring that participants approach the task with well-formed expectations about their own competence. Workers (N = 382) completed repeated production tasks under either human, AI, or hybrid management. An AI manager trained on human-defined evaluation principles systematically assigned lower performance ratings and reduced wages by 40%, without adverse effects on worker motivation and sense of fairness. These effects were driven by a muted emotional response to AI evaluation, compared to evaluation by a human. The very features that make AI appear impartial may also facilitate silent exploitation, by suppressing the social reactions that normally constrain extractive practices in human-managed work.

Authors:Harry Li, Gabriel Appleby, Kenneth Alperin, Steven R Gomez, Ashley Suh
Title: The Role of Visualization in LLM-Assisted Knowledge Graph Systems: Effects on User Trust, Exploration, and Workflows
Abstract:
Knowledge graphs (KGs) are powerful data structures, but exploring them effectively remains difficult for even expert users. Large language models (LLMs) are increasingly used to address this gap, yet little is known empirically about how their usage with KGs shapes user trust, exploration strategies, or downstream decision-making - raising key design challenges for LLM-based KG visual analysis systems. To study these effects, we developed LinkQ, a KG exploration system that converts natural language questions into structured queries with an LLM. We collaborated with KG experts to design five visual mechanisms that help users assess the accuracy of both KG queries and LLM responses: an LLM-KG state diagram that illustrates which stage of the exploration pipeline LinkQ is in, a query editor displaying the generated query paired with an LLM explanation, an entity-relation ID table showing extracted KG entities and relations with semantic descriptions, a query structure graph that depicts the path traversed in the KG, and an interactive graph visualization of query results. From a qualitative evaluation with 14 practitioners, we found that users - even KG experts - tended to overtrust LinkQ's outputs due to its "helpful" visualizations, even when the LLM was incorrect. Users exhibited distinct workflows depending on their prior familiarity with KGs and LLMs, challenging the assumption that these systems are one-size-fits-all - despite often being designed as if they are. Our findings highlight the risks of false trust in LLM-assisted data analysis tools and the need for further investigation into the role of visualization as a mitigation technique.

Authors:Rahul Nair, Inge Vejsbjerg, Elizabeth Daly, Christos Varytimidis, Bran Knowles
Title: Humble AI in the real-world: the case of algorithmic hiring
Abstract:
Humble AI (Knowles et al., 2023) argues for cautiousness in AI development and deployments through scepticism (accounting for limitations of statistical learning), curiosity (accounting for unexpected outcomes), and commitment (accounting for multifaceted values beyond performance). We present a real-world case study for humble AI in the domain of algorithmic hiring. Specifically, we evaluate virtual screening algorithms in a widely used hiring platform that matches candidates to job openings. There are several challenges in misrecognition and stereotyping in such contexts that are difficult to assess through standard fairness and trust frameworks; e.g., someone with a non-traditional background is less likely to rank highly. We demonstrate technical feasibility of how humble AI principles can be translated to practice through uncertainty quantification of ranks, entropy estimates, and a user experience that highlights algorithmic unknowns. We describe preliminary discussions with focus groups made up of recruiters. Future user studies seek to evaluate whether the higher cognitive load of a humble AI system fosters a climate of trust in its outcomes.

Authors:Annalisa Degenhard, Stefan Tschöke, Michael Rietzler, Enrico Rukzio
Title: Describe Me Something You Do Not Remember - Challenges and Risks of Exposure Design Using Generative Artificial Intelligence for Therapy of Complex Post-traumatic Disorder
Abstract:
Post-traumatic stress disorder (PTSD) is associated with sudden, uncontrollable, and intense flashbacks of traumatic memories. Trauma exposure psychotherapy has proven effective in reducing the severity of trauma-related symptoms. It involves controlled recall of traumatic memories to train coping mechanisms for flashbacks and enable autobiographical integration of distressing experiences. In particular, exposure to visualizations of these memories supports successful recall. Although this approach is effective for various trauma types, it remains available for only a few. This is due to the lack of cost-efficient solutions for creating individualized exposure visualizations. This issue is particularly relevant for the treatment of Complex PTSD (CPTSD), where traumatic memories are highly individual and generic visualizations do not meet therapeutic needs. Generative Artificial Intelligence (GAI) offers a flexible and cost-effective alternative. GAI enables the creation of individualized exposure visualizations during therapy and, for the first time, allows patients to actively participate in the visualization process. While GAI opens new therapeutic perspectives and may improve access to trauma therapy, especially for CPTSD, it also introduces significant challenges and risks. The extreme uncertainty and lack of control that define both CPTSD and GAI raise concerns about feasibility and safety. To support safe and effective three-way communication, it is essential to understand the roles of patient, system, and therapist in exposure visualization and how each can contribute to safety. This paper outlines perspectives, challenges, and risks associated with the use of GAI in trauma therapy, with a focus on CPTSD.

Authors:Eleonora Cappuccio, Andrea Esposito, Francesco Greco, Giuseppe Desolda, Rosa Lanzilotti, Salvatore Rinzivillo
Title: Explanation User Interfaces: A Systematic Literature Review
Abstract:
Artificial Intelligence (AI) is one of the major technological advancements of this century, bearing incredible potential for users through AI-powered applications and tools in numerous domains. Because AI models are often black boxes (i.e., their decision-making processes are unintelligible), developers typically resort to eXplainable Artificial Intelligence (XAI) techniques to interpret their behaviour and produce systems that are transparent, fair, reliable, and trustworthy. However, presenting explanations to the user is not trivial and is often left as a secondary aspect of the system's design process, leading to AI systems that are not useful to end-users. This paper presents a Systematic Literature Review on Explanation User Interfaces (XUIs) to gain a deeper understanding of the solutions and design guidelines employed in the academic literature to effectively present explanations to users. To improve the contribution and real-world impact of this survey, we also present a framework for Human-cEnteRed developMent of Explainable user interfaceS (HERMES) to guide practitioners and academics in the design and evaluation of XUIs.

Authors:Koki Nagakura, Tatsuki Fushimi, Ayaka Tsutsui, Yoichi Ochiai
Title: Dynamic Caustics by Ultrasonically Modulated Liquid Surface
Abstract:
This paper presents a method for generating dynamic caustic patterns by utilising dual-optimised holographic fields with Phased Array Transducer (PAT). Building on previous research in static caustic optimisation and ultrasonic manipulation, this approach employs computational techniques to dynamically shape fluid surfaces, thereby creating controllable and real-time caustic images. The system employs a Digital Twin framework, which enables iterative feedback and refinement, thereby improving the accuracy and quality of the caustic patterns produced. This paper extends the foundational work in caustic generation by integrating liquid surfaces as refractive media. This concept has previously been explored in simulations but not fully realised in practical applications. The utilisation of ultrasound to directly manipulate these surfaces enables the generation of dynamic caustics with a high degree of flexibility. The Digital Twin approach further enhances this process by allowing for precise adjustments and optimisation based on real-time feedback. Experimental results demonstrate the technique's capacity to generate continuous animations and complex caustic patterns at high frequencies. Although there are limitations in contrast and resolution compared to solid-surface methods, this approach offers advantages in terms of real-time adaptability and scalability. This technique has the potential to be applied in a number of areas, including interactive displays, artistic installations and educational tools. This research builds upon the work of previous researchers in the fields of caustics optimisation, ultrasonic manipulation, and computational displays. Future research will concentrate on enhancing the resolution and intricacy of the generated patterns.

Authors:Lance T. Wilhelm, Xiaohan Ding, Kirk McInnis Knutsen, Buse Carik, Eugenia H. Rho
Title: How Managers Perceive AI-Assisted Conversational Training for Workplace Communication
Abstract:
Effective workplace communication is essential for managerial success, yet many managers lack access to tailored and sustained training. Although AI-assisted communication systems may offer scalable training solutions, little is known about how managers envision the role of AI in helping them improve their communication skills. To investigate this, we designed a conversational role-play system, CommCoach, as a functional probe to understand how managers anticipate using AI to practice their communication skills. Through semi-structured interviews, participants emphasized the value of adaptive, low-risk simulations for practicing difficult workplace conversations. They also highlighted opportunities, including human-AI teaming, transparent and context-aware feedback, and greater control over AI-generated personas. AI-assisted communication training should balance personalization, structured learning objectives, and adaptability to different user styles and contexts. However, achieving this requires carefully navigating tensions between adaptive and consistent AI feedback, realism and potential bias, and the open-ended nature of AI conversations versus structured workplace discourse.

Authors:Sita Vriend, David Hägele, Daniel Weiskopf
Title: Two Empirical Studies on Audiovisual Semiotics of Uncertainty
Abstract:
There exists limited theoretical guidance on integrating visualization and sonification. In this paper, we address this gap by investigating audiovisual semiotics for uncertainty representation: joining uncertainty visualization and sonification to combine audiovisual channels for enhancing users' perception of uncertainty. We conducted two preregistered crowd-sourced user studies. First, we assessed suitable audio/visual pairs. Then, we investigated audiovisual mappings of uncertainty. Here, we use probability as it is an easily communicated aspect of uncertainty. We analyzed the participants' preferences and reaction times in both user studies. Additionally, we explored the strategies employed by participants through qualitative analysis. Our results reveal audiovisual mappings that lead to particularly strong preferences and low reaction times. Furthermore, we found that preferred audio/visual pairs are not necessarily suitable audiovisual mappings of uncertainty. For example, while pitch paired with brightness was preferred as a pair, it was not well suited as a mapping for uncertainty. We recommend audiovisual mappings of uncertainty that lead to low reaction times and high preferences in both user studies. This paper presents guidelines to anyone seeking to employ audiovisual representations for uncertainty, contributing to enhancing the perception of uncertainty.

Authors:Arpit Narechania, Shunan Guo, Eunyee Koh, Alex Endert, Jane Hoffswell
Title: Utilizing Provenance as an Attribute for Visual Data Analysis: A Design Probe with ProvenanceLens
Abstract:
Analytic provenance can be visually encoded to help users track their ongoing analysis trajectories, recall past interactions, and inform new analytic directions. Despite its significance, provenance is often hardwired into analytics systems, affording limited user control and opportunities for self-reflection. We thus propose modeling provenance as an attribute that is available to users during analysis. We demonstrate this concept by modeling two provenance attributes that track the recency and frequency of user interactions with data. We integrate these attributes into a visual data analysis system prototype, ProvenanceLens, wherein users can visualize their interaction recency and frequency by mapping them to encoding channels (e.g., color, size) or applying data transformations (e.g., filter, sort). Using ProvenanceLens as a design probe, we conduct an exploratory study with sixteen users to investigate how these provenance-tracking affordances are utilized for both decision-making and self-reflection. We find that users can accurately and confidently answer questions about their analysis, and we show that mismatches between the user's mental model and the provenance encodings can be surprising, thereby prompting useful self-reflection. We also report on the user strategies surrounding these affordances, and reflect on their intuitiveness and effectiveness in representing provenance.
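The core idea of modeling provenance as a data attribute -- tracking the recency and frequency of user interactions with each data item so they can be mapped to encoding channels like color or size -- can be sketched as below. This is an illustrative sketch, not ProvenanceLens itself; the interaction-log format and field names are assumptions.

```python
def provenance_attributes(log, now):
    """Derive per-item provenance attributes from an interaction log.

    log: list of (timestamp, item_id) interaction events.
    now: current timestamp, in the same units as the log.
    Returns {item_id: {"recency": time since last interaction,
                       "frequency": total interaction count}}.
    """
    attrs = {}
    for t, item in log:
        a = attrs.setdefault(item, {"last": t, "frequency": 0})
        a["frequency"] += 1
        a["last"] = max(a["last"], t)  # most recent touch wins
    return {item: {"recency": now - a["last"], "frequency": a["frequency"]}
            for item, a in attrs.items()}
```

A visualization layer could then bind `recency` to color lightness and `frequency` to mark size, or expose both attributes to filter and sort transformations, as the abstract describes.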

Authors:Ali Rabiee, Sima Ghafoori, MH Farhadi, Robert Beyer, Xiangyu Bai, David J Lin, Sarah Ostadabbas, Reza Abiri
Title: Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space
Abstract:
Current invasive assistive technologies are designed to infer high-dimensional motor control signals from severely paralyzed patients. However, they face significant challenges, including public acceptance, limited longevity, and barriers to commercialization. Meanwhile, noninvasive alternatives often rely on artifact-prone signals, require lengthy user training, and struggle to deliver robust high-dimensional control for dexterous tasks. To address these issues, this study introduces a novel human-centered multimodal AI approach as an intelligent compensatory mechanism for lost motor functions that could potentially enable patients with severe paralysis to control high-dimensional assistive devices, such as dexterous robotic arms, using limited and noninvasive inputs. In contrast to the current state-of-the-art (SoTA) noninvasive approaches, our context-aware, multimodal shared-autonomy framework integrates deep reinforcement learning algorithms to blend limited low-dimensional user input with real-time environmental perception, enabling adaptive, dynamic, and intelligent interpretation of human intent for complex dexterous manipulation tasks, such as pick-and-place. The results from our ARAS framework (Adaptive Reinforcement learning for Amplification of limited inputs in Shared autonomy), trained with synthetic users over 50,000 computer-simulation episodes, demonstrated the first successful implementation of the proposed closed-loop human-in-the-loop paradigm, outperforming the SoTA shared autonomy algorithms. Following a zero-shot sim-to-real transfer, ARAS was evaluated on 23 human subjects, demonstrating high accuracy in dynamic intent detection and smooth, stable 3D trajectory control for dexterous pick-and-place tasks. The ARAS user study achieved a high task success rate of 92.88%, with short completion times comparable to those of SoTA invasive assistive technologies.
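The shared-autonomy idea the abstract describes -- blending a low-dimensional user command with an autonomous policy's action -- can be illustrated with the simplest arbitration scheme. This is a generic sketch, not the ARAS system: ARAS learns its arbitration with deep reinforcement learning, whereas the fixed blending weight and linear combination here are assumptions for illustration only.

```python
def blend_action(user_cmd, policy_action, alpha):
    """Convex combination of user and autonomous 3D velocity commands.

    alpha = 1.0 -> pure user control; alpha = 0.0 -> full autonomy.
    In a learned shared-autonomy system, alpha (or the whole blend)
    would be produced by a trained policy conditioned on context.
    """
    assert len(user_cmd) == len(policy_action)
    assert 0.0 <= alpha <= 1.0
    return [alpha * u + (1.0 - alpha) * p
            for u, p in zip(user_cmd, policy_action)]
```

For example, with a user pushing along x and the autonomy correcting toward y, an equal blend yields a diagonal motion that respects both inputs.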

Authors:Fatemeh Banani Ardecani, Omidreza Shoghli
Title: Electrodermal Insights into Stress Dynamics of AR-Assisted Safety Warnings in Virtual Roadway Work Zone Environments
Abstract:
This study examines stress levels in roadway workers utilizing AR-assisted multi-sensory warning systems under varying work intensities. A high-fidelity Virtual Reality environment was used to replicate real-world scenarios, allowing safe exploration of high-risk situations while focusing on the physiological impacts of work conditions. Wearable sensors were used to continuously and non-invasively collect physiological data, including electrodermal activity to monitor stress responses. Analysis of data from 18 participants revealed notable differences in electrodermal response (EDR) between light- and medium-intensity activities, reflecting variations in autonomic nervous system activity under stress. Also, a feature-importance analysis revealed that peak and central-tendency metrics of EDR were robust indicators for distinguishing physiological responses between light- and medium-intensity activities. The findings emphasize the relationship between AR-enabled warnings, work intensity, and worker stress, offering an approach to active stress monitoring and improved safety practices. By leveraging real-time physiological insights, this methodology has the potential to support better stress management and the development of more effective safety warning systems for roadway work zones. This research also provides valuable guidance for designing interventions to enhance worker safety, productivity, and well-being in high-risk settings.

Authors:Houjiang Liu, Yiheng Su, Matthew Lease
Title: Rhetorical XAI: Explaining AI's Benefits as well as its Use via Rhetorical Design
Abstract:
This paper explores potential benefits of incorporating Rhetorical Design into the design of Explainable Artificial Intelligence (XAI) systems. While XAI is traditionally framed around explaining individual predictions or overall system behavior, explanations also function as a form of argumentation, shaping how users evaluate a system's perceived usefulness and credibility and fostering appropriate trust. Rhetorical Design offers a useful framework to analyze the communicative role of explanations between AI systems and users, focusing on: (1) logical reasoning conveyed through different types of explanations, (2) credibility projected by the system and its developers, and (3) emotional resonance elicited in users. Together, these rhetorical appeals help us understand how explanations influence user perceptions and facilitate AI adoption across and within different collaborative and social contexts. This paper synthesizes design strategies from prior XAI work that align with these three rhetorical appeals and highlights both opportunities and challenges of integrating rhetorical design into XAI design.

Authors:Suchismita Naik, Prakash Shukla, Ike Obi, Jessica Backus, Nancy Rasche, Paul Parsons
Title: Tracing the Invisible: Understanding Students' Judgment in AI-Supported Design Work
Abstract:
As generative AI tools become integrated into design workflows, students increasingly engage with these tools not just as aids, but as collaborators. This study analyzes reflections from 33 student teams in an HCI design course to examine the kinds of judgments students make when using AI tools. We found both established forms of design judgment (e.g., instrumental, appreciative, quality) and emergent types: agency-distribution judgment and reliability judgment. These new forms capture how students negotiate creative responsibility with AI and assess the trustworthiness of its outputs. Our findings suggest that generative AI introduces new layers of complexity into design reasoning, prompting students to reflect not only on what AI produces, but also on how and when to rely on it. By foregrounding these judgments, we offer a conceptual lens for understanding how students engage in co-creative sensemaking with AI in design contexts.

Authors:Wenhan Lyu, Yimeng Wang, Yifan Sun, Yixuan Zhang
Title: Will Your Next Pair Programming Partner Be Human? An Empirical Evaluation of Generative AI as a Collaborative Teammate in a Semester-Long Classroom Setting
Abstract:
Generative AI (GenAI), especially Large Language Models (LLMs), is rapidly reshaping both programming workflows and computer science education. Many programmers now incorporate GenAI tools into their workflows, including for collaborative coding tasks such as pair programming. While prior research has demonstrated the benefits of traditional pair programming and begun to explore GenAI-assisted coding, the role of LLM-based tools as collaborators in pair programming remains underexamined. In this work, we conducted a mixed-methods study with 39 undergraduate students to examine how GenAI influences collaboration, learning, and performance in pair programming. Specifically, students completed six in-class assignments under three conditions: Traditional Pair Programming (PP), Pair Programming with GenAI (PAI), and Solo Programming with GenAI (SAI). They used both LLM-based inline completion tools (e.g., GitHub Copilot) and LLM-based conversational tools (e.g., ChatGPT). Our results show that students in PAI achieved the highest assignment scores, whereas those in SAI attained the lowest. Additionally, students' attitudes toward LLMs' programming capabilities improved significantly after collaborating with LLM-based tools, and preferences were largely shaped by the perceived usefulness for completing assignments and learning programming skills, as well as the quality of collaboration. Our qualitative findings further reveal that while students appreciated LLM-based tools as valuable pair programming partners, they also identified limitations and had different expectations compared to human teammates. Our study provides one of the first empirical evaluations of GenAI as a pair programming collaborator through a comparison of three conditions (PP, PAI, and SAI). We also discuss the design implications and pedagogical considerations for future GenAI-assisted pair programming approaches.

Authors:Pascal Spiegler, Arash Harirpoush, Yiming Xiao
Title: Towards user-centered interactive medical image segmentation in VR with an assistive AI agent
Abstract:
Crucial in disease analysis and surgical planning, manual segmentation of volumetric medical scans (e.g. MRI, CT) is laborious, error-prone, and challenging to master, while fully automatic algorithms can benefit from user feedback. Therefore, combining the complementary power of the latest radiological AI foundation models with virtual reality (VR)'s intuitive data interaction, we propose SAMIRA, a novel conversational AI agent for medical VR that assists users with localizing, segmenting, and visualizing 3D medical concepts. Through speech-based interaction, the agent helps users understand radiological features, locate clinical targets, and generate segmentation masks that can be refined with just a few point prompts. The system also supports true-to-scale 3D visualization of segmented pathology to enhance patient-specific anatomical understanding. Furthermore, to determine the optimal interaction paradigm under near-far attention-switching for refining segmentation masks in an immersive, human-in-the-loop workflow, we compare VR controller pointing, head pointing, and eye tracking as input modes. A user study demonstrated a high usability score (SUS = 90.0 $\pm$ 9.0), low overall task load, and strong support for the proposed VR system's guidance, training potential, and integration of AI in radiological segmentation tasks.

Authors:Zekai Shao, Yi Shan, Yixuan He, Yuxuan Yao, Junhong Wang, Xiaolong Zhang, Yu Zhang, Siming Chen
Title: Do Language Model Agents Align with Humans in Rating Visualizations? An Empirical Study
Abstract:
Large language models encode knowledge in various domains and demonstrate the ability to understand visualizations. They may also capture visualization design knowledge and could potentially help reduce the cost of formative studies. However, it remains an open question whether large language models can predict human feedback on visualizations. To investigate this question, we conducted three studies examining whether large language model-based agents can simulate human ratings in visualization tasks. The first study, replicating a published study involving human subjects, shows that agents are promising in conducting human-like reasoning and rating; its results guided the subsequent experimental design. The second study replicated six human-subject studies on subjective ratings reported in the literature, replacing human participants with agents. In consultation with five human experts, this study demonstrates that the alignment of agent ratings with human ratings positively correlates with the experts' confidence levels before the experiments. The third study tests commonly used techniques for enhancing agents, including preprocessing visual and textual inputs and knowledge injection. The results reveal robustness issues with these techniques and their potential to induce biases. The three studies indicate that language model-based agents can potentially simulate human ratings in visualization experiments, provided that they are guided by high-confidence hypotheses from expert evaluators. Additionally, we demonstrate a usage scenario of swiftly evaluating prototypes with agents. We discuss insights and future directions for evaluating and improving the alignment of agent ratings with human ratings. We note that simulation can only complement, not replace, user studies.

Authors:Cedric Waterschoot, Raciel Yera Toledo, Nava Tintarev, Francesco Barile
Title: With Friends Like These, Who Needs Explanations? Evaluating User Understanding of Group Recommendations
Abstract:
Group Recommender Systems (GRS) employing social choice-based aggregation strategies have previously been explored in terms of perceived consensus, fairness, and satisfaction. At the same time, the impact of textual explanations has been examined, but the results suggest a low effectiveness of these explanations. However, user understanding remains fairly unexplored, even though it can contribute positively to transparent GRS. This is particularly interesting to study in more complex or potentially unfair scenarios where user preferences diverge, such as in a minority scenario (where group members have similar preferences, except for a single member in a minority position). In this paper, we analyzed the impact of different types of explanations on user understanding of group recommendations. We present a randomized controlled trial (n = 271) using two between-subject factors: (i) the aggregation strategy (additive, least misery, and approval voting), and (ii) the modality of explanation (no explanation, textual explanation, or multimodal explanation). We measured both subjective understanding (self-perceived by the user) and objective understanding (performance on model simulation, counterfactuals, and error detection). In line with recent findings on explanations for machine learning models, our results indicate that more detailed explanations, whether textual or multimodal, did not increase subjective or objective understanding. However, we did find a significant effect of aggregation strategies on both subjective and objective understanding. These results imply that, when constructing GRS, practitioners need to consider that the choice of aggregation strategy can influence users' understanding. Post-hoc analysis also suggests that there is value in analyzing performance on different tasks, rather than through a single aggregated metric of understanding.

Authors:Yong Ma, Yuchong Zhang, Oda Elise Nordberg, Arvid Rongve, Miroslav Bachinski, Morten Fjeld
Title: State-of-the-Art HCI for Dementia Care: A Scoping Review of Recent Technological Advances
Abstract:
Dementia significantly impacts cognitive, behavioral, and functional abilities, creating challenges for both individuals and caregivers. Recent advancements in HCI have introduced innovative technological solutions to support people with dementia (PwD) and their caregivers. This scoping review systematically examines 32 recent publications from leading digital libraries, categorizing technological interventions into four key domains: Assistive and Smart Technology for Daily Life, Social Interaction and Communication, Well-being and Psychological Support, and Caregiver Support and Training. Our analysis highlights how emerging technologies are transforming dementia care. These technologies enhance quality of life by promoting independence, fostering social engagement, and providing emotional and cognitive support. However, the review also identifies critical gaps, particularly in addressing the needs of individuals with early-stage dementia and the lack of individualized support mechanisms. By emphasizing user-centered design, accessibility, and ethical considerations, this paper offers a structured roadmap for future research and practice in dementia care. It bridges the gap between technological innovation and the real-world needs of PwD and their caregivers, providing valuable insights for researchers, practitioners, and policymakers. This review not only synthesizes current advancements but also sets the stage for future HCI-driven innovations in dementia care, aiming to improve outcomes for an aging global population.

Authors:Peinuan Qin, Zicheng Zhu, Naomi Yamashita, Yitian Yang, Keita Suga, Yi-Chieh Lee
Title: AI-Based Speaking Assistant: Supporting Non-Native Speakers' Speaking in Real-Time Multilingual Communication
Abstract:
Non-native speakers (NNSs) often face speaking challenges in real-time multilingual communication, such as struggling to articulate their thoughts. To address this issue, we developed an AI-based speaking assistant (AISA) that provides speaking references for NNSs based on their input queries, task background, and conversation history. To explore NNSs' interaction with AISA and its impact on NNSs' speaking during real-time multilingual communication, we conducted a mixed-method study involving a within-subject experiment and follow-up interviews. In the experiment, two native speakers (NSs) and one NNS formed a team (31 teams in total) and completed two collaborative tasks: one with access to AISA and one without. Overall, our study revealed four types of AISA input patterns among NNSs, each reflecting different levels of effort and language preferences. Although AISA did not improve NNSs' speaking competence, follow-up interviews revealed that it helped improve the logical flow and depth of their speech. Moreover, the additional multitasking introduced by AISA, such as entering and reviewing system output, potentially elevated NNSs' workload and anxiety. Based on these observations, we discuss the pros and cons of implementing tools to assist NNSs in real-time multilingual communication and offer design recommendations.

Authors:Suhas BN, Dominik Mattioli, Saeed Abdullah, Rosa I. Arriaga, Chris W. Wiese, Andrew M. Sherrill
Title: How Real Are Synthetic Therapy Conversations? Evaluating Fidelity in Prolonged Exposure Dialogues
Abstract:
Synthetic data adoption in healthcare is driven by privacy concerns, data access limitations, and high annotation costs. We explore synthetic Prolonged Exposure (PE) therapy conversations for PTSD as a scalable alternative for training clinical models. We systematically compare real and synthetic dialogues using linguistic, structural, and protocol-specific metrics like turn-taking and treatment fidelity. We introduce and evaluate PE-specific metrics, offering a novel framework for assessing clinical fidelity beyond surface fluency. Our findings show that while synthetic data successfully mitigates data scarcity and protects privacy, capturing the most subtle therapeutic dynamics remains a complex challenge. Synthetic dialogues successfully replicate key linguistic features of real conversations, for instance, achieving a similar Readability Score (89.2 vs. 88.1), while showing differences in some key fidelity markers like distress monitoring. This comparison highlights the need for fidelity-aware metrics that go beyond surface fluency to identify clinically significant nuances. Our model-agnostic framework is a critical tool for developers and clinicians to benchmark generative model fidelity before deployment in sensitive applications. Our findings help clarify where synthetic data can effectively complement real-world datasets, while also identifying areas for future refinement.

Authors:Prakash Shukla, Phuong Bui, Paul Parsons
Title: Coping with Uncertainty in UX Design Practice: Practitioner Strategies and Judgment
Abstract:
The complexity of UX design practice extends beyond ill-structured design problems to include uncertainties shaped by shifting stakeholder priorities, team dynamics, limited resources, and implementation constraints. While prior research in related fields has addressed uncertainty in design more broadly, the specific character of uncertainty in UX practice remains underexplored. This study examines how UX practitioners experience and respond to uncertainty in real-world projects, drawing on a multi-week diary study and follow-up interviews with ten designers. We identify a range of practitioner strategies, including adaptive framing, negotiation, and judgment, that allow designers to move forward amid ambiguity. Our findings highlight the central role of design judgment in navigating uncertainty, including emergent forms such as temporal and sacrificial judgment, and extend prior understandings by showing how UX practitioners engage uncertainty as a persistent, situated feature of practice.

Authors:Julian Rasch, Florian Müller, Francesco Chiossi
Title: A Vision for AI-Driven Adaptation of Dynamic AR Content to Users and Environments
Abstract:
Augmented Reality (AR) is transforming the way we interact with virtual information in the physical world. By overlaying digital content in real-world environments, AR enables new forms of immersive and engaging experiences. However, existing AR systems often struggle to effectively manage the many interactive possibilities that AR presents. This vision paper speculates on AI-driven approaches for adaptive AR content placement, dynamically adjusting to user movement and environmental changes. By leveraging machine learning methods, such a system would intelligently manage content distribution between AR projections integrated into the external environment and fixed static content, enabling seamless UI layout and potentially reducing users' cognitive load. By exploring the possibilities of AI-driven dynamic AR content placement, we aim to envision new opportunities for innovation and improvement in various industries, from urban navigation and workplace productivity to immersive learning and beyond. This paper outlines a vision for the development of more intuitive, engaging, and effective AI-powered AR experiences.

Authors:Jiaqi Tang, Xinbo Xu, Yinsong Xu, Qingchao Chen
Title: Advancing Radar Hand Gesture Recognition: A Hybrid Spectrum Synthetic Framework Merging Simulation with Neural Networks
Abstract:
Millimeter wave (mmWave) radar sensors play a vital role in hand gesture recognition (HGR) by detecting subtle motions while preserving user privacy. However, the limited scale of radar datasets hinders performance. Existing synthetic data generation methods fall short in two key areas. On the one hand, modeling-based approaches fail to accurately simulate wave propagation and reflection at the hand-gesture level, which involves unique complexities such as diffraction and occlusion. On the other hand, generative model-based methods struggle to converge when radar data is limited, lack interpretability, and sometimes fail to produce kinematically plausible results. To overcome these limitations, we propose a novel hybrid spectrum synthetic framework leveraging visual hand gesture data. It combines a cylinder mesh-based hand reflection model with a small-scale neural network called RadarWeightNet, which assigns weights to the simulated signals. Our framework addresses two key challenges: accurately simulating complex hand geometry, and bridging the simulation-to-real gap in a data-driven manner while preserving interpretability, thereby balancing physical accuracy with machine learning adaptability. We tested our framework under extreme scenarios where radar data is scarce. The results demonstrate the effectiveness of our hybrid framework, achieving up to 63% SSIM in synthetic performance and up to a 30% improvement in classification performance in few-shot learning.
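The hybrid idea in this abstract can be sketched as follows: a physics-based simulator produces one return per mesh facet, and a small learned module assigns each return a weight before summing them into the final spectrum. The simulator, weight values, and array shapes below are toy assumptions for illustration, not the paper's actual RadarWeightNet.

```python
import math

def simulate_facet_returns(num_facets, num_bins):
    """Toy stand-in for a cylinder mesh-based reflection model: one spectrum per facet."""
    return [[math.sin(0.1 * f * b) for b in range(num_bins)]
            for f in range(num_facets)]

def softmax(xs):
    """Normalize raw weights so that the per-facet contributions sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_spectrum(facet_returns, raw_weights):
    """Blend simulated facet returns using learned (here: fixed toy) weights."""
    w = softmax(raw_weights)
    num_bins = len(facet_returns[0])
    return [sum(w[f] * facet_returns[f][b] for f in range(len(w)))
            for b in range(num_bins)]

returns = simulate_facet_returns(num_facets=4, num_bins=8)
spec = weighted_spectrum(returns, raw_weights=[0.5, 1.0, -0.2, 0.1])
print(len(spec))  # 8 bins in the blended spectrum
```

In the actual framework the weights would be predicted by a trained network from data, which is where the simulation-to-real gap is closed.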

Authors:Qi Yang, Weichen Bi, Haiyang Shen, Yaoqi Guo, Yun Ma
Title: PixelWeb: The First Web GUI Dataset with Pixel-Wise Labels
Abstract:
Graphical User Interface (GUI) datasets are crucial for various downstream tasks. However, GUI datasets often generate annotation information through automatic labeling, which commonly results in inaccurate GUI element BBox annotations, including missing, duplicate, or meaningless BBoxes. These issues can degrade the performance of models trained on these datasets, limiting their effectiveness in real-world applications. Additionally, existing GUI datasets only provide BBox annotations visually, which restricts the development of visually related GUI downstream tasks. To address these issues, we introduce PixelWeb, a large-scale GUI dataset containing over 100,000 annotated web pages. PixelWeb is constructed using a novel automatic annotation approach that integrates visual feature extraction and Document Object Model (DOM) structure analysis through two core modules: channel derivation and layer analysis. Channel derivation ensures accurate localization of GUI elements in cases of occlusion and overlapping elements by extracting BGRA four-channel bitmap annotations. Layer analysis uses the DOM to determine the visibility and stacking order of elements, providing precise BBox annotations. Additionally, PixelWeb includes comprehensive metadata such as element images, contours, and mask annotations. Manual verification by three independent annotators confirms the high quality and accuracy of PixelWeb annotations. Experimental results on GUI element detection tasks show that PixelWeb achieves performance on the mAP95 metric that is 3-7 times better than existing datasets. We believe that PixelWeb has great potential for performance improvement in downstream tasks such as GUI generation and automated user interaction.
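The layer-analysis step described above can be sketched in miniature: given DOM-like elements with bounding boxes and z-indices, derive a painting order and flag elements that are fully hidden by something painted above them. The field names, stacking rule, and occlusion test here are simplifying assumptions for demonstration, not PixelWeb's actual implementation (real CSS stacking contexts are considerably more involved).

```python
def stacking_order(elements):
    """Sort element indices by (z_index, document order): later-painted ones are on top."""
    return sorted(range(len(elements)),
                  key=lambda i: (elements[i]["z_index"], i))

def covers(a, b):
    """True if box a fully contains box b; boxes are (x0, y0, x1, y1)."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]

def visible_elements(elements):
    """Return indices of elements not fully hidden by any element painted above them."""
    order = stacking_order(elements)
    visible = []
    for pos, i in enumerate(order):
        above = order[pos + 1:]
        if not any(covers(elements[j]["bbox"], elements[i]["bbox"]) for j in above):
            visible.append(i)
    return visible

elems = [
    {"bbox": (0, 0, 100, 100), "z_index": 0},   # background, fully covered
    {"bbox": (0, 0, 100, 100), "z_index": 1},   # overlay painted on top
    {"bbox": (120, 0, 200, 50), "z_index": 0},  # off to the side, visible
]
print(visible_elements(elems))  # → [2, 1]: the background (index 0) is occluded
```

The BGRA channel derivation in the paper complements this by recovering exact pixel extents even under partial overlap, which a pure box test like the one above cannot do.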

Authors:Merve Cerit, Eric Zelikman, Mu-Jung Cho, Thomas N. Robinson, Byron Reeves, Nilam Ram, Nick Haber
Title: Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs
Abstract:
As digital media use continues to evolve and influence various aspects of life, developing flexible and scalable tools to study complex media experiences is essential. This study introduces the Media Content Atlas (MCA), a novel pipeline designed to help researchers investigate large-scale screen data beyond traditional screen-use metrics. Leveraging multimodal large language models (MLLMs), MCA enables moment-by-moment content analysis, content-based clustering, topic modeling, image retrieval, and interactive visualizations. Evaluated on 1.12 million smartphone screenshots continuously captured during screen use from 112 adults over an entire month, MCA facilitates open-ended exploration and hypothesis generation as well as hypothesis-driven investigations at an unprecedented scale. Expert evaluators underscored its usability and potential for research and intervention design, with clustering results rated 96% relevant and descriptions 83% accurate. By bridging methodological possibilities with domain-specific needs, MCA accelerates both inductive and deductive inquiry, presenting new opportunities for media and HCI research.

Authors:Eugene Tang KangJie, Tianqi Song, Zicheng Zhu, Jingshu Li, Yi-Chieh Lee
Title: AI Literacy Education for Older Adults: Motivations, Challenges and Preferences
Abstract:
As Artificial Intelligence (AI) becomes increasingly integrated into older adults' daily lives, equipping them with the knowledge and skills to understand and use AI is crucial. However, most research on AI literacy education has focused on students and children, leaving a gap in understanding the unique needs of older adults when learning about AI. To address this, we surveyed 103 older adults aged 50 and above (Mean = 64, SD = 7). Results revealed that they found it important and were motivated to learn about AI because they wish to harness the benefits and avoid the dangers of AI, seeing it as necessary to cope in the future. However, they expressed learning challenges such as difficulties in understanding and not knowing how to start learning AI. Particularly, a strong preference for hands-on learning was indicated. We discussed design opportunities to support AI literacy education for older adults.

Authors:Liudmila Zavolokina, Kilian Sprenkamp, Zoya Katashinskaya, Daniel Gordon Jones
Title: Biased by Design: Leveraging Inherent AI Biases to Enhance Critical Thinking of News Readers
Abstract:
This paper explores the design of a propaganda detection tool using Large Language Models (LLMs). Acknowledging the inherent biases in AI models, especially in political contexts, we investigate how these biases might be leveraged to enhance critical thinking in news consumption. Countering the typical view of AI biases as detrimental, our research proposes strategies of user choice and personalization in response to a user's political stance, applying psychological concepts of confirmation bias and cognitive dissonance. We present findings from a qualitative user study, offering insights and design recommendations (bias awareness, personalization and choice, and gradual introduction of diverse perspectives) for AI tools in propaganda detection.

Authors:Liangwei Wang, Zhan Wang, Shishi Xiao, Le Liu, Fugee Tsung, Wei Zeng
Title: VizTA: Enhancing Comprehension of Distributional Visualization with Visual-Lexical Fused Conversational Interface
Abstract:
Comprehending visualizations requires readers to interpret visual encoding and the underlying meanings actively. This poses challenges for visualization novices, particularly when interpreting distributional visualizations that depict statistical uncertainty. Advancements in LLM-based conversational interfaces show promise in promoting visualization comprehension. However, they fail to provide contextual explanations at fine-grained granularity, and chart readers are still required to mentally bridge visual information and textual explanations during conversations. Our formative study highlights the expectations for both lexical and visual feedback, as well as the importance of explicitly linking these two modalities throughout the conversation. The findings motivate the design of VizTA, a visualization teaching assistant that leverages the fusion of visual and lexical feedback to help readers better comprehend visualization. VizTA features a semantic-aware conversational agent capable of explaining contextual information within visualizations and employs a visual-lexical fusion design to facilitate chart-centered conversation. A between-subject study with 24 participants demonstrates the effectiveness of VizTA in supporting the understanding and reasoning tasks of distributional visualization across multiple scenarios.

Authors:Alice Nardelli, Antonio Sgorbissa, Carmine Tommaso Recchiuto
Title: Designing Empathetic Companions: Exploring Personality, Emotion, and Trust in Social Robots
Abstract:
How should a companion robot behave? In this research, we present a cognitive architecture based on a tailored personality model to investigate the impact of robotic personalities on the perception of companion robots. Drawing from existing literature, we identified empathy, trust, and enjoyability as key factors in building companionship with social robots. Based on these insights, we implemented a personality-dependent, emotion-aware generator, recognizing the crucial role of robot emotions in shaping these elements. We then conducted a user study involving 84 dyadic conversation sessions with the emotional robot Navel, which exhibited different personalities. Results were derived from a multimodal analysis, including questionnaires, open-ended responses, and behavioral observations. This approach allowed us to validate the developed emotion generator and explore the relationship between the personality traits of Agreeableness, Extraversion, Conscientiousness, and Empathy. Furthermore, we drew robust conclusions on how these traits influence relational trust, capability trust, enjoyability, and sociability.

Authors:Suhas BN, Andrew M. Sherrill, Rosa I. Arriaga, Chris W. Wiese, Saeed Abdullah
Title: Thousand Voices of Trauma: A Large-Scale Synthetic Dataset for Modeling Prolonged Exposure Therapy Conversations
Abstract:
The advancement of AI systems for mental health support is hindered by limited access to therapeutic conversation data, particularly for trauma treatment. We present Thousand Voices of Trauma, a synthetic benchmark dataset of 3,000 therapy conversations based on Prolonged Exposure therapy protocols for Post-traumatic Stress Disorder (PTSD). The dataset comprises 500 unique cases, each explored through six conversational perspectives that mirror the progression of therapy from initial anxiety to peak distress to emotional processing. We incorporated diverse demographic profiles (ages 18-80, M=49.3, 49.4% male, 44.4% female, 6.2% non-binary), 20 trauma types, and 10 trauma-related behaviors using deterministic and probabilistic generation methods. Analysis reveals realistic distributions of trauma types (witnessing violence 10.6%, bullying 10.2%) and symptoms (nightmares 23.4%, substance abuse 20.8%). Clinical experts validated the dataset's therapeutic fidelity, highlighting its emotional depth while suggesting refinements for greater authenticity. We also developed an emotional trajectory benchmark with standardized metrics for evaluating model responses. This privacy-preserving dataset addresses critical gaps in trauma-focused mental health data, offering a valuable resource for advancing both patient-facing applications and clinician training tools.

Authors:Ashley Suh, Kenneth Alperin, Harry Li, Steven R Gomez
Title: Don't Just Translate, Agitate: Using Large Language Models as Devil's Advocates for AI Explanations
Abstract:
This position paper highlights a growing trend in Explainable AI (XAI) research where Large Language Models (LLMs) are used to translate outputs from explainability techniques, like feature-attribution weights, into a natural language explanation. While this approach may improve accessibility or readability for users, recent findings suggest that translating into human-like explanations does not necessarily enhance user understanding and may instead lead to overreliance on AI systems. When LLMs summarize XAI outputs without surfacing model limitations, uncertainties, or inconsistencies, they risk reinforcing the illusion of interpretability rather than fostering meaningful transparency. We argue that, instead of merely translating XAI outputs, LLMs should serve as constructive agitators, or devil's advocates, whose role is to actively interrogate AI explanations by presenting alternative interpretations, potential biases, training data limitations, and cases where the model's reasoning may break down. In this role, LLMs can help users engage critically with AI systems and generated explanations, with the potential to reduce overreliance caused by misinterpreted or specious explanations.

Authors:Harry Li, Gabriel Appleby, Kenneth Alperin, Steven R Gomez, Ashley Suh
Title: Mitigating LLM Hallucinations with Knowledge Graphs: A Case Study
Abstract:
High-stakes domains like cyber operations need responsible and trustworthy AI methods. While large language models (LLMs) are becoming increasingly popular in these domains, they still suffer from hallucinations. This research paper provides learning outcomes from a case study with LinkQ, an open-source natural language interface that was developed to combat hallucinations by forcing an LLM to query a knowledge graph (KG) for ground-truth data during question-answering (QA). We conduct a quantitative evaluation of LinkQ using a well-known KGQA dataset, showing that the system outperforms GPT-4 but still struggles with certain question categories, suggesting that alternative query construction strategies will need to be investigated in future LLM querying systems. We discuss a qualitative study of LinkQ with two domain experts using a real-world cybersecurity KG, outlining these experts' feedback, suggestions, perceived limitations, and future opportunities for systems like LinkQ.

Authors:Tonko E. W. Bossen, Andreas Møgelmose, Ross Greer
Title: Can Vision-Language Models Understand and Interpret Dynamic Gestures from Pedestrians? Pilot Datasets and Exploration Towards Instructive Nonverbal Commands for Cooperative Autonomous Vehicles
Abstract:
In autonomous driving, it is crucial to correctly interpret traffic gestures (TGs), such as those of an authority figure providing orders or instructions, or a pedestrian signaling the driver, to ensure a safe and pleasant traffic environment for all road users. This study investigates the capabilities of state-of-the-art vision-language models (VLMs) in zero-shot interpretation, focusing on their ability to caption and classify human gestures in traffic contexts. We create and publicly share two custom datasets with varying formal and informal TGs, such as 'Stop', 'Reverse', 'Hail', etc. The datasets are "Acted TG (ATG)" and "Instructive TG In-The-Wild (ITGI)". They are annotated with natural language, describing the pedestrian's body position and gesture. We evaluate models using three methods utilizing expert-generated captions as baseline and control: (1) caption similarity, (2) gesture classification, and (3) pose sequence reconstruction similarity. Results show that current VLMs struggle with gesture understanding: sentence similarity averages below 0.59, and classification F1 scores reach only 0.14-0.39, well below the expert baseline of 0.70. While pose reconstruction shows potential, it requires more data and refined metrics to be reliable. Our findings reveal that although some SOTA VLMs can interpret zero-shot human traffic gestures, none are accurate and robust enough to be trustworthy, emphasizing the need for further research in this domain.

Authors:Annalisa Degenhard, Ali Askari, Michael Rietzler, Enrico Rukzio
Title: When Do We Feel Present in a Virtual Reality? Towards Sensitivity and User Acceptance of Presence Questionnaires
Abstract:
Presence is an important and widely used metric for the quality of virtual reality (VR) applications. Given the multifaceted and subjective nature of presence, the most common measures of presence are questionnaires. Yet there is little research on their validity regarding specific presence dimensions and their responsiveness to differences in perception among users. We investigated four presence questionnaires (SUS, PQ, IPQ, Bouchard) with respect to their responsiveness to intensity variations of known presence dimensions and asked users whether the questionnaires were consistent with their experience. To this end, we created five VR scenarios, each designed to emphasize a specific presence dimension. Our findings showed heterogeneous sensitivity of the questionnaires depending on the different dimensions of presence, highlighting a context-specific suitability of presence questionnaires. Users further rated the questionnaires as less sensitive than what they actually perceived. Based on our findings, we offer guidance on selecting these questionnaires based on their suitability for particular use cases.

Authors:Zhimin Li, Haichao Miao, Xinyuan Yan, Valerio Pascucci, Matthew Berger, Shusen Liu
Title: See or Recall: A Sanity Check for the Role of Vision in Solving Visualization Question Answer Tasks with Multimodal LLMs
Abstract:
Recent developments in multimodal large language models (MLLM) have equipped language models to reason about vision and language jointly. This permits MLLMs to both perceive and answer questions about data visualization across a variety of designs and tasks. Applying MLLMs to a broad range of visualization tasks requires us to properly evaluate their capabilities, and the most common way to conduct evaluation is through measuring a model's visualization reasoning capability, analogous to how we would evaluate human understanding of visualizations (e.g., visualization literacy). However, we found that in the context of visualization question answering (VisQA), how an MLLM perceives and reasons about visualizations can be fundamentally different from how humans approach the same problem. During the evaluation, even without visualization, the model could correctly answer a substantial portion of the visualization test questions, regardless of whether any selection options were provided. We hypothesize that the vast amount of knowledge encoded in the language model permits factual recall that supersedes the need to seek information from the visual signal. It raises concerns that the current VisQA evaluation may not fully capture the models' visualization reasoning capabilities. To address this, we propose a comprehensive sanity check framework that integrates a rule-based decision tree and a sanity check table to disentangle the effects of "seeing" (visual processing) and "recall" (reliance on prior knowledge). This validates VisQA datasets for evaluation, highlighting where models are truly "seeing", positively or negatively affected by the factual recall, or relying on inductive biases for question answering. Our study underscores the need for careful consideration in designing future visualization understanding studies when utilizing MLLMs.
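The "seeing vs. recall" disentanglement described above can be sketched as a simple decision rule: query the model with and without the chart, and classify questions answerable without the image as recall-driven. `ask_model` and the toy model below are hypothetical stand-ins for an MLLM call; the rules are illustrative assumptions, not the paper's exact decision tree or sanity check table.

```python
def sanity_check(question, answer, ask_model):
    """Classify whether a VisQA item requires 'seeing' or is solvable by 'recall'."""
    with_image = ask_model(question, image=True)
    without_image = ask_model(question, image=False)
    if without_image == answer:
        # Correct even with the chart hidden: prior knowledge suffices.
        return "recall"
    if with_image == answer:
        # Only correct when the visual signal is available.
        return "seeing"
    return "neither"

# Toy stand-in model: knows a world fact, but needs the chart for a data fact.
def toy_model(question, image):
    if question == "Which country has the largest population?":
        return "India"                      # answerable from prior knowledge
    if question == "Which bar is tallest?":
        return "B" if image else "unknown"  # requires reading the chart
    return "unknown"

print(sanity_check("Which country has the largest population?", "India", toy_model))  # recall
print(sanity_check("Which bar is tallest?", "B", toy_model))                          # seeing
```

Items flagged as "recall" would be discounted (or rewritten) when assessing a model's genuine visualization reasoning.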

Authors:Jenny Ma, Riya Sahni, Karthik Sreedhar, Lydia B. Chilton
Title: AgentDynEx: Nudging the Mechanics and Dynamics of Multi-Agent Simulations
Abstract:
Multi-agent large language model simulations have the potential to model complex human behaviors and interactions. If the mechanics are set up properly, unanticipated and valuable social dynamics can surface. However, it is challenging to consistently enforce simulation mechanics while still allowing for notable and emergent dynamics. We present AgentDynEx, an AI system that helps set up simulations from user-specified mechanics and dynamics. AgentDynEx uses LLMs to guide users through a Configuration Matrix to identify core mechanics and define milestones to track dynamics. It also introduces a method called \textit{nudging}, where the system dynamically reflects on simulation progress and gently intervenes if it begins to deviate from intended outcomes. A technical evaluation found that nudging enables simulations to have more complex mechanics and maintain their notable dynamics compared to simulations without nudging. We discuss the importance of nudging as a technique for balancing mechanics and dynamics of multi-agent simulations.

Authors:Arpit Narechania, Alex Endert, Clio Andris
Title: Cartographers in Cubicles: How Training and Preferences of Mapmakers Interplay with Structures and Norms in Not-for-Profit Organizations
Abstract:
Choropleth maps are a common and effective way to visualize geographic thematic data. Although cartographers have established many principles about map design, data binning and color usage, less is known about how mapmakers make individual decisions in practice. We interview 16 cartographers and geographic information systems (GIS) experts from 13 government organizations, NGOs, and federal agencies about their choropleth mapmaking decisions and workflows. We categorize our findings and report on how mapmakers follow cartographic guidelines and personal rules of thumb, collaborate with other stakeholders within and outside their organization, and how organizational structures and norms are tied to decision-making during data preparation, data analysis, data binning, map styling, and map post-processing. We find several points of variation as well as regularity across mapmakers and organizations and present takeaways to inform cartographic education and practice, including broader implications and opportunities for CSCW, HCI, and information visualization researchers and practitioners.

Authors:Mohammadmehdi Ataei, Hyunmin Cheong, Jiwon Jun, Justin Matejka, Alexander Tessier, George Fitzmaurice
Title: Transformer-Based Interfaces for Mechanical Assembly Design: A Gear Train Case Study
Abstract:
Generative artificial intelligence (AI), particularly transformer-based models, presents new opportunities for automating and augmenting engineering design workflows. However, effectively integrating these models into interactive tools requires careful interface design that leverages their unique capabilities. This paper introduces a transformer model tailored for gear train assembly design, paired with two novel interaction modes: Explore and Copilot. Explore Mode uses probabilistic sampling to generate and evaluate diverse design alternatives, while Copilot Mode utilizes autoregressive prediction to support iterative, context-aware refinement. These modes emphasize key transformer properties (sequence-based generation and probabilistic exploration) to facilitate intuitive and efficient human-AI collaboration. Through a case study, we demonstrate how well-designed interfaces can enhance engineers' ability to balance automation with domain expertise. A user study shows that Explore Mode supports rapid exploration and problem redefinition, while Copilot Mode provides greater control and fosters deeper engagement. Our results suggest that hybrid workflows combining both modes can effectively support complex, creative engineering design processes.

Authors:Roan Schellingerhout, Francesco Barile, Nava Tintarev
Title: OKRA: an Explainable, Heterogeneous, Multi-Stakeholder Job Recommender System
Abstract:
The use of recommender systems in the recruitment domain has been labeled as 'high-risk' in recent legislation. As a result, strict requirements regarding explainability and fairness have been put in place to ensure proper treatment of all involved stakeholders. To allow for stakeholder-specific explainability, while also handling highly heterogeneous recruitment data, we propose a novel explainable multi-stakeholder job recommender system using graph neural networks: the Occupational Knowledge-based Recommender using Attention (OKRA). The proposed method is capable of providing both candidate- and company-side recommendations and explanations. We find that OKRA performs substantially better than six baselines in terms of nDCG for two datasets. Furthermore, we find that the tested models show a bias toward candidates and vacancies located in urban areas. Overall, our findings suggest that OKRA provides a balance between accuracy, explainability, and fairness.

Authors:Gautam Kishore Shahi, Benedetta Tessa, Amaury Trujillo, Stefano Cresci
Title: A Year of the DSA Transparency Database: What it (Does Not) Reveal About Platform Moderation During the 2024 European Parliament Election
Abstract:
Social media platforms face heightened risks during major political events; yet, how platforms adapt their moderation practices in response remains unclear. The Digital Services Act Transparency Database offers an unprecedented opportunity to systematically study content moderation at scale, enabling researchers and policymakers to assess platforms' compliance and effectiveness. Herein, we analyze 1.58 billion self-reported moderation actions taken by eight large social media platforms during an extended period of eight months surrounding the 2024 European Parliament elections. Our findings reveal a lack of adaptation in moderation strategies, as platforms did not exhibit significant changes in their enforcement behaviors surrounding the elections. This raises concerns about whether platforms adapted their moderation practices at all, or if structural limitations of the database concealed possible adjustments. Moreover, we found that noted transparency and accountability issues persist nearly a year after initial concerns were raised. These results highlight the limitations of current self-regulatory approaches and underscore the need for stronger enforcement and data access mechanisms to ensure that online platforms uphold their responsibility in safeguarding democratic processes.

Authors:Quan Shi, Carlos E. Jimenez, Stephen Dong, Brian Seo, Caden Yao, Adam Kelch, Karthik Narasimhan
Title: IMPersona: Evaluating Individual Level LM Impersonation
Abstract:
As language models achieve increasingly human-like capabilities in conversational text generation, a critical question emerges: to what extent can these systems simulate the characteristics of specific individuals? To evaluate this, we introduce IMPersona, a framework for evaluating LMs at impersonating specific individuals' writing style and personal knowledge. Using supervised fine-tuning and a hierarchical memory-inspired retrieval system, we demonstrate that even modestly sized open-source models, such as Llama-3.1-8B-Instruct, can achieve impersonation abilities at concerning levels. In blind conversation experiments, participants (mis)identified our fine-tuned models with memory integration as human in 44.44% of interactions, compared to just 25.00% for the best prompting-based approach. We analyze these results to propose detection methods and defense strategies against such impersonation attempts. Our findings raise important questions about both the potential applications and risks of personalized language models, particularly regarding privacy, security, and the ethical deployment of such technologies in real-world contexts.

Authors:Xiang Li, Wei He, Per Ola Kristensson
Title: Evaluating the Usability of Microgestures for Text Editing Tasks in Virtual Reality
Abstract:
As virtual reality (VR) continues to evolve, traditional input methods such as handheld controllers and gesture systems often face challenges with precision, social accessibility, and user fatigue. These limitations motivate the exploration of microgestures, which promise more subtle, ergonomic, and device-free interactions. We introduce microGEXT, a lightweight microgesture-based system designed for text editing in VR without external sensors, which utilizes small, subtle hand movements to reduce physical strain compared to standard gestures. We evaluated microGEXT in three user studies. In Study 1 ($N=20$), microGEXT reduced overall edit time and fatigue compared to a ray-casting + pinch menu baseline, the default text editing approach in commercial VR systems. Study 2 ($N=20$) found that microGEXT performed well in short text selection tasks but was slower for longer text ranges. In Study 3 ($N=10$), participants found microGEXT intuitive for open-ended information-gathering tasks. Across all studies, microGEXT demonstrated enhanced user experience and reduced physical effort, offering a promising alternative to traditional VR text editing techniques.

Authors:Riccardo Bovo, Karan Ahuja, Ryo Suzuki, Mustafa Doga Dogan, Mar Gonzalez-Franco
Title: Symbiotic AI: Augmenting Human Cognition from PCs to Cars
Abstract:
As AI takes on increasingly complex roles in human-computer interaction, fundamental questions arise: how can HCI help maintain the user as the primary agent while augmenting human cognition and intelligence? This paper suggests questions to guide researchers in considering the implications for agency, autonomy, the augmentation of human intellect, and the future of human-AI synergies. We observe a key paradigm shift behind the transformation of HCI, shifting from explicit command-and-control models to systems where users define high-level goals directly. This shift will be facilitated by XR technologies, whose multi-modal inputs and outputs offer a more seamless way to convey these goals. This paper considers this transformation through the lens of two cultural milestones: the personal computer and the automobile, moving beyond traditional interfaces like keyboards or steering wheels and thinking of them as vessels for everyday XR.

Authors:Jeremy D. Webb, Michael Bowman, Songpo Li, Xiaoli Zhang
Title: The Use of Gaze-Derived Confidence of Inferred Operator Intent in Adjusting Safety-Conscious Haptic Assistance
Abstract:
It is not always possible for humans to directly complete tasks in dangerous or hazardous conditions, so these tasks are increasingly performed remotely by teleoperated robots. However, teleoperation is difficult since the operator feels a disconnect with the robot, caused by missing feedback from several senses, including touch, and the lack of depth in the video feedback presented to the operator. To overcome this problem, the proposed system actively infers the operator's intent and provides assistance based on the predicted intent. Furthermore, a novel method of calculating confidence in the inferred intent modifies the human-in-the-loop control. The operator's gaze is employed to intuitively indicate the target before the manipulation with the robot begins. A potential field method is used to provide a guiding force towards the intended target, and a safety boundary reduces the risk of damage. Modifying these assistances based on the confidence level in the operator's intent makes the control more natural, and gives the robot an intuitive understanding of its human master. Initial validation results show the ability of the system to improve accuracy, execution time, and reduce operator error.

Authors:Behdokht Kiafar, Pavan Uttej Ravva, Asif Ahmmed Joy, Salam Daher, Roghayeh Leila Barmaki
Title: MENA: Multimodal Epistemic Network Analysis for Visualizing Competencies and Emotions
Abstract:
The need to improve geriatric care quality presents a challenge that requires insights from stakeholders. While simulated trainings can boost competencies, extracting meaningful insights from these practices to enhance simulation effectiveness remains a challenge. In this study, we introduce Multimodal Epistemic Network Analysis (MENA), a novel framework for analyzing caregiver attitudes and emotions in an Augmented Reality setting and exploring how the awareness of a virtual geriatric patient (VGP) impacts these aspects. MENA enhances the capabilities of Epistemic Network Analysis by detecting positive emotions, enabling visualization and analysis of complex relationships between caregiving competencies and emotions in dynamic caregiving practices. The framework provides visual representations that demonstrate how participants provided more supportive care and engaged more effectively in person-centered caregiving with aware VGP. This method could be applicable in any setting that depends on dynamic interpersonal interactions, as it visualizes connections between key elements using network graphs and enables the direct comparison of multiple networks, thereby broadening its implications across various fields.

Authors:Tobias Rau, Tobias Isenberg, Andreas Köhn, Michael Sedlmair, Benjamin Lee
Title: Traversing Dual Realities: Investigating Techniques for Transitioning 3D Objects between Desktop and Augmented Reality Environments
Abstract:
Desktop environments can integrate augmented reality (AR) head-worn devices to support 3D representations, visualizations, and interactions in a novel yet familiar setting. As users navigate across the dual realities -- desktop and AR -- a way to move 3D objects between them is needed. We devise three baseline transition techniques based on common approaches in the literature and evaluate their usability and practicality in an initial user study (N=18). After refining both our transition techniques and the surrounding technical setup, we validate the applicability of the overall concept for real-world activities in an expert user study (N=6). In it, computational chemists followed their usual desktop workflows to build, manipulate, and analyze 3D molecular structures, but now aided with the addition of AR and our transition techniques. Based on our findings from both user studies, we provide lessons learned and takeaways for the design of 3D object transition techniques in desktop + AR environments.

Authors:Yong Ma, Xuedong Zhang, Yuchong Zhang, Morten Fjeld
Title: Measuring User Experience Through Speech Analysis: Insights from HCI Interviews
Abstract:
User satisfaction plays a crucial role in user experience (UX) evaluation. Traditionally, UX measurements are based on subjective scales, such as questionnaires. However, these evaluations may suffer from subjective bias. In this paper, we explore the acoustic and prosodic features of speech to differentiate between positive and neutral UX during interactive sessions. By analyzing speech features such as root-mean-square (RMS), zero-crossing rate (ZCR), jitter, and shimmer, we identified significant differences between the positive and neutral user groups. In addition, social speech features such as activity and engagement also show notable variations between these groups. Our findings underscore the potential of speech analysis as an objective and reliable tool for UX measurement, contributing to more robust and bias-resistant evaluation methodologies. This work offers a novel approach to integrating speech features into UX evaluation and opens avenues for further research in HCI.

Authors:Ying Ma, Shiquan Zhang, Dongju Yang, Zhanna Sarsenbayeva, Jarrod Knibbe, Jorge Goncalves
Title: Raising Awareness of Location Information Vulnerabilities in Social Media Photos using LLMs
Abstract:
Location privacy leaks can lead to unauthorised tracking, identity theft, and targeted attacks, compromising personal security and privacy. This study explores LLM-powered location privacy leaks associated with photo sharing on social media, focusing on user awareness, attitudes, and opinions. We developed and introduced an LLM-powered location privacy intervention app to 19 participants, who used it over a two-week period. The app prompted users to reflect on potential privacy leaks that a widely available LLM could easily detect, such as visual landmarks & cues that could reveal their location, and provided ways to conceal this information. Through in-depth interviews, we found that our intervention effectively increased users' awareness of location privacy and the risks posed by LLMs. It also encouraged users to consider the importance of maintaining control over their privacy data and sparked discussions about the future of location privacy-preserving technologies. Based on these insights, we offer design implications to support the development of future user-centred, location privacy-preserving technologies for social media photos.

Authors:Pragya Singh, Ritvik Budhiraja, Pankaj Jalote, Mohan Kumar, Pushpendra Singh
Title: Translating Emotions to Annotations -- A Participant Perspective of Physiological Emotion Data Collection
Abstract:
Physiological signals hold immense potential for ubiquitous emotion monitoring, presenting numerous applications in emotion recognition. However, harnessing this potential is hindered by significant challenges, particularly in the collection of annotations that align with physiological changes since the process hinges heavily on human participants. In this work, we set out to study human participant perspectives in the emotion data collection procedure. We conducted a lab-based emotion data collection study with 37 participants using 360-degree virtual reality video stimuli, followed by semi-structured interviews with the study participants. Our findings showed that intrinsic factors like participant perception, experiment design nuances, and experiment setup suitability impact participants' emotional responses and annotations within lab settings. Drawing from our findings and prior research, we propose recommendations for incorporating participant context into annotations and emphasizing participant-centric experiment designs. Furthermore, we explore current emotion data collection practices followed by AI practitioners and offer insights for future contributions leveraging physiological emotion data.

Authors:Edward Gu, Ho Chit Siu, Melanie Platt, Isabelle Hurley, Jaime Peña, Rohan Paleja
Title: Enabling Rapid Shared Human-AI Mental Model Alignment via the After-Action Review
Abstract:
In this work, we present two novel contributions toward improving research in human-machine teaming (HMT): 1) a Minecraft testbed to accelerate testing and deployment of collaborative AI agents and 2) a tool to allow users to revisit and analyze behaviors within an HMT episode to facilitate shared mental model development. Our browser-based Minecraft testbed allows for rapid testing of collaborative agents in a continuous-space, real-time, partially-observable environment with real humans without cumbersome setup typical to human-AI interaction user studies. As Minecraft has an extensive player base and a rich ecosystem of pre-built AI agents, we hope this contribution can help to facilitate research quickly in the design of new collaborative agents and in understanding different human factors within HMT. Our mental model alignment tool facilitates user-led post-mission analysis by including video displays of first-person perspectives of the team members (i.e., the human and AI) that can be replayed, and a chat interface that leverages GPT-4 to provide answers to various queries regarding the AI's experiences and model details.

Authors:Yan Jia, Harriet Evans, Zoe Porter, Simon Graham, John McDermid, Tom Lawton, David Snead, Ibrahim Habli
Title: The case for delegated AI autonomy for Human AI teaming in healthcare
Abstract:
In this paper we propose an advanced approach to integrating artificial intelligence (AI) into healthcare: autonomous decision support. This approach allows the AI algorithm to act autonomously for a subset of patient cases whilst serving a supportive role in other subsets of patient cases based on defined delegation criteria. By leveraging the complementary strengths of both humans and AI, it aims to deliver greater overall performance than existing human-AI teaming models. It ensures safe handling of patient cases and potentially reduces clinician review time, whilst being mindful of AI tool limitations. After setting the approach within the context of current human-AI teaming models, we outline the delegation criteria and apply them to a specific AI-based tool used in histopathology. The potential impact of the approach and the regulatory requirements for its successful implementation are then discussed.

Authors:Keyon Vafa, Sarah Bentley, Jon Kleinberg, Sendhil Mullainathan
Title: What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models
Abstract:
How should we evaluate the quality of generative models? Many existing metrics focus on a model's producibility, i.e. the quality and breadth of outputs it can generate. However, the actual value from using a generative model stems not just from what it can produce but whether a user with a specific goal can produce an output that satisfies that goal. We refer to this property as steerability. In this paper, we first introduce a mathematical framework for evaluating steerability independently from producibility. Steerability is more challenging to evaluate than producibility because it requires knowing a user's goals. We address this issue by creating a benchmark task that relies on one key idea: sample an output from a generative model and ask users to reproduce it. We implement this benchmark in a large-scale user study of text-to-image models and large language models. Despite the ability of these models to produce high-quality outputs, they all perform poorly on steerability. This suggests that we need to focus on improving the steerability of generative models. We show such improvements are indeed possible: through reinforcement learning techniques, we create an alternative steering mechanism for image models that achieves more than 2x improvement on this benchmark.
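The reproduction benchmark above boils down to scoring how close a user's recreation is to a sampled target output. A toy sketch of that scoring step, using Jaccard word overlap as a stand-in similarity measure (the abstract does not specify the study's actual metric, so this is purely illustrative):

```python
def steerability_score(target: str, reproduction: str) -> float:
    """Toy reproduction score: Jaccard overlap of word sets.

    A real steerability study would use embedding or perceptual
    similarity; this stand-in only illustrates the scoring step of the
    sample-then-reproduce benchmark.
    """
    t = set(target.lower().split())
    r = set(reproduction.lower().split())
    if not t and not r:
        return 1.0  # two empty outputs are trivially identical
    return len(t & r) / len(t | r)
```

A perfect reproduction scores 1.0, and a model that users cannot steer toward the target at all scores near 0, so averaging this score over many sampled targets gives one crude steerability estimate.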

Authors:Tai Dang, Long-Hung Pham, Sang T. Truong, Ari Glenn, Wendy Nguyen, Edward A. Pham, Jeffrey S. Glenn, Sanmi Koyejo, Thang Luong
Title: Preferential Multi-Objective Bayesian Optimization for Drug Discovery
Abstract:
Despite decades of advancements in automated ligand screening, large-scale drug discovery remains resource-intensive and requires post-processing hit selection, a step where chemists manually select a few promising molecules based on their chemical intuition. This creates a major bottleneck in the virtual screening process for drug discovery, demanding experts to repeatedly balance complex trade-offs among drug properties across a vast pool of candidates. To improve the efficiency and reliability of this process, we propose a novel human-centered framework named CheapVS that allows chemists to guide the ligand selection process by providing preferences regarding the trade-offs between drug properties via pairwise comparison. Our framework combines preferential multi-objective Bayesian optimization with a docking model for measuring binding affinity to capture human chemical intuition for improving hit identification. Specifically, on a library of 100K chemical candidates targeting EGFR and DRD2, CheapVS outperforms state-of-the-art screening methods in identifying drugs within a limited computational budget. Notably, our method can recover up to 16/37 EGFR and 37/58 DRD2 known drugs while screening only 6% of the library, showcasing its potential to significantly advance drug discovery.
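Pairwise preferences like those CheapVS elicits from chemists are commonly converted into scalar utilities with a Bradley-Terry model. A minimal sketch of that conversion, trained by gradient ascent (a hypothetical illustration of the general technique, not the paper's CheapVS implementation):

```python
import math

def fit_utilities(n_items, comparisons, lr=0.1, epochs=200):
    """Fit per-candidate utility scores from pairwise preferences
    with a Bradley-Terry model trained by gradient ascent.

    `comparisons` is a list of (winner, loser) index pairs, standing
    in for an expert's trade-off judgments between drug candidates.
    """
    u = [0.0] * n_items
    for _ in range(epochs):
        for w, l in comparisons:
            # P(winner preferred over loser) under Bradley-Terry
            p = 1.0 / (1.0 + math.exp(-(u[w] - u[l])))
            g = lr * (1.0 - p)  # gradient of the log-likelihood
            u[w] += g
            u[l] -= g
    return u
```

The fitted utilities can then serve as the objective a Bayesian optimizer maximizes when proposing which candidates to screen next.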

Authors:Ashley Suh, Isabelle Hurley, Nora Smith, Ho Chit Siu
Title: Fewer Than 1% of Explainable AI Papers Validate Explainability with Humans
Abstract:
This late-breaking work presents a large-scale analysis of explainable AI (XAI) literature to evaluate claims of human explainability. We collaborated with a professional librarian to identify 18,254 papers containing keywords related to explainability and interpretability. Of these, we find that only 253 papers included terms suggesting human involvement in evaluating an XAI technique, and just 128 of those conducted some form of a human study. In other words, fewer than 1% of XAI papers (0.7%) provide empirical evidence of human explainability when compared to the broader body of XAI literature. Our findings underscore a critical gap between claims of human explainability and evidence-based validation, raising concerns about the rigor of XAI research. We call for increased emphasis on human evaluations in XAI studies and provide our literature search methodology to enable both reproducibility and further investigation into this widespread issue.

Authors:Nicolas Hoferer, Kilian Sprenkamp, Dorian Christoph Quelle, Daniel Gordon Jones, Zoya Katashinskaya, Alexandre Bovet, Liudmila Zavolokina
Title: Effective Yet Ephemeral Propaganda Defense: There Needs to Be More than One-Shot Inoculation to Enhance Critical Thinking
Abstract:
In today's media landscape, propaganda distribution has a significant impact on society. It sows confusion, undermines democratic processes, and leads to increasingly difficult decision-making for news readers. We investigate the lasting effect of using a propaganda detection and contextualization tool on news readers' critical thinking and propaganda awareness. Building on inoculation theory, which suggests that preemptively exposing individuals to weakened forms of propaganda can improve their resilience against it, we integrate Kahneman's dual-system theory to measure the tool's impact on critical thinking. Through a two-phase online experiment, we measure the effect of several inoculation doses. Our findings show that while the tool increases critical thinking during its use, this increase vanishes without access to the tool. This indicates a single use of the tool does not create a lasting impact. We discuss the implications and propose possible approaches to improve resilience against propaganda in the long term.

Authors:Abeer Badawi, Md Tahmid Rahman Laskar, Jimmy Xiangji Huang, Shaina Raza, Elham Dolatabadi
Title: Position: Beyond Assistance -- Reimagining LLMs as Ethical and Adaptive Co-Creators in Mental Health Care
Abstract:
This position paper argues for a fundamental shift in how Large Language Models (LLMs) are integrated into the mental health care domain. We advocate for their role as co-creators rather than mere assistive tools. While LLMs have the potential to enhance accessibility, personalization, and crisis intervention, their adoption remains limited due to concerns about bias, evaluation, over-reliance, dehumanization, and regulatory uncertainties. To address these challenges, we propose two structured pathways: SAFE-i (Supportive, Adaptive, Fair, and Ethical Implementation) Guidelines for ethical and responsible deployment, and HAAS-e (Human-AI Alignment and Safety Evaluation) Framework for multidimensional, human-centered assessment. SAFE-i provides a blueprint for data governance, adaptive model engineering, and real-world integration, ensuring LLMs align with clinical and ethical standards. HAAS-e introduces evaluation metrics that go beyond technical accuracy to measure trustworthiness, empathy, cultural sensitivity, and actionability. We call for the adoption of these structured approaches to establish a responsible and scalable model for LLM-driven mental health support, ensuring that AI complements, rather than replaces, human expertise.

Authors:Yiwen Dong, Jessica Rose, Hae Young Noh
Title: Bridging Structural Dynamics and Biomechanics: Human Motion Estimation through Footstep-Induced Floor Vibrations
Abstract:
Quantitative estimation of human joint motion in daily living spaces is essential for early detection and rehabilitation tracking of neuromusculoskeletal disorders (e.g., Parkinson's) and mitigating trip and fall risks for older adults. Existing approaches involve monitoring devices such as cameras, wearables, and pressure mats, but have operational constraints such as direct line-of-sight, carrying devices, and dense deployment. To overcome these limitations, we leverage gait-induced floor vibration to estimate lower-limb joint motion (e.g., ankle, knee, and hip flexion angles), allowing non-intrusive and contactless gait health monitoring in people's living spaces. To overcome the high uncertainty in lower-limb movement given the limited information provided by the gait-induced floor vibrations, we formulate a physics-informed graph to integrate domain knowledge of gait biomechanics and structural dynamics into the model. Specifically, different types of nodes represent heterogeneous information from joint motions and floor vibrations; their connecting edges represent the physiological relationships between joints and forces governed by gait biomechanics, as well as the relationships between forces and floor responses governed by the structural dynamics. As a result, our model poses physical constraints to reduce uncertainty while allowing information sharing between the body and the floor to make more accurate predictions. We evaluate our approach with 20 participants through a real-world walking experiment. We achieved an average of 3.7 degrees of mean absolute error in estimating 12 joint flexion angles (38% error reduction from baseline), which is comparable to the performance of cameras and wearables in current medical practices.

Authors:Kimji N. Pellano, Inga Strümke, Daniel Groos, Lars Adde, Pål Haugen, Espen Alexander F. Ihlen
Title: Towards Biomarker Discovery for Early Cerebral Palsy Detection: Evaluating Explanations Through Kinematic Perturbations
Abstract:
Cerebral Palsy (CP) is a prevalent motor disability in children, for which early detection can significantly improve treatment outcomes. While skeleton-based Graph Convolutional Network (GCN) models have shown promise in automatically predicting CP risk from infant videos, their "black-box" nature raises concerns about clinical explainability. To address this, we introduce a perturbation framework tailored for infant movement features and use it to compare two explainable AI (XAI) methods: Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM). First, we identify significant and non-significant body keypoints in very low- and very high-risk infant video snippets based on the XAI attribution scores. We then conduct targeted velocity and angular perturbations, both individually and in combination, on these keypoints to assess how the GCN model's risk predictions change. Our results indicate that velocity-driven features of the arms, hips, and legs have a dominant influence on CP risk predictions, while angular perturbations have a more modest impact. Furthermore, CAM and Grad-CAM show partial convergence in their explanations for both low- and high-risk CP groups. Our findings demonstrate the use of XAI-driven movement analysis for early CP prediction and offer insights into potential movement-based biomarker discovery that warrant further clinical validation.

Authors:Heye Huang, Zheng Li, Hao Cheng, Haoran Wang, Junkai Jiang, Xiaopeng Li, Arkady Zgonnikov
Title: Understanding Driver Cognition and Decision-Making Behaviors in High-Risk Scenarios: A Drift Diffusion Perspective
Abstract:
Ensuring safe interactions between autonomous vehicles (AVs) and human drivers in mixed traffic systems remains a major challenge, particularly in complex, high-risk scenarios. This paper presents a cognition-decision framework that integrates individual variability and commonalities in driver behavior to quantify risk cognition and model dynamic decision-making. First, a risk sensitivity model based on a multivariate Gaussian distribution is developed to characterize individual differences in risk cognition. Then, a cognitive decision-making model based on the drift diffusion model (DDM) is introduced to capture common decision-making mechanisms in high-risk environments. The DDM dynamically adjusts decision thresholds by integrating initial bias, drift rate, and boundary parameters, adapting to variations in speed, relative distance, and risk sensitivity to reflect diverse driving styles and risk preferences. By simulating high-risk scenarios with lateral, longitudinal, and multidimensional risk sources in a driving simulator, the proposed model accurately predicts cognitive responses and decision behaviors during emergency maneuvers. Specifically, by incorporating driver-specific risk sensitivity, the model enables dynamic adjustments of key DDM parameters, allowing for personalized decision-making representations in diverse scenarios. Comparative analysis with IDM, Gipps, and MOBIL demonstrates that DDM more precisely captures human cognitive processes and adaptive decision-making in high-risk scenarios. These findings provide a theoretical basis for modeling human driving behavior and offer critical insights for enhancing AV-human interaction in real-world traffic environments.
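The drift-diffusion mechanism the framework builds on can be illustrated with a minimal Euler-Maruyama simulation of a single decision trial. This is a generic DDM sketch with illustrative parameter values, not the paper's calibrated, driver-specific model:

```python
import random

def simulate_ddm(drift, boundary, bias=0.0, dt=0.001,
                 noise=1.0, max_t=5.0, rng=None):
    """Simulate one drift-diffusion trial.

    Evidence x starts at `bias` and accumulates at rate `drift` plus
    Gaussian noise; a decision fires when x crosses +/-`boundary`.
    Returns (choice, reaction_time); choice is +1, -1, or 0 on timeout.
    """
    rng = rng or random.Random(0)
    x, t = bias, 0.0
    sqrt_dt = dt ** 0.5
    while t < max_t:
        x += drift * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if x >= boundary:
            return +1, t
        if x <= -boundary:
            return -1, t
    return 0, t
```

In the paper's setting, quantities like speed, relative distance, and a driver's risk sensitivity would modulate the drift rate, bias, and boundary, which is how the model produces personalized decision timing across scenarios.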

Authors:Arpit Bhatia, Moaaz Hudhud Mughrabi, Diar Abdlkarim, Massimiliano Di Luca, Mar Gonzalez-Franco, Karan Ahuja, Hasti Seifi
Title: Text Entry for XR Trove (TEXT): Collecting and Analyzing Techniques for Text Input in XR
Abstract:
Text entry for extended reality (XR) is far from perfect, and a variety of text entry techniques (TETs) have been proposed to fit various contexts of use. However, comparing between TETs remains challenging due to the lack of a consolidated collection of techniques, and limited understanding of how interaction attributes of a technique (e.g., presence of visual feedback) impact user performance. To address these gaps, this paper examines the current landscape of XR TETs by creating a database of 176 different techniques. We analyze this database to highlight trends in the design of these techniques, the metrics used to evaluate them, and how various interaction attributes impact these metrics. We discuss implications for future techniques and present TEXT: Text Entry for XR Trove, an interactive online tool to navigate our database.

Authors:Yimin Xiao, Cartor Hancock, Sweta Agrawal, Nikita Mehandru, Niloufar Salehi, Marine Carpuat, Ge Gao
Title: Sustaining Human Agency, Attending to Its Cost: An Investigation into Generative AI Design for Non-Native Speakers' Language Use
Abstract:
AI systems and tools today can generate human-like expressions on behalf of people. This raises the crucial question of how to sustain human agency in AI-mediated communication. We investigated this question in the context of machine translation (MT) assisted conversations. Our participants included 45 dyads. Each dyad consisted of one new immigrant in the United States, who leveraged MT for English information seeking as a non-native speaker, and one local native speaker, who acted as the information provider. Non-native speakers could influence the English production of their message in one of three ways: labeling the quality of MT outputs, regular post-editing without additional hints, or augmented post-editing with LLM-generated hints. Our data revealed a greater exercise of non-native speakers' agency under the two post-editing conditions. This benefit, however, came at a significant cost to the dyadic-level communication performance. We derived insights for MT and other generative AI design from our findings.

Authors:Hunter McNichols, Fareya Ikram, Andrew Lan
Title: The StudyChat Dataset: Student Dialogues With ChatGPT in an Artificial Intelligence Course
Abstract:
The widespread availability of large language models (LLMs), such as ChatGPT, has significantly impacted education, raising both opportunities and challenges. Students can frequently interact with LLM-powered, interactive learning tools, but their usage patterns need to be monitored and understood. We introduce StudyChat, a publicly available dataset capturing real-world student interactions with an LLM-powered tutoring chatbot in a semester-long, university-level artificial intelligence (AI) course. We deploy a web application that replicates ChatGPT's core functionalities, and use it to log student interactions with the LLM while working on programming assignments. We collect 16,851 interactions, which we annotate using a dialogue act labeling schema inspired by observed interaction patterns and prior research. We analyze these interactions, highlight usage trends, and examine how specific student behaviors correlate with course outcomes. We find that students who prompt LLMs for conceptual understanding and coding help tend to perform better on assignments and exams. Moreover, students who use LLMs to write reports and circumvent assignment learning objectives have lower outcomes on exams than others. StudyChat serves as a shared resource to facilitate further research on the evolving role of LLMs in education.

Authors:Andreas Jungherr, Adrian Rauchfleisch
Title: Artificial Intelligence in Deliberation: The AI Penalty and the Emergence of a New Deliberative Divide
Abstract:
Digital deliberation has expanded democratic participation, yet challenges remain. These include processing information at scale, moderating discussions, fact-checking, and attracting people to participate. Recent advances in artificial intelligence (AI) offer potential solutions, but public perceptions of AI's role in deliberation remain underexplored. Beyond efficiency, democratic deliberation is about voice and recognition. If AI is integrated into deliberation, public trust, acceptance, and willingness to participate may be affected. We conducted a preregistered survey experiment with a representative sample in Germany (n=1850) to examine how information about AI-enabled deliberation influences willingness to participate and perceptions of deliberative quality. Respondents were randomly assigned to treatments that provided them information about deliberative tasks facilitated by either AI or humans. Our findings reveal a significant AI penalty. Participants were less willing to engage in AI-facilitated deliberation and rated its quality lower than human-led formats. These effects were moderated by individual predispositions. Perceptions of AI's societal benefits and anthropomorphization of AI showed positive interaction effects on people's interest in participating in AI-enabled deliberative formats and on positive quality assessments, while AI risk assessments showed negative interactions with information about AI-enabled deliberation. These results suggest AI-enabled deliberation faces substantial public skepticism, potentially even introducing a new deliberative divide. Unlike traditional participation gaps based on education or demographics, this divide is shaped by attitudes toward AI. As democratic engagement increasingly moves online, ensuring AI's role in deliberation does not discourage participation or deepen inequalities will be a key challenge for future research and policy.

Authors:Theodore Knoll, Amna Liaqat, Andrés Monroy-Hernández
Title: ARctic Escape: Promoting Social Connection, Teamwork, and Collaboration Using a Co-Located Augmented Reality Escape Room
Abstract:
We present ARctic Escape, a co-located augmented reality (AR) escape room designed to promote collaboration between dyads through play. While physical escape rooms provide groups with fun, social experiences, they require a gameplay venue, props, and a game master, all of which detract from their ease of access. Existing AR escape rooms demonstrate that AR can make escape room experiences easier to access. Still, many AR escape rooms are single-player and therefore fail to maintain the social and collaborative elements of their physical counterparts. This paper presents ARctic Escape, a two-person AR escape room with clues emphasizing player interaction and teamwork. We evaluated ARctic Escape by conducting semi-structured interviews with four dyads to learn about participants' interpersonal dynamics and experiences during gameplay. We found that participants thought the experience was fun, collaborative, promoted discussion, and inspired new social dynamics, but sometimes the escape room's reliance on virtual content was disorienting.

Authors:Franklin Mingzhe Li, Kaitlyn Ng, Bin Zhu, Patrick Carrington
Title: OSCAR: Object Status and Contextual Awareness for Recipes to Support Non-Visual Cooking
Abstract:
Following recipes while cooking is an important but difficult task for visually impaired individuals. We developed OSCAR (Object Status Context Awareness for Recipes), a novel approach that provides recipe progress tracking and context-aware feedback on the completion of cooking tasks through tracking object statuses. OSCAR leverages both Large-Language Models (LLMs) and Vision-Language Models (VLMs) to manipulate recipe steps, extract object status information, align visual frames with object status, and provide a cooking progress tracking log. We evaluated OSCAR's recipe-following functionality using 173 YouTube cooking videos and 12 real-world non-visual cooking videos to demonstrate OSCAR's capability to track cooking steps and provide contextual guidance. Our results highlight the effectiveness of using object status to improve performance over the baseline by more than 20% across different VLMs, and we present factors that impact prediction performance. Furthermore, we contribute a dataset of real-world non-visual cooking videos with step annotations as an evaluation benchmark.

Authors:Frank Bagehorn, Kristina Brimijoin, Elizabeth M. Daly, Jessica He, Michael Hind, Luis Garces-Erice, Christopher Giblin, Ioana Giurgiu, Jacquelyn Martino, Rahul Nair, David Piorkowski, Ambrish Rawat, John Richards, Sean Rooney, Dhaval Salwala, Seshu Tirupathi, Peter Urbanetz, Kush R. Varshney, Inge Vejsbjerg, Mira L. Wolf-Bauwens
Title: AI Risk Atlas: Taxonomy and Tooling for Navigating AI Risks and Resources
Abstract:
The rapid evolution of generative AI has expanded the breadth of risks associated with AI systems. While various taxonomies and frameworks exist to classify these risks, the lack of interoperability between them creates challenges for researchers, practitioners, and policymakers seeking to operationalise AI governance. To address this gap, we introduce the AI Risk Atlas, a structured taxonomy that consolidates AI risks from diverse sources and aligns them with governance frameworks. Additionally, we present the Risk Atlas Nexus, a collection of open-source tools designed to bridge the divide between risk definitions, benchmarks, datasets, and mitigation strategies. This knowledge-driven approach leverages ontologies and knowledge graphs to facilitate risk identification, prioritization, and mitigation. By integrating AI-assisted compliance workflows and automation strategies, our framework lowers the barrier to responsible AI adoption. We invite the broader research and open-source community to contribute to this evolving initiative, fostering cross-domain collaboration and ensuring AI governance keeps pace with technological advancements.

Authors:Thomas Mildner, Daniel Fidel, Evropi Stefanidi, Pawel W. Wozniak, Rainer Malaka, Jasmin Niess
Title: A Comparative Study of How People With and Without ADHD Recognise and Avoid Dark Patterns on Social Media
Abstract:
Dark patterns are deceptive strategies that recent work in human-computer interaction (HCI) has captured throughout digital domains, including social networking sites (SNSs). While research has identified difficulties among people to recognise dark patterns effectively, few studies consider vulnerable populations and their experience in this regard, including people with attention deficit hyperactivity disorder (ADHD), who may be especially susceptible to attention-grabbing tricks. Based on an interactive web study with 135 participants, we investigate SNS users' ability to recognise and avoid dark patterns by comparing results from participants with and without ADHD. In line with prior work, we noticed overall low recognition of dark patterns with no significant differences between the two groups. Yet, ADHD individuals were able to avoid specific dark patterns more often. Our results advance previous work by understanding dark patterns in a realistic environment and offer insights into their effect on vulnerable populations.

Authors:Le Yue, Tram Thi Minh Tran, Xinyan Yu, Marius Hoggenmueller
Title: Enhancing Autonomous Vehicle-Pedestrian Interaction in Shared Spaces: The Impact of Intended Path-Projection
Abstract:
External Human-Machine Interfaces (eHMIs) are critical for seamless interactions between autonomous vehicles (AVs) and pedestrians in shared spaces. However, they often struggle to adapt to these environments, where pedestrian movement is fluid and right-of-way is ambiguous. To address these challenges, we propose PaveFlow, an eHMI that projects the AV's intended path onto the ground in real time, providing continuous spatial information rather than a binary stop/go signal. Through a VR study (N=18), we evaluated PaveFlow's effectiveness under two AV density conditions (single vs. multiple AVs) and a baseline condition without PaveFlow. The results showed that PaveFlow significantly improved pedestrian perception of safety, trust, and user experience while reducing cognitive workload. This performance remained consistent across both single and multiple AV conditions, despite persistent tensions in priority negotiation. These findings suggest that path projection enhances eHMI transparency by offering richer movement cues, which may better support AV-pedestrian interaction in shared spaces.

Authors:Si Chen, Reid Metoyer, Khiem Le, Adam Acunin, Izzy Molnar, Alex Ambrose, James Lang, Nitesh Chawla, Ronald Metoyer
Title: Bridging the AI Adoption Gap: Designing an Interactive Pedagogical Agent for Higher Education Instructors
Abstract:
Instructors play a pivotal role in integrating AI into education, yet their adoption of AI-powered tools remains inconsistent. Despite this, limited research explores how to design AI tools that support broader instructor adoption. This study applies a human-centered design approach, incorporating qualitative methods, to investigate the design of interactive pedagogical agents that provide instructional suggestions in response to instructors' questions. We conducted a formative study involving interviews with five pedagogy experts to examine existing strategies for supporting instructors' pedagogical needs. Building on these insights, we facilitated a participatory design session with ten pedagogy experts, where participants reviewed a storyboard depicting a chatbot designed for instructors with varying levels of AI literacy and differing attitudes toward AI. Experts also evaluated the quality of LLM-generated suggestions based on common teaching challenges. Our findings highlight the need for chatbot interactions that foster trust, especially for AI-conservative instructors. Experts emphasized the importance of social transparency (for example, showing how peers use the tool) and allowing instructors to flexibly control how much or how little they engage with the system. We also propose design recommendations to enhance the quality of AI-generated teaching suggestions, such as adapting them to reflect instructors' prior teaching experience. This work underscores the urgent need to support AI-conservative instructors, as AI literacy and attitudes are closely intertwined. Without thoughtful design, there is a risk of widening pedagogical divides and reducing students' learning opportunities.

Authors:Tim Maurer, Abdulrahman Mohamed Selim, Hasan Md Tusfiqur Alam, Matthias Eiletz, Michael Barz, Daniel Sonntag
Title: InFL-UX: A Toolkit for Web-Based Interactive Federated Learning
Abstract:
This paper presents InFL-UX, an interactive, proof-of-concept browser-based Federated Learning (FL) toolkit designed to integrate user contributions seamlessly into the machine learning (ML) workflow. InFL-UX enables users across multiple devices to upload datasets, define classes, and collaboratively train classification models directly in the browser using modern web technologies. Unlike traditional FL toolkits, which often focus on backend simulations, InFL-UX provides a simple user interface for researchers to explore how users interact with and contribute to FL systems in real-world, interactive settings. By prioritising usability and decentralised model training, InFL-UX bridges the gap between FL and Interactive Machine Learning (IML), empowering non-technical users to actively participate in ML classification tasks.

Authors:Yuyan Wu, Yiwen Dong, Sumer Vaid, Gabriella M. Harari, Hae Young Noh
Title: Personalized Emotion Detection from Floor Vibrations Induced by Footsteps
Abstract:
Emotion recognition is critical for various applications such as early detection of mental health disorders and emotion-based smart home systems. Previous studies used various sensing methods for emotion recognition, such as wearable sensors, cameras, and microphones. However, these methods have limitations in long-term domestic settings, including intrusiveness and privacy concerns. To overcome these limitations, this paper introduces a nonintrusive and privacy-friendly personalized emotion recognition system, EmotionVibe, which leverages footstep-induced floor vibrations for emotion recognition. The main idea of EmotionVibe is that individuals' emotional states influence their gait patterns, subsequently affecting the floor vibrations induced by their footsteps. However, there are two main research challenges: 1) the complex and indirect relationship between human emotions and footstep-induced floor vibrations and 2) the large between-person variations in the relationship between emotions and gait patterns. To address these challenges, we first empirically characterize this complex relationship and develop an emotion-sensitive feature set including gait-related and vibration-related features from footstep-induced floor vibrations. Furthermore, we personalize the emotion recognition system for each user by calculating gait similarities between the target person (i.e., the person whose emotions we aim to recognize) and those in the training dataset and assigning greater weights to training people with similar gait patterns in the loss function. We evaluated our system in a real-world walking experiment with 20 participants, totaling 37,001 footstep samples. EmotionVibe achieved a mean absolute error (MAE) of 1.11 and 1.07 for valence and arousal score estimations, respectively, reflecting 19.0% and 25.7% error reductions compared to the baseline method.
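The gait-similarity weighting described in the abstract can be sketched as a similarity-weighted regression loss. The function name and the similarity scores below are hypothetical stand-ins; the paper derives its similarities from gait features and plugs the weights into its own training objective:

```python
def gait_weighted_mse(preds, targets, similarities):
    """Similarity-weighted squared-error loss.

    Training samples whose walker has a gait similar to the target
    person receive larger weights, so the model fits them more closely.
    `similarities` are non-negative gait-similarity scores between the
    target person and each training sample's walker (illustrative
    values, not the paper's actual similarity measure).
    """
    total = sum(similarities)
    weights = [s / total for s in similarities]  # normalize to sum to 1
    return sum(w * (p - t) ** 2
               for w, p, t in zip(weights, preds, targets))
```

With uniform similarities this reduces to ordinary mean squared error; skewing the similarities toward one training walker pulls the loss toward that walker's errors, which is the personalization effect the abstract describes.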

Authors:Prakash Shukla, Phuong Bui, Sean S Levy, Max Kowalski, Ali Baigelenov, Paul Parsons
Title: De-skilling, Cognitive Offloading, and Misplaced Responsibilities: Potential Ironies of AI-Assisted Design
Abstract:
The rapid adoption of generative AI (GenAI) in design has sparked discussions about its benefits and unintended consequences. While AI is often framed as a tool for enhancing productivity by automating routine tasks, historical research on automation warns of paradoxical effects, such as de-skilling and misplaced responsibilities. To assess UX practitioners' perceptions of AI, we analyzed over 120 articles and discussions from UX-focused subreddits. Our findings indicate that while practitioners express optimism about AI reducing repetitive work and augmenting creativity, they also highlight concerns about over-reliance, cognitive offloading, and the erosion of critical design skills. Drawing from human-automation interaction literature, we discuss how these perspectives align with well-documented automation ironies and function allocation challenges. We argue that UX professionals should critically evaluate AI's role beyond immediate productivity gains and consider its long-term implications for creative autonomy and expertise. This study contributes empirical insights into practitioners' perspectives and links them to broader debates on automation in design.

Authors:Ali Baigelenov, Prakash Shukla, Zixu Zhang, Paul Parsons
Title: Are Cognitive Biases as Important as they Seem for Data Visualization?
Abstract:
Research on cognitive biases and heuristics has become increasingly popular in the visualization literature in recent years. Researchers have studied the effects of biases on visualization interpretation and subsequent decision-making. While this work is important, we contend that the view on biases has presented human cognitive abilities in an unbalanced manner, placing too much emphasis on the flaws and limitations of human decision-making, and potentially suggesting that it should not be trusted. Several decision researchers have argued that the flip side of biases -- i.e., mental shortcuts or heuristics -- demonstrate human ingenuity and serve as core markers of adaptive expertise. In this paper, we review the perspectives and sentiments of the visualization community on biases and describe literature arguing for more balanced views of biases and heuristics. We hope this paper will encourage visualization researchers to consider a fuller picture of human cognitive limitations and strategies for making decisions in complex environments.

Authors:Faraz Faruqi, Maxine Perroni-Scharf, Jaskaran Singh Walia, Yunyi Zhu, Shuyue Feng, Donald Degraen, Stefanie Mueller
Title: TactStyle: Generating Tactile Textures with Generative AI for Digital Fabrication
Abstract:
Recent work in Generative AI enables the stylization of 3D models based on image prompts. However, these methods do not incorporate tactile information, leading to designs that lack the expected tactile properties. We present TactStyle, a system that allows creators to stylize 3D models with images while incorporating the expected tactile properties. TactStyle accomplishes this using a modified image-generation model fine-tuned to generate heightfields for given surface textures. By optimizing 3D model surfaces to embody a generated texture, TactStyle creates models that match the desired style and replicate the tactile experience. We utilize a large-scale dataset of textures to train our texture generation model. In a psychophysical experiment, we evaluate the tactile qualities of a set of 3D-printed original textures and TactStyle's generated textures. Our results show that TactStyle successfully generates a wide range of tactile features from a single image input, enabling a novel approach to haptic design.

Authors:Xiang Li, Per Ola Kristensson
Title: Optimizing Curve-Based Selection with On-Body Surfaces in Virtual Environments
Abstract:
Virtual Reality (VR) interfaces often rely on linear ray-casting for object selection but struggle with precision in dense or occluded environments. This late-breaking work introduces an optimized dual-layered selection mechanism combining dynamic Bezier Curves, controlled via finger gestures, with on-body interaction surfaces to enhance precision and immersion. Bezier Curves offer fine-grained control and flexibility in complex scenarios, while on-body surfaces project nearby virtual objects onto the user's forearm, leveraging proprioception and tactile feedback. A preliminary qualitative study (N = 24) compared two interaction paradigms (Bezier Curve vs. Linear Ray) and two interaction media (On-body vs. Mid-air). Participants praised the Bezier Curve's ability to target occluded objects but noted the physical demand. On-body interactions were favored for their immersive qualities, while mid-air interactions were appreciated for maintaining focus on the virtual scene. These findings highlight the importance of balancing ease of learning and precise control when designing VR selection techniques, opening avenues for further exploration of curve-based and on-body interactions in dense virtual environments.

Authors:Victor Nikhil Antony, Maia Stiber, Chien-Ming Huang
Title: Xpress: A System For Dynamic, Context-Aware Robot Facial Expressions using Language Models
Abstract:
Facial expressions are vital in human communication and significantly influence outcomes in human-robot interaction (HRI), such as likeability, trust, and companionship. However, current methods for generating robotic facial expressions are often labor-intensive, lack adaptability across contexts and platforms, and have limited expressive ranges--leading to repetitive behaviors that reduce interaction quality, particularly in long-term scenarios. We introduce Xpress, a system that leverages language models (LMs) to dynamically generate context-aware facial expressions for robots through a three-phase process: encoding temporal flow, conditioning expressions on context, and generating facial expression code. We demonstrated Xpress as a proof-of-concept through two user studies (n=15x2) and a case study with children and parents (n=13), in storytelling and conversational scenarios to assess the system's context-awareness, expressiveness, and dynamism. Results demonstrate Xpress's ability to dynamically produce expressive and contextually appropriate facial expressions, highlighting its versatility and potential in HRI applications.

Authors:Julian Rasch, Matthias Wilhalm, Florian Müller, Francesco Chiossi
Title: AR You on Track? Investigating Effects of Augmented Reality Anchoring on Dual-Task Performance While Walking
Abstract:
With the increasing spread of AR head-mounted displays suitable for everyday use, interaction with information becomes ubiquitous, even while walking. However, this requires constant shifts of our attention between walking and interacting with virtual information to fulfill both tasks adequately. Accordingly, we as a community need a thorough understanding of the mutual influences of walking and interacting with digital information to design safe yet effective interactions. Thus, we systematically investigate the effects of different AR anchors (hand, head, torso) and task difficulties on user experience and performance. We engage participants (n=26) in a dual-task paradigm involving a visual working memory task while walking. We assess the impact of dual-tasking on both virtual and walking performance, and subjective evaluations of mental and physical load. Our results show that head-anchored AR content least affected walking while allowing for fast and accurate virtual task interaction, while hand-anchored content increased reaction times and workload.

Authors:Thomas Norrenbrock, Timo Kaiser, Sovan Biswas, Ramesh Manuvinakurike, Bodo Rosenhahn
Title: QPM: Discrete Optimization for Globally Interpretable Image Classification
Abstract:
Understanding the classifications of deep neural networks, e.g. those used in safety-critical situations, is becoming increasingly important. While recent models can locally explain a single decision, providing a faithful global explanation of an accurate model's general behavior remains a more challenging open task. Towards that goal, we introduce the Quadratic Programming Enhanced Model (QPM), which learns globally interpretable class representations. QPM represents every class with a binary assignment of very few features (typically 5) that are also assigned to other classes, ensuring easily comparable contrastive class representations. This compact binary assignment is found using discrete optimization based on predefined similarity measures and interpretability constraints. The resulting optimal assignment is used to fine-tune the diverse features, so that each of them becomes the shared general concept between the assigned classes. Extensive evaluations show that QPM delivers unprecedented global interpretability across small and large-scale datasets while setting the state of the art for the accuracy of interpretable models.
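The binary class-to-feature assignment can be illustrated with a toy stand-in for the discrete optimization step. The brute-force selection below simply picks, per class, the k features with the highest precomputed affinity scores; the paper instead solves a quadratic program with similarity measures and interpretability constraints, and `scores[c][f]` is a hypothetical affinity matrix introduced only for this sketch:

```python
from itertools import combinations

def assign_features(scores, k=5):
    """Toy stand-in for QPM's discrete optimization.

    For each class, pick the k features from the shared feature pool
    whose class-feature affinity scores sum highest.  `scores[c][f]`
    is a hypothetical affinity of feature f for class c.  Returns a
    binary assignment as a list of feature-index sets, one per class;
    features naturally end up shared across classes, giving the
    contrastive, comparable representations the abstract describes.
    """
    n_features = len(scores[0])
    assignment = []
    for class_scores in scores:
        best = max(combinations(range(n_features), k),
                   key=lambda fs: sum(class_scores[f] for f in fs))
        assignment.append(set(best))
    return assignment
```

For example, with two classes, three features, and k=2, both classes may select an overlapping feature, which is exactly the kind of shared concept QPM's fine-tuning stage then aligns across the assigned classes.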

Authors:Yugin Tan, Kai Xin Soh, Renwen Zhang, Jungup Lee, Han Meng, Biswadeep Sen, Yi-Chieh Lee
Title: Empowering Social Service with AI: Insights from a Participatory Design Study with Practitioners
Abstract:
In social service, administrative burdens and decision-making challenges often hinder practitioners from performing effective casework. Generative AI (GenAI) offers significant potential to streamline these tasks, yet exacerbates concerns about overreliance, algorithmic bias, and loss of identity within the profession. We explore these issues through a two-stage participatory design study. We conducted formative co-design workshops (n=27) to create a prototype GenAI tool, followed by contextual inquiry sessions with practitioners (n=24) using the tool with real case data. We reveal opportunities for AI integration in documentation, assessment, and worker supervision, while highlighting risks related to GenAI limitations, skill retention, and client safety. Drawing comparisons with GenAI tools in other fields, we discuss design and usage guidelines for such tools in social service practice.

Authors:Albin Zeqiri, Julian Britten, Clara Schramm, Pascal Jansen, Michael Rietzler, Enrico Rukzio
Title: PlantPal: Leveraging Precision Agriculture Robots to Facilitate Remote Engagement in Urban Gardening
Abstract:
Urban gardening is widely recognized for its numerous health and environmental benefits. However, the lack of suitable garden spaces, demanding daily schedules and limited gardening expertise present major roadblocks for citizens looking to engage in urban gardening. While prior research has explored smart home solutions to support urban gardeners, these approaches currently do not fully address these practical barriers. In this paper, we present PlantPal, a system that enables the cultivation of garden spaces irrespective of one's location, expertise level, or time constraints. PlantPal enables the shared operation of a precision agriculture robot (PAR) that is equipped with garden tools and a multi-camera system. Insights from a 3-week deployment (N=18) indicate that PlantPal facilitated the integration of gardening tasks into daily routines, fostered a sense of connection with one's field, and provided an engaging experience despite the remote setting. We contribute design considerations for future robot-assisted urban gardening concepts.

Authors:Juhoon Lee, Seoyoung Kim, Yeon Su Park, Juho Kim, Jeong-woo Jang, Joseph Seering
Title: Less Talk, More Trust: Understanding Players' In-game Assessment of Communication Processes in League of Legends
Abstract:
In-game team communication in online multiplayer games has shown the potential to foster efficient collaboration and positive social interactions. Yet players often associate communication within ad hoc teams with frustration and wariness. Though previous works have quantitatively analyzed communication patterns at scale, few have identified the motivations of how a player makes in-the-moment communication decisions. In this paper, we conducted an observation study with 22 League of Legends players by interviewing them during Solo Ranked games on their use of four in-game communication media (chat, pings, emotes, votes). We performed thematic analysis to understand players' in-context assessment and perception of communication attempts. We demonstrate that players evaluate communication opportunities on proximate game states bound by player expectations and norms. Our findings illustrate players' tendency to view communication, regardless of its content, as a precursor to team breakdowns. We build upon these findings to motivate effective player-oriented communication design in online games.

Authors:Susobhan Ghosh, Pei-Yao Hung, Lara N. Coughlin, Erin E. Bonar, Yongyi Guo, Inbal Nahum-Shani, Maureen Walton, Mark W. Newman, Susan A. Murphy
Title: "It felt more real": Investigating the User Experience of the MiWaves Personalizing JITAI Pilot Study
Abstract:
Cannabis use among emerging adults is increasing globally, posing significant health risks and creating a need for effective interventions. We present an exploratory analysis of the MiWaves pilot study, a digital intervention aimed at supporting cannabis use reduction among emerging adults (ages 18-25). Our findings indicate the potential of self-monitoring check-ins and trend visualizations in fostering self-awareness and promoting behavioral reflection in participants. MiWaves intervention message timing and frequency were also generally well-received by the participants. Participants' perception of effort was queried for intervention messages with different tasks, and our findings suggest that messages with tasks like exploring links and typing in responses are perceived as requiring more effort than messages with tasks involving reading and acknowledging. Finally, we discuss the findings and limitations from this study and analysis, and their impact on informing future iterations of MiWaves.

Authors:Francesco Vona, Maximilian Warsinke, Tanja Kojic, Jan-Niklas Voit-Antons, Sebastian Moller
Title: User-Centric Evaluation Methods for Digital Twin Applications in Extended Reality
Abstract:
The integration of Digital Twins (DTs) with Extended Reality (XR) technologies, such as Virtual Reality and Augmented Reality, is transforming industries by enabling more immersive, interactive experiences and enhancing real-time decision making. User-centered evaluations are crucial for aligning XR-enhanced DT systems with user expectations, enhancing acceptance and utility in real-world settings. This paper proposes a user-centric evaluation method for XR-enhanced DT applications to assess usability, cognitive load, and user experience. By employing a range of assessment tools, including questionnaires and observational studies across various use cases, such as virtual tourism, city planning, and industrial maintenance, this method provides a structured approach to capturing the user's perspective.

Authors:Jibang Wu, Chenghao Yang, Simon Mahns, Yi Wu, Chaoqi Wang, Hao Zhu, Fei Fang, Haifeng Xu
Title: Grounded Persuasive Language Generation for Automated Marketing
Abstract:
This paper develops an agentic framework that employs large language models (LLMs) to automate the generation of persuasive and grounded marketing content, using real estate listing descriptions as our focal application domain. Our method is designed to align the generated content with user preferences while highlighting useful factual attributes. This agent consists of three key modules: (1) Grounding Module, mimicking expert human behavior to predict marketable features; (2) Personalization Module, aligning content with user preferences; (3) Marketing Module, ensuring factual accuracy and the inclusion of localized features. We conduct systematic human-subject experiments in the domain of real estate marketing, with a focus group of potential house buyers. The results demonstrate that marketing descriptions generated by our approach are preferred over those written by human experts by a clear margin while maintaining the same level of factual accuracy. Our findings suggest a promising agentic approach to automate large-scale targeted marketing while ensuring factuality of content generation.

Authors:Runlong Ye, Matthew Varona, Oliver Huang, Patrick Yung Kang Lee, Michael Liut, Carolina Nobre
Title: The Design Space of Recent AI-assisted Research Tools for Ideation, Sensemaking, and Scientific Creativity
Abstract:
Generative AI (GenAI) tools are radically expanding the scope and capability of automation in knowledge work such as academic research. While promising for augmenting cognition and streamlining processes, AI-assisted research tools may also increase automation bias and hinder critical thinking. To examine recent developments, we surveyed publications from leading HCI venues over the past three years, closely analyzing thirteen tools to better understand the novel capabilities of these AI-assisted systems and the design spaces they enable: seven employing traditional AI or customized transformer-based approaches, and six integrating open-access large language models (LLMs). Our analysis characterizes the emerging design space, distinguishes between tools focused on workflow mimicry versus generative exploration, and yields four critical design recommendations to guide the development of future systems that foster meaningful cognitive engagement: providing user agency and control, differentiating divergent/convergent thinking support, ensuring adaptability, and prioritizing transparency/accuracy. This work discusses how these insights signal a shift from mere workflow replication towards generative co-creation, presenting new opportunities for the community to craft intuitive, AI-driven research interfaces and interactions.

Authors:Ali Baigelenov, Prakash Shukla, Paul Parsons
Title: How Visualization Designers Perceive and Use Inspiration
Abstract:
Inspiration plays an important role in design, yet its specific impact on data visualization design practice remains underexplored. This study investigates how professional visualization designers perceive and use inspiration in their practice. Through semi-structured interviews, we examine their sources of inspiration, the value they place on them, and how they navigate the balance between inspiration and imitation. Our findings reveal that designers draw from a diverse array of sources, including existing visualizations, real-world phenomena, and personal experiences. Participants describe a mix of active and passive inspiration practices, often iterating on sources to create original designs. This research offers insights into the role of inspiration in visualization practice, the need to expand visualization design theory, and the implications for the development of visualization tools that support inspiration and for training future visualization designers.

Authors:Luis Antonio Gutiérrez Guanilo, Mir Tafseer Nayeem, Cristian López, Davood Rafiei
Title: eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables
Abstract:
Large Language Models (LLMs) have demonstrated exceptional versatility across diverse domains, yet their application in e-commerce remains underexplored due to a lack of domain-specific datasets. To address this gap, we introduce eC-Tab2Text, a novel dataset designed to capture the intricacies of e-commerce, including detailed product attributes and user-specific queries. Leveraging eC-Tab2Text, we focus on text generation from product tables, enabling LLMs to produce high-quality, attribute-specific product reviews from structured tabular data. Fine-tuned models were rigorously evaluated using standard Table2Text metrics, alongside correctness, faithfulness, and fluency assessments. Our results demonstrate substantial improvements in generating contextually accurate reviews, highlighting the transformative potential of tailored datasets and fine-tuning methodologies in optimizing e-commerce workflows. This work highlights the potential of LLMs in e-commerce workflows and the essential role of domain-specific datasets in tailoring them to industry-specific challenges.

Authors:Karthik Sreedhar, Alice Cai, Jenny Ma, Jeffrey V. Nickerson, Lydia B. Chilton
Title: Simulating Cooperative Prosocial Behavior with Multi-Agent LLMs: Evidence and Mechanisms for AI Agents to Inform Policy Decisions
Abstract:
Human prosocial cooperation is essential for our collective health, education, and welfare. However, designing social systems to maintain or incentivize prosocial behavior is challenging because people can act selfishly to maximize personal gain. This complex and unpredictable aspect of human behavior makes it difficult for policymakers to foresee the implications of their designs. Recently, multi-agent LLM systems have shown remarkable capabilities in simulating human-like behavior, and replicating some human lab experiments. This paper studies how well multi-agent systems can simulate prosocial human behavior, such as that seen in the public goods game (PGG), and whether multi-agent systems can exhibit ``unbounded actions'' seen outside the lab in real world scenarios. We find that multi-agent LLM systems successfully replicate human behavior from lab experiments of the public goods game with three experimental treatments - priming, transparency, and varying endowments. Beyond replicating existing experiments, we find that multi-agent LLM systems can replicate the expected human behavior when combining experimental treatments, even if no previous study combined those specific treatments. Lastly, we find that multi-agent systems can exhibit a rich set of unbounded actions that people do in the real world outside of the lab -- such as collaborating and even cheating. In sum, these studies are steps towards a future where LLMs can be used to inform policy decisions that encourage people to act in a prosocial manner.

Authors:Yongsu Ahn, Yu-Ru Lin, Malihe Alikhani, Eunjeong Cheon
Title: Human-centered explanation does not fit all: The interplay of sociotechnical, cognitive, and individual factors in the effect of AI explanations in algorithmic decision-making
Abstract:
Recent XAI studies have investigated what constitutes a \textit{good} explanation in AI-assisted decision-making. Despite the widely accepted human-friendly properties of explanations, such as being contrastive and selective, existing studies have yielded inconsistent findings. To address these gaps, our study focuses on the cognitive dimensions of explanation evaluation, evaluating six explanations with different contrastive strategies and levels of information selectivity and scrutinizing the factors behind their valuation process. Our analysis finds that contrastive explanations are not the most preferable or understandable in general; rather, different contrastive and selective explanations were appreciated to different extents depending on who received them and when, how, and what was explained -- with differing levels of cognitive load and engagement and differing sociotechnical contexts. Given these findings, we call for a nuanced view of explanation strategies, with implications for designing AI interfaces to accommodate individual and contextual differences in AI-assisted decision-making.

Authors:Ryo Takahashi, Yoshihiro Kawahara
Title: Wireless charging and readout via textile coil for continuous full-body wearable computing
Abstract:
The growing use of wearable devices for activity tracking, healthcare, and haptics faces challenges due to the bulkiness and short lifespan of batteries. Integrating a textile-based wireless charging and readout system into everyday clothing can enable seamless power supply and data collection around the body. However, expanding such a system to cover the entire body is challenging, as doing so increases electromagnetic interference with the body, degrading the performance of the wireless system. This article introduces a meandered textile coil designed for body-scale, efficient wireless charging and readout. The meander coil can confine a strong inductive field near the body surface, ensuring safe W-class charging and sensitive readout with µW-class low power. Moreover, its zigzag design is simple enough for mass production on industrial knitting machines. Therefore, the body-scale meander coil can continuously operate battery-free wearable devices across the body, paving the way for ubiquitous deployment of continuous full-body wearable computing into everyday clothing.

Authors:Yaxiong Lei, Yuheng Wang, Fergus Buchanan, Mingyue Zhao, Yusuke Sugano, Shijing He, Mohamed Khamis, Juan Ye
Title: Quantifying the Impact of Motion on 2D Gaze Estimation in Real-World Mobile Interactions
Abstract:
Mobile gaze tracking involves inferring a user's gaze point or direction on a mobile device's screen from facial images captured by the device's front camera. While this technology inspires an increasing number of gaze-interaction applications, achieving consistent accuracy remains challenging due to dynamic user-device spatial relationships and varied motion conditions inherent in mobile contexts. This paper provides empirical evidence on how user mobility and behaviour affect mobile gaze tracking accuracy. We conduct two user studies collecting behaviour and gaze data under various motion conditions - from lying down to maze navigation - and during different interaction tasks. Quantitative analysis revealed behavioural regularities among daily tasks and identified head distance, head pose, and device orientation as key factors affecting accuracy, with errors increasing by up to 48.91% in dynamic conditions compared to static ones. These findings highlight the need for more robust, adaptive eye-tracking systems that account for head movements and device deflection to maintain accuracy across diverse mobile contexts.

Authors:Vincent Aleven, Conrad Borchers, Yun Huang, Tomohiro Nagashima, Bruce McLaren, Paulo Carvalho, Octav Popescu, Jonathan Sewall, Kenneth Koedinger
Title: An Integrated Platform for Studying Learning with Intelligent Tutoring Systems: CTAT+TutorShop
Abstract:
Intelligent tutoring systems (ITSs) are effective in helping students learn; further research could make them even more effective. Particularly desirable is research into how students learn with these systems, how these systems best support student learning, and which learning sciences principles are key in ITSs. CTAT+TutorShop provides a full-stack integrated platform that facilitates a complete research lifecycle with ITSs, which includes using ITS data to discover learner challenges, to identify opportunities for system improvements, and to conduct experimental studies. The platform includes authoring tools that support and accelerate the development of ITSs and provide automatic data logging in a format compatible with DataShop, an independent site that supports the analysis of ed-tech log data to study student learning. Among the many technology platforms that exist to support learning sciences research, CTAT+TutorShop may be the only one that offers researchers the possibility to author elements of ITSs, or whole ITSs, as part of designing studies. The platform has been used to develop and conduct an estimated 147 research studies which have run in a wide variety of laboratory and real-world educational settings, including K-12 and higher education, and have addressed a wide range of research questions. This paper presents five case studies of research conducted on the CTAT+TutorShop platform, and summarizes what has been accomplished and what is possible for future researchers. We reflect on the distinctive elements of this platform that have made it so effective in facilitating a wide range of ITS research.

Authors:Tianyu Song, Felix Pabst, Ulrich Eck, Nassir Navab
Title: Enhancing Patient Acceptance of Robotic Ultrasound through Conversational Virtual Agent and Immersive Visualizations
Abstract:
Robotic ultrasound systems can enhance medical diagnostics, but patient acceptance is a challenge. We propose a system combining an AI-powered conversational virtual agent with three mixed reality visualizations to improve trust and comfort. The virtual agent, powered by a large language model, engages in natural conversations and guides the ultrasound robot, enhancing interaction reliability. The visualizations include augmented reality, augmented virtuality, and fully immersive virtual reality, each designed to create patient-friendly experiences. A user study demonstrated significant improvements in trust and acceptance, offering valuable insights for designing mixed reality and virtual agents in autonomous medical procedures.

Authors:Zisu Li, Jiawei Li, Zeyu Xiong, Shumeng Zhang, Faraz Faruqi, Stefanie Mueller, Chen Liang, Xiaojuan Ma, Mingming Fan
Title: InteRecon: Towards Reconstructing Interactivity of Personal Memorable Items in Mixed Reality
Abstract:
Digital capturing of memorable personal items is a key way to archive personal memories. Although current digitization methods (e.g., photos, videos, 3D scanning) can replicate the physical appearance of an item, they often cannot preserve its real-world interactivity. We present Interactive Digital Item (IDI), a concept of reconstructing both the physical appearance and, more importantly, the interactivity of an item. We first conducted a formative study to understand users' expectations of IDI, identifying key physical interactivity features, including geometry, interfaces, and embedded content of items. Informed by these findings, we developed InteRecon, an AR prototype enabling personal reconstruction functions for IDI creation. An exploratory study was conducted to assess the feasibility of using InteRecon and explore the potential of IDI to enrich personal memory archives. Results show that InteRecon is feasible for IDI creation, and the concept of IDI brings new opportunities for augmenting personal memory archives.

Authors:Sunnie S. Y. Kim, Jennifer Wortman Vaughan, Q. Vera Liao, Tania Lombrozo, Olga Russakovsky
Title: Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies
Abstract:
Large language models (LLMs) can produce erroneous responses that sound fluent and convincing, raising the risk that users will rely on these responses as if they were correct. Mitigating such overreliance is a key challenge. Through a think-aloud study in which participants use an LLM-infused application to answer objective questions, we identify several features of LLM responses that shape users' reliance: explanations (supporting details for answers), inconsistencies in explanations, and sources. Through a large-scale, pre-registered, controlled experiment (N=308), we isolate and study the effects of these features on users' reliance, accuracy, and other measures. We find that the presence of explanations increases reliance on both correct and incorrect responses. However, we observe less reliance on incorrect responses when sources are provided or when explanations exhibit inconsistencies. We discuss the implications of these findings for fostering appropriate reliance on LLMs.

Authors:Marco Rondina, Antonio Vetrò, Riccardo Coppola, Oumaima Regragrui, Alessandro Fabris, Gianmaria Silvello, Gian Antonio Susto, Juan Carlos De Martin
Title: Testing software for non-discrimination: an updated and extended audit in the Italian car insurance domain
Abstract:
Context. As software systems become more integrated into society's infrastructure, the responsibility of software professionals to ensure compliance with various non-functional requirements increases. These requirements include security, safety, privacy, and, increasingly, non-discrimination. Motivation. Fairness in pricing algorithms grants equitable access to basic services without discriminating on the basis of protected attributes. Method. We replicate a previous empirical study that used black box testing to audit pricing algorithms used by Italian car insurance companies, accessible through a popular online system. With respect to the previous study, we enlarged the number of tests and the number of demographic variables under analysis. Results. Our work confirms and extends previous findings, highlighting the problematic permanence of discrimination across time: demographic variables significantly impact pricing to this day, with birthplace remaining the main discriminatory factor against individuals not born in Italian cities. We also found that driver profiles can determine the number of quotes available to the user, denying equal opportunities to all. Conclusion. The study underscores the importance of testing for non-discrimination in software systems that affect people's everyday lives. Performing algorithmic audits over time makes it possible to evaluate the evolution of such algorithms. It also demonstrates the role that empirical software engineering can play in making software systems more accountable.

Authors:Han Meng, Renwen Zhang, Ganyi Wang, Yitian Yang, Peinuan Qin, Jungup Lee, Yi-Chieh Lee
Title: Deconstructing Depression Stigma: Integrating AI-driven Data Collection and Analysis with Causal Knowledge Graphs
Abstract:
Mental-illness stigma is a persistent social problem, hampering both treatment-seeking and recovery. Accordingly, there is a pressing need to understand it more clearly, but analyzing the relevant data is highly labor-intensive. Therefore, we designed a chatbot to engage participants in conversations; coded those conversations qualitatively with AI assistance; and, based on those coding results, built causal knowledge graphs to decode stigma. The results we obtained from 1,002 participants demonstrate that conversation with our chatbot can elicit rich information about people's attitudes toward depression, while our AI-assisted coding was strongly consistent with human-expert coding. Our novel approach combining large language models (LLMs) and causal knowledge graphs uncovered patterns in individual responses and illustrated the interrelationships of psychological constructs in the dataset as a whole. The paper also discusses these findings' implications for HCI researchers in developing digital interventions, decomposing human psychological constructs, and fostering inclusive attitudes.

Authors:Jenny S Wang, Samar Haider, Amir Tohidi, Anushkaa Gupta, Yuxuan Zhang, Chris Callison-Burch, David Rothschild, Duncan J Watts
Title: Media Bias Detector: Designing and Implementing a Tool for Real-Time Selection and Framing Bias Analysis in News Coverage
Abstract:
Mainstream media, through their decisions on what to cover and how to frame the stories they cover, can mislead readers without using outright falsehoods. Therefore, it is crucial to have tools that expose these editorial choices underlying media bias. In this paper, we introduce the Media Bias Detector, a tool for researchers, journalists, and news consumers. By integrating large language models, we provide near real-time granular insights into the topics, tone, political lean, and facts of news articles aggregated to the publisher level. We assessed the tool's impact by interviewing 13 experts from journalism, communications, and political science, revealing key insights into usability and functionality, practical applications, and AI's role in powering media bias tools. We explored this in more depth with a follow-up survey of 150 news consumers. This work highlights opportunities for AI-driven tools that empower users to critically engage with media content, particularly in politically charged environments.

Authors:Wenhan Lyu, Shuang Zhang, Tingting Chung, Yifan Sun, Yixuan Zhang
Title: Understanding the Practices, Perceptions, and (Dis)Trust of Generative AI among Instructors: A Mixed-methods Study in the U.S. Higher Education
Abstract:
Generative AI (GenAI) has brought opportunities and challenges for higher education as it integrates into teaching and learning environments. As instructors navigate this new landscape, understanding their engagement with and attitudes toward GenAI is crucial. We surveyed 178 instructors from a single U.S. university to examine their current practices, perceptions, trust, and distrust of GenAI in higher education in March 2024. While most surveyed instructors reported moderate to high familiarity with GenAI-related concepts, their actual use of GenAI tools for direct instructional tasks remained limited. Our quantitative results show that trust and distrust in GenAI are related yet distinct; high trust does not necessarily imply low distrust, and vice versa. We also found significant differences in surveyed instructors' familiarity with GenAI across different trust and distrust groups. Our qualitative results show nuanced manifestations of trust and distrust among surveyed instructors and various approaches to support calibrated trust in GenAI. We discuss practical implications focused on (dis)trust calibration among instructors.

Authors:Mingjun Li, Natasha Kholgade Banerjee, Sean Banerjee
Title: Predicting 3D Motion from 2D Video for Behavior-Based VR Biometrics
Abstract:
Critical VR applications in domains such as healthcare, education, and finance that use traditional credentials, such as PIN, password, or multi-factor authentication, stand the chance of being compromised if a malicious person acquires the user credentials or if the user hands over their credentials to an ally. Recently, a number of approaches to user authentication have emerged that use motions of VR head-mounted displays (HMDs) and hand controllers during user interactions in VR to represent the user's behavior as a VR biometric signature. One fundamental limitation of behavior-based approaches is that current on-device tracking for HMDs and controllers cannot track full-body joint articulation, losing key signature data encapsulated in the user's articulation. In this paper, we propose an approach that uses 2D body joints, namely shoulder, elbow, wrist, hip, knee, and ankle, acquired from the right side of the participants using an external 2D camera. Using a Transformer-based deep neural network, our method uses the 2D data of body joints that are not tracked by the VR device to predict past and future 3D tracks of the right controller, providing the benefit of augmenting 3D knowledge in authentication. Our approach provides a minimum equal error rate (EER) of 0.025, and a maximum EER drop of 0.040 over prior work that uses a single-unit 3D trajectory as the input.

Authors:Danqing Shi, Yao Wang, Yunpeng Bai, Andreas Bulling, Antti Oulasvirta
Title: Chartist: Task-driven Eye Movement Control for Chart Reading
Abstract:
To design data visualizations that are easy to comprehend, we need to understand how people with different interests read them. Computational models of predicting scanpaths on charts could complement empirical studies by offering estimates of user performance inexpensively; however, previous models have been limited to gaze patterns and overlooked the effects of tasks. Here, we contribute Chartist, a computational model that simulates how users move their eyes to extract information from the chart in order to perform analysis tasks, including value retrieval, filtering, and finding extremes. The novel contribution lies in a two-level hierarchical control architecture. At the high level, the model uses LLMs to comprehend the information gained so far and applies this representation to select a goal for the lower-level controllers, which, in turn, move the eyes in accordance with a sampling policy learned via reinforcement learning. The model is capable of predicting human-like task-driven scanpaths across various tasks. It can be applied in fields such as explainable AI, visualization design evaluation, and optimization. While it displays limitations in terms of generalizability and accuracy, it takes modeling in a promising direction, toward understanding human behaviors in interacting with charts.

Authors:Julian Rasch, Julia Töws, Teresa Hirzle, Florian Müller, Martin Schmitz
Title: CreepyCoCreator? Investigating AI Representation Modes for 3D Object Co-Creation in Virtual Reality
Abstract:
Generative AI in Virtual Reality offers the potential for collaborative object-building, yet challenges remain in aligning AI contributions with user expectations. In particular, users often struggle to understand and collaborate with AI when its actions are not transparently represented. This paper thus explores the co-creative object-building process through a Wizard-of-Oz study, focusing on how AI can effectively convey its intent to users during object customization in Virtual Reality. Inspired by human-to-human collaboration, we focus on three representation modes: the presence of an embodied avatar, whether the AI's contributions are visualized immediately or incrementally, and whether the areas modified are highlighted in advance. The findings provide insights into how these factors affect user perception and interaction with object-generating AI tools in Virtual Reality as well as satisfaction and ownership of the created objects. The results offer design implications for co-creative world-building systems, aiming to foster more effective and satisfying collaborations between humans and AI in Virtual Reality.

Authors:Zicheng Zhu, Yugin Tan, Naomi Yamashita, Yi-Chieh Lee, Renwen Zhang
Title: The Benefits of Prosociality towards AI Agents: Examining the Effects of Helping AI Agents on Human Well-Being
Abstract:
Prosocial behaviors, such as helping others, are well-known to enhance human well-being. While there is a growing trend of humans helping AI agents, it remains unclear whether the well-being benefits of helping others extend to interactions with non-human entities. To address this, we conducted an experiment (N = 295) to explore how helping AI agents impacts human well-being, especially when the agents fulfill human basic psychological needs--relatedness, competence, and autonomy--during the interaction. Our findings showed that helping AI agents reduced participants' feelings of loneliness. When AI met participants' needs for competence and autonomy during the helping process, there was a further decrease in loneliness and an increase in positive affect. However, when AI did not meet participants' need for relatedness, participants experienced an increase in positive affect. We discuss the implications of these findings for understanding how AI can support human well-being.

Authors:Songlin Xu, Hao-Ning Wen, Hongyi Pan, Dallas Dominguez, Dongyin Hu, Xinyu Zhang
Title: Classroom Simulacra: Building Contextual Student Generative Agents in Online Education for Learning Behavioral Simulation
Abstract:
Student simulation supports educators in improving teaching by interacting with virtual students. However, most existing approaches ignore the modulating effects of course materials because of two challenges: the lack of datasets with granularly annotated course materials, and the limitations of existing simulation models in processing extremely long textual data. To address these challenges, we first ran a 6-week education workshop with N = 60 students to collect fine-grained data using a custom-built online education system, which logs students' learning behaviors as they interact with lecture materials over time. Second, we propose a transferable iterative reflection (TIR) module that augments both prompting-based and finetuning-based large language models (LLMs) for simulating learning behaviors. Our comprehensive experiments show that TIR enables the LLMs to perform more accurate student simulation than classical deep learning models, even with limited demonstration data. Our TIR approach better captures the granular dynamism of learning performance and inter-student correlations in classrooms, paving the way towards a ``digital twin'' for online education.

Authors:Victor Nikhil Antony, Clara Jeon, Jiasheng Li, Ge Gao, Huaishu Peng, Anastasia K. Ostrowski, Chien-Ming Huang
Title: The Design of On-Body Robots for Older Adults
Abstract:
Wearable technology has significantly improved the quality of life for older adults, and the emergence of on-body, movable robots presents new opportunities to further enhance well-being. Yet, the interaction design for these robots remains under-explored, particularly from the perspective of older adults. We present findings from a two-phase co-design process involving 13 older adults to uncover design principles for on-body robots for this population. We identify a rich spectrum of potential applications and characterize a design space to inform how on-body robots should be built for older adults. Our findings highlight the importance of considering factors like co-presence, embodiment, and multi-modal communication. Our work offers design insights to facilitate the integration of on-body robots into daily life and underscores the value of involving older adults in the co-design process to promote usability and acceptance of emerging wearable robotic technologies.

Authors:Giuseppe Desolda, Andrea Esposito, Francesco Greco, Cesare Tucci, Paolo Buono, Antonio Piccinno
Title: Understanding User Mental Models in AI-Driven Code Completion Tools: Insights from an Elicitation Study
Abstract:
Integrated Development Environments increasingly implement AI-powered code completion tools (CCTs), which promise to enhance developer efficiency, accuracy, and productivity. However, interaction challenges with CCTs persist, mainly due to mismatches between developers' mental models and the unpredictable behavior of AI-generated suggestions, an aspect underexplored in the literature. We conducted an elicitation study with 56 developers, using co-design workshops to elicit their mental models when interacting with CCTs. Several important findings emerged that might drive interaction design for CCTs. For example, developers expressed diverse preferences on when and how code suggestions should be triggered (proactive, manual, hybrid), where and how they are displayed (inline, sidebar, popup, chatbot), and the level of detail they should contain. It also emerged that developers need to be supported with customization of activation timing, display modality, suggestion granularity, and explanation content, to better fit the CCT to their preferences. To demonstrate the feasibility of these and the other guidelines that emerged during the study, we developed ATHENA, a proof-of-concept CCT that dynamically adapts to developers' coding preferences and environments, ensuring seamless integration into diverse workflows.

Authors:Wai Tong, Haobo Li, Meng Xia, Wong Kam-Kwai, Ting-Chuen Pong, Huamin Qu, Yalong Yang
Title: Exploring Spatial Hybrid User Interface for Visual Sensemaking
Abstract:
We built a spatial hybrid system that combines a personal computer (PC) and virtual reality (VR) for visual sensemaking, addressing limitations in both environments. Although VR offers immense potential for interactive data visualization (e.g., large display space and spatial navigation), it can also present challenges such as imprecise interactions and user fatigue. At the same time, a PC offers precise and familiar interactions but has limited display space and interaction modality. Therefore, we iteratively designed a spatial hybrid system (PC+VR) to complement these two environments by enabling seamless switching between PC and VR environments. To evaluate the system's effectiveness and user experience, we compared it to using a single computing environment (i.e., PC-only and VR-only). Our study results (N=18) showed that spatial PC+VR could combine the benefits of both devices to outperform user preference for VR-only without a negative impact on performance from device switching overhead. Finally, we discussed future design implications.

Authors:Arpit Narechania, Alex Endert, Atanu R Sinha
Title: Guidance Source Matters: How Guidance from AI, Expert, or a Group of Analysts Impacts Visual Data Preparation and Analysis
Abstract:
The progress in generative AI has fueled AI-powered tools like co-pilots and assistants to provision better guidance, particularly during data analysis. However, research on guidance has not yet examined the perceived efficacy of the source from which guidance is offered and the impact of this source on the user's perception and usage of guidance. We ask whether users perceive all guidance sources as equal, with particular interest in three sources: (i) AI, (ii) human expert, and (iii) a group of human analysts. As a benchmark, we consider a fourth source, (iv) unattributed guidance, where guidance is provided without attribution to any source, enabling isolation of and comparison with the effects of source-specific guidance. We design a five-condition between-subjects study, with one condition for each of the four guidance sources and an additional (v) no-guidance condition, which serves as a baseline to evaluate the influence of any kind of guidance. We situate our study in a custom data preparation and analysis tool wherein we task users to select relevant attributes from an unfamiliar dataset to inform a business report. Depending on the assigned condition, users can request guidance, which the system then provides in the form of attribute suggestions. To ensure internal validity, we control for the quality of guidance across source-conditions. Through several metrics of usage and perception, we statistically test five preregistered hypotheses and report on additional analysis. We find that the source of guidance matters to users, but not in a manner that matches received wisdom. For instance, users utilize guidance differently at various stages of analysis, including expressing varying levels of regret, despite receiving guidance of similar quality. Notably, users in the AI condition reported both higher post-task benefit and regret.

Authors:Yuxuan Li, Hirokazu Shirado, Sauvik Das
Title: Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models
Abstract:
While advances in fairness and alignment have helped mitigate overt biases exhibited by large language models (LLMs) when explicitly prompted, we hypothesize that these models may still exhibit implicit biases when simulating human behavior. To test this hypothesis, we propose a technique to systematically uncover such biases across a broad range of sociodemographic categories by assessing decision-making disparities among agents with LLM-generated, sociodemographically-informed personas. Using our technique, we tested six LLMs across three sociodemographic groups and four decision-making scenarios. Our results show that state-of-the-art LLMs exhibit significant sociodemographic disparities in nearly all simulations, with more advanced models exhibiting greater implicit biases despite reducing explicit biases. Furthermore, when comparing our findings to real-world disparities reported in empirical studies, we find that the biases we uncovered are directionally aligned but markedly amplified. This directional alignment highlights the utility of our technique in uncovering systematic biases in LLMs rather than random variations; moreover, the presence and amplification of implicit biases emphasizes the need for novel strategies to address these biases.
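The core of the technique above is comparing decision outcomes across persona groups. A minimal sketch of that comparison, assuming a simple demographic-parity-style gap as the disparity measure (the function name and metric are illustrative, not the paper's exact statistic):

```python
def decision_disparity(decisions):
    """Max pairwise gap in favorable-decision rates across groups.

    `decisions` maps a sociodemographic group label to a list of binary
    agent decisions (1 = favorable outcome). A gap near 0 means the
    simulated agents treat groups similarly; larger gaps indicate
    implicit bias. This is an assumed demographic-parity-style measure,
    not necessarily the statistic used in the paper.
    """
    rates = {g: sum(d) / len(d) for g, d in decisions.items()}
    gap = max(rates.values()) - min(rates.values())
    return gap, rates
```

For example, if agents with group-A personas receive favorable decisions 75% of the time and group-B personas only 25%, the gap is 0.5.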

Authors:Stephanie Houde, Kristina Brimijoin, Michael Muller, Steven I. Ross, Dario Andres Silva Moran, Gabriel Enrique Gonzalez, Siya Kunde, Morgan A. Foreman, Justin D. Weisz
Title: Controlling AI Agent Participation in Group Conversations: A Human-Centered Approach
Abstract:
Conversational AI agents are commonly applied within single-user, turn-taking scenarios. The interaction mechanics of these scenarios are trivial: when the user enters a message, the AI agent produces a response. However, the interaction dynamics are more complex within group settings. How should an agent behave in these settings? We report on two experiments aimed at uncovering users' experiences of an AI agent's participation within a group, in the context of group ideation (brainstorming). In the first study, participants benefited from and preferred having the AI agent in the group, but participants disliked when the agent seemed to dominate the conversation and they desired various controls over its interactive behaviors. In the second study, we created functional controls over the agent's behavior, operable by group members, to validate their utility and probe for additional requirements. Integrating our findings across both studies, we developed a taxonomy of controls for when, what, and where a conversational AI agent in a group should respond, who can control its behavior, and how those controls are specified and implemented. Our taxonomy is intended to aid AI creators to think through important considerations in the design of mixed-initiative conversational agents.

Authors:Michael Xieyang Liu, Savvas Petridis, Vivian Tsai, Alexander J. Fiannaca, Alex Olwal, Michael Terry, Carrie J. Cai
Title: Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
Abstract:
Multimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., "alert if my toddler is getting into mischief"), with the MLLM analyzing the camera feed and responding within seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements and debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically-generated and manually created sensor criteria, 2) facilitates debugging by allowing users to isolate and test individual criteria in parallel, 3) suggests additional criteria based on user-provided images, and 4) proposes test cases to help users "stress test" sensors on potentially unforeseen scenarios. In a user study, participants reported significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors. Beyond addressing model limitations, Gensors supported users in debugging, eliciting requirements, and expressing unique personal requirements to the sensor through criteria-based reasoning; it also helped uncover users' "blind spots" by exposing overlooked criteria and revealing unanticipated failure modes. Finally, we discuss how unique characteristics of MLLMs--such as hallucinations and inconsistent responses--can impact the sensor-creation process. These findings contribute to the design of future intelligent sensing systems that are intuitive and customizable by everyday users.

Authors:Farah Baracat, Luca Manneschi, Elisa Donati
Title: Heterogeneous Population Encoding for Multi-joint Regression using sEMG Signals
Abstract:
Regression-based decoding of continuous movements is essential for human-machine interfaces (HMIs), such as prosthetic control. This study explores a feature-based approach to encoding Surface Electromyography (sEMG) signals, focusing on the role of variability in neural-inspired population encoding. By employing heterogeneous populations of Leaky Integrate-and-Fire (LIF) neurons with varying sizes and diverse parameter distributions, we investigate how population size and variability in encoding parameters, such as membrane time constants and thresholds, influence decoding performance. Using a simple linear readout, we demonstrate that variability improves robustness and generalizability compared to single-neuron encoders. These findings emphasize the importance of optimizing variability and population size for efficient and scalable regression tasks in spiking neural networks (SNNs), paving the way for robust, low-power HMI implementations.
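The encoding scheme described above, a heterogeneous LIF population feeding a linear readout, can be sketched as follows. This is a minimal illustration; the parameter ranges, function names, and window size are assumptions, not the distributions used in the paper:

```python
import numpy as np

def lif_population_encode(signal, dt=1e-3, n_neurons=32, seed=0):
    """Encode a 1-D signal (e.g., an sEMG envelope) with a heterogeneous
    population of Leaky Integrate-and-Fire neurons. Each neuron draws
    its own time constant, threshold, and gain (ranges are illustrative).
    """
    rng = np.random.default_rng(seed)
    tau = rng.uniform(5e-3, 50e-3, n_neurons)    # membrane time constants (s)
    theta = rng.uniform(0.3, 1.0, n_neurons)     # firing thresholds
    gain = rng.uniform(0.5, 2.0, n_neurons)      # input gains
    v = np.zeros(n_neurons)
    spikes = np.zeros((len(signal), n_neurons))
    for t, x in enumerate(signal):
        v += (dt / tau) * (-v + gain * x)        # leaky integration step
        fired = v >= theta
        spikes[t, fired] = 1.0
        v[fired] = 0.0                           # reset on spike
    return spikes

def spike_count_features(spikes, win=50):
    """Sliding-window spike counts per neuron, suitable as inputs to a
    simple linear regression readout of joint angles."""
    return np.stack([spikes[max(0, t - win + 1):t + 1].sum(axis=0)
                     for t in range(len(spikes))])
```

Because each neuron responds to a different input range, the population as a whole covers the signal more evenly than any single neuron, which is one intuition for why heterogeneity helps a linear readout.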

Authors:Junti Zhang, Zicheng Zhu, Jingshu Li, Yi-Chieh Lee
Title: Mining Evidence about Your Symptoms: Mitigating Availability Bias in Online Self-Diagnosis
Abstract:
People frequently exposed to health information on social media tend to overestimate their symptoms during online self-diagnosis due to availability bias. This may lead to incorrect self-medication and place additional burdens on healthcare providers to correct patients' misconceptions. In this work, we conducted two mixed-method studies to identify design goals for mitigating availability bias in online self-diagnosis. We investigated factors that distort self-assessment of symptoms after exposure to social media. We found that availability bias is pronounced when social media content resonates with individuals, making them disregard their own evidence. To address this, we developed and evaluated three chatbot-based symptom checkers designed to foster evidence-based self-reflection for bias mitigation, given their potential to encourage thoughtful responses. Results showed that chatbot-based symptom checkers with cognitive intervention strategies mitigated the impact of availability bias in online self-diagnosis.

Authors:Changyang He, Yue Deng, Alessandro Fabris, Bo Li, Asia Biega
Title: Developing a Fair Online Recruitment Framework Based on Job-seekers' Fairness Concerns
Abstract:
The susceptibility to biases and discrimination is a pressing issue in today's labor markets. Though digital recruitment systems play an increasingly significant role in human resources management, thus far we lack a systematic understanding of human-centered design principles for fair online hiring. This work proposes a fair recruitment framework based on job-seekers' fairness concerns shared in an online forum. Through qualitative analysis, we uncover four overarching themes of job-seekers' fairness concerns, including discrimination against sensitive attributes, interaction biases, improper interpretations of qualifications, and power imbalance. Based on these findings, we derive design implications for algorithms and interfaces in recruitment systems, integrating them into a fair recruitment framework spanning different hiring stages and fairness considerations.

Authors:Jane Hoffswell, Victor Soares Bursztyn, Shunan Guo, Jesse Martinez, Eunyee Koh
Title: Representing Visualization Insights as a Dense Insight Network
Abstract:
We propose a dense insight network framework to encode the relationships between automatically generated insights from a complex dashboard based on their shared characteristics. Our insight network framework includes five high-level categories of relationships (e.g., type, topic, value, metadata, and compound scores). The goal of this insight network framework is to provide a foundation for implementing new insight interpretation and exploration strategies, including both user-driven and automated approaches. To illustrate the complexity and flexibility of our framework, we first describe a visualization playground to directly visualize key network characteristics; this playground also demonstrates potential interactive capabilities for decomposing the dense insight network. Then, we discuss a case study application for ranking insights based on the underlying network characteristics captured by our framework, before prompting a large language model to generate a concise, natural language summary. Finally, we reflect on next steps for leveraging our insight network framework to design and evaluate new systems.

Authors:Thomas F. Eisenmann, Andres Karjus, Mar Canet Sola, Levin Brinkmann, Bramantyo Ibrahim Supriyatno, Iyad Rahwan
Title: Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists
Abstract:
Novel capacities of generative AI to analyze and generate cultural artifacts raise inevitable questions about the nature and value of artistic education and human expertise. Has AI already leveled the playing field between professional artists and laypeople, or do trained artistic expressive capacity, curation skills and experience instead enhance the ability to use these new tools? In this pre-registered study, we conduct experimental comparisons between 50 active artists and a demographically matched sample of laypeople. We designed two tasks to approximate artistic practice for testing their capabilities in both faithful and creative image creation: replicating a reference image, and moving as far away as possible from it. We developed a bespoke platform where participants used a modern text-to-image model to complete both tasks. We also collected and compared participants' sentiments towards AI. On average, artists produced more faithful and creative outputs than their lay counterparts, although only by a small margin. While AI may ease content creation, professional expertise is still valuable - even within the confined space of generative AI itself. Finally, we also explored how well an exemplary vision-capable large language model (GPT-4o) would complete the same tasks, if given the role of an image generation agent, and found it performed on par in copying but outperformed even artists in the creative task. The very best results were still produced by humans in both tasks. These outcomes highlight the importance of integrating artistic skills with AI training to prepare artists and other visual professionals for a technologically evolving landscape. We see a potential in collaborative synergy with generative AI, which could reshape creative industries and education in the arts.

Authors:Yue Deng, Changyang He, Yixin Zou, Bo Li
Title: "Auntie, Please Don't Fall for Those Smooth Talkers": How Chinese Younger Family Members Safeguard Seniors from Online Fraud
Abstract:
Online fraud substantially harms individuals and seniors are disproportionately targeted. While family is crucial for seniors, little research has empirically examined how they protect seniors against fraud. To address this gap, we employed an inductive thematic analysis of 124 posts and 16,872 comments on RedNote (Xiaohongshu), exploring the family support ecosystem for senior-targeted online fraud in China. We develop a taxonomy of senior-targeted online fraud from a familial perspective, revealing younger members often spot frauds hard for seniors to detect, such as unusual charges. Younger family members fulfill multiple safeguarding roles, including preventative measures, fraud identification, fraud persuasion, loss recovery, and education. They also encounter numerous challenges, such as seniors' refusal of help and considerable mental and financial stress. Drawing on these, we develop a conceptual framework to characterize family support in senior-targeted fraud, and outline implications for researchers and practitioners to consider the broader stakeholder ecosystem and cultural aspects.

Authors:En-Qi Tseng, Pei-Cing Huang, Chan Hsu, Peng-Yi Wu, Chan-Tung Ku, Yihuang Kang
Title: CodEv: An Automated Grading Framework Leveraging Large Language Models for Consistent and Constructive Feedback
Abstract:
Grading programming assignments is crucial for guiding students to improve their programming skills and coding styles. This study presents an automated grading framework, CodEv, which leverages Large Language Models (LLMs) to provide consistent and constructive feedback. We incorporate Chain of Thought (CoT) prompting techniques to enhance the reasoning capabilities of LLMs and ensure that the grading is aligned with human evaluation. Our framework also integrates LLM ensembles to improve the accuracy and consistency of scores, along with agreement tests to deliver reliable feedback and code review comments. The results demonstrate that the framework can yield grading results comparable to those of human evaluators, even when using smaller LLMs. Evaluation and consistency tests of the LLMs further validate our approach, confirming the reliability of the generated scores and feedback.
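The ensemble-plus-agreement idea above can be sketched in a few lines. The median/standard-deviation scheme, threshold, and function name here are assumptions for illustration; CodEv's actual aggregation and agreement test may differ:

```python
from statistics import median, pstdev

def ensemble_grade(scores, agreement_tol=0.5):
    """Aggregate scores from several LLM graders for one submission.

    Returns the median score and whether the graders agree (population
    standard deviation within `agreement_tol`). Submissions that fail
    the agreement test could be routed to a human for review.
    """
    if not scores:
        raise ValueError("need at least one grader score")
    return median(scores), pstdev(scores) <= agreement_tol
```

For instance, grader scores of [8, 8, 9] yield a median of 8 with agreement, while [5, 9, 7] yields a median of 7 but fails the agreement test and would be flagged.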

Authors:Miriana Calvano, Antonio Curci, Giuseppe Desolda, Andrea Esposito, Rosa Lanzilotti, Antonio Piccinno
Title: Building Symbiotic AI: Reviewing the AI Act for a Human-Centred, Principle-Based Framework
Abstract:
Artificial Intelligence (AI) is spreading quickly as new technologies and services permeate modern society. Regulating AI design, development, and use is necessary to avoid unethical and potentially dangerous consequences for humans. The European Union (EU) has released a new legal framework, the AI Act, to regulate AI by undertaking a risk-based approach to safeguard humans during interaction. At the same time, researchers offer a new perspective on AI systems, commonly known as Human-Centred AI (HCAI), highlighting the need for a human-centred approach to their design. In this context, Symbiotic AI (SAI), a subtype of HCAI, promises to enhance human capabilities through a deeper and continuous collaboration between human intelligence and AI. This article presents the results of a Systematic Literature Review (SLR) that aims to identify principles that characterise the design and development of Symbiotic AI systems while considering humans as the core of the process. Through content analysis, four principles emerged from the review that must be applied to create Human-Centred AI systems that can establish a symbiotic relationship with humans. In addition, current trends and challenges were defined to indicate open questions that may guide future research on the development of SAI systems that comply with the AI Act.

Authors:José María Buades Rubio, Gabriel Moyà-Alcover, Antoni Jaume-i-Capó, Nataša Petrović
Title: Crowdsourced human-based computational approach for tagging peripheral blood smear sample images from Sickle Cell Disease patients using non-expert users
Abstract:
In this paper, we present a human-based computation approach for the analysis of peripheral blood smear (PBS) images in patients with Sickle Cell Disease (SCD). We used the Mechanical Turk microtask market to crowdsource the labeling of PBS images. We then used the expert-tagged erythrocytesIDB dataset to assess the accuracy and reliability of our proposal. Our results showed that when a robust consensus is achieved among the Mechanical Turk workers, the probability of error is very low, based on comparison with expert analysis. This suggests that our proposed approach can be used to annotate datasets of PBS images, which can then be used to train automated methods for the diagnosis of SCD. In future work, we plan to explore the potential integration of our findings with outcomes obtained through automated methodologies. This could lead to the development of more accurate and reliable methods for the diagnosis of SCD.

Authors:Xinyao Ma, Rui Zhu, Zihao Wang, Jingwei Xiong, Qingyu Chen, Haixu Tang, L. Jean Camp, Lucila Ohno-Machado
Title: Enhancing Patient-Centric Communication: Leveraging LLMs to Simulate Patient Perspectives
Abstract:
Large Language Models (LLMs) have demonstrated impressive capabilities in role-playing scenarios, particularly in simulating domain-specific experts using tailored prompts. This ability enables LLMs to adopt the persona of individuals with specific backgrounds, offering a cost-effective and efficient alternative to traditional, resource-intensive user studies. By mimicking human behavior, LLMs can anticipate responses based on concrete demographic or professional profiles. In this paper, we evaluate the effectiveness of LLMs in simulating individuals with diverse backgrounds and analyze the consistency of these simulated behaviors compared to real-world outcomes. In particular, we explore the potential of LLMs to interpret and respond to discharge summaries provided to patients leaving the Intensive Care Unit (ICU). We evaluate the comprehensibility of discharge summaries among individuals with varying educational backgrounds, comparing LLM outputs with human responses, and use this analysis to assess the strengths and limitations of LLM-driven simulations. Notably, when LLMs are primed with educational background information, they deliver accurate and actionable medical guidance 88% of the time. However, when other information is provided, performance significantly drops, falling below random chance levels. This preliminary study shows the potential benefits and pitfalls of automatically generating patient-specific health information from diverse populations. While LLMs show promise in simulating health personas, our results highlight critical gaps that must be addressed before they can be reliably used in clinical settings. Our findings suggest that a straightforward query-response model could outperform a more tailored approach in delivering health information. This is a crucial first step in understanding how LLMs can be optimized for personalized health communication while maintaining accuracy.

Authors:Ruchira Ray, Leona Pang, Sanjana Srivastava, Li Fei-Fei, Samantha Shorey, Roberto Martín-Martín
Title: Why Automate This? Exploring Correlations between Desire for Robotic Automation, Invested Time and Well-Being
Abstract:
Understanding the motivations underlying the human inclination to automate tasks is vital to developing truly helpful robots integrated into daily life. Accordingly, we ask: are individuals more inclined to automate chores based on the time they consume or the feelings experienced while performing them? This study explores these preferences and whether they vary across different social groups (i.e., gender category and income level). Leveraging data from the BEHAVIOR-1K dataset, the American Time-Use Survey, and the American Time-Use Survey Well-Being Module, we investigate the relationship between the desire for automation, time spent on daily activities, and their associated feelings - Happiness, Meaningfulness, Sadness, Painfulness, Stressfulness, or Tiredness. Our key findings show that, despite common assumptions, time spent does not strongly relate to the desire for automation for the general population. For the feelings analyzed, only happiness and pain are key indicators. Significant differences by gender and economic level also emerged: Women prefer to automate stressful activities, whereas men prefer to automate those that make them unhappy; mid-income individuals prioritize automating less enjoyable and meaningful activities, while low- and high-income individuals show no significant correlations. We hope our research helps motivate technologies to develop robots that match the priorities of potential users, moving domestic robotics toward more socially relevant solutions. We open-source all the data, including an online tool that enables the community to replicate our analysis and explore additional trends at https://robin-lab.cs.utexas.edu/why-automate-this/.

Authors:Maia Stiber, Russell Taylor, Chien-Ming Huang
Title: Robot Error Awareness Through Human Reactions: Implementation, Evaluation, and Recommendations
Abstract:
Effective error detection is crucial to prevent task disruption and maintain user trust. Traditional methods often rely on task-specific models or user reporting, which can be inflexible or slow. Recent research suggests social signals, naturally exhibited by users in response to robot errors, can enable more flexible, timely error detection. However, most studies rely on post hoc analysis, leaving their real-time effectiveness uncertain and lacking user-centric evaluation. In this work, we developed a proactive error detection system that combines user behavioral signals (facial action units and speech), user feedback, and error context for automatic error detection. In a study (N = 28), we compared our proactive system to a status quo reactive approach. Results show our system 1) reliably and flexibly detects errors, 2) detects errors faster than the reactive approach, and 3) is perceived more favorably by users than the reactive one. We discuss recommendations for enabling robot error awareness in future HRI systems.

Authors:Shaoyue Wen, Michael Middleton, Songming Ping, Nayan N Chawla, Guande Wu, Bradley S Feest, Chihab Nadri, Yunmei Liu, David Kaber, Maryam Zahabi, Ryan P. McMahan, Sonia Castelo, Ryan Mckendrick, Jing Qian, Claudio Silva
Title: AdaptiveCoPilot: Design and Testing of a NeuroAdaptive LLM Cockpit Guidance System in both Novice and Expert Pilots
Abstract:
Pilots operating modern cockpits often face high cognitive demands due to complex interfaces and multitasking requirements, which can lead to overload and decreased performance. This study introduces AdaptiveCoPilot, a neuroadaptive guidance system that adapts visual, auditory, and textual cues in real time based on the pilot's cognitive workload, measured via functional Near-Infrared Spectroscopy (fNIRS). A formative study with expert pilots (N=3) identified adaptive rules for modality switching and information load adjustments during preflight tasks. These insights informed the design of AdaptiveCoPilot, which integrates cognitive state assessments, behavioral data, and adaptive strategies within a context-aware Large Language Model (LLM). The system was evaluated in a virtual reality (VR) simulated cockpit with licensed pilots (N=8), comparing its performance against baseline and random feedback conditions. The results indicate that the pilots using AdaptiveCoPilot exhibited higher rates of optimal cognitive load states on the facets of working memory and perception, along with reduced task completion times. Based on the formative study, experimental findings, and qualitative interviews, we propose a set of strategies for future development of neuroadaptive pilot guidance systems and highlight the potential of neuroadaptive systems to enhance pilot performance and safety in aviation environments.

Authors:Mohan Li, Martin Gjoreski, Pietro Barbiero, Gašper Slapničar, Mitja Luštrek, Nicholas D. Lane, Marc Langheinrich
Title: A Survey on Federated Learning in Human Sensing
Abstract:
Human Sensing, a field that leverages technology to monitor human activities, psycho-physiological states, and interactions with the environment, enhances our understanding of human behavior and drives the development of advanced services that improve overall quality of life. However, its reliance on detailed and often privacy-sensitive data as the basis for its machine learning (ML) models raises significant legal and ethical concerns. The recently proposed ML approach of Federated Learning (FL) promises to alleviate many of these concerns, as it is able to create accurate ML models without sending raw user data to a central server. While FL has demonstrated its usefulness across a variety of areas, such as text prediction and cyber security, its benefits in Human Sensing are under-explored, given the particular challenges in this domain. This survey conducts a comprehensive analysis of the current state-of-the-art studies on FL in Human Sensing, and proposes a taxonomy and an eight-dimensional assessment for FL approaches. Through the eight-dimensional assessment, we then evaluate whether the surveyed studies consider a specific FL-in-Human-Sensing challenge or not. Finally, based on the overall analysis, we discuss open challenges and highlight five research aspects related to FL in Human Sensing that require urgent research attention. Our work provides a comprehensive corpus of FL studies and aims to assist FL practitioners in developing and evaluating solutions that effectively address the real-world complexities of Human Sensing.
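FL's core mechanism, aggregating locally trained model updates so raw sensing data never leaves the device, can be sketched with the standard Federated Averaging step. This is a minimal illustration of that aggregation, not a full FL training loop and not any specific system from the surveyed studies:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Federated Averaging aggregation step.

    Each entry of `client_params` is a parameter vector trained locally
    on one client's (privacy-sensitive) data; `client_sizes` gives each
    client's local dataset size. Only parameters reach the server, and
    the weighted average favors clients with more data.
    """
    total = float(sum(client_sizes))
    params = [np.asarray(p, dtype=float) for p in client_params]
    return sum(p * (n / total) for p, n in zip(params, client_sizes))
```

For example, averaging parameter vectors [1.0, 1.0] and [3.0, 3.0] from clients holding 1 and 3 samples respectively yields [2.5, 2.5], reflecting the larger client's greater weight.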

Authors:Tianrun Qiu, Changxin Chen, Sizhe Cheng, Yiming Yang, Yixiao Guo, Zhicong Lu, Yuxin Ma
Title: GamerAstra: Enhancing Video Game Accessibility for Blind and Low-Vision Players through a Multi-Agent AI Framework
Abstract:
Blind and low-vision (BLV) players encounter critical challenges in engaging with video games due to the inaccessibility of visual elements, difficulties in navigating interfaces, and limitations in providing interaction input. Moreover, the development of specialized accessibility features typically requires substantial programming effort and is often implemented on a game-by-game basis. To address these challenges, we introduce GamerAstra, a generalized accessibility framework that leverages a multi-agent design to facilitate access to video games for BLV players. It integrates multi-modal techniques including large language models and vision-language models, enabling interaction with games lacking native accessibility support. The framework further incorporates customizable assistance granularities to support varying degrees of visual impairment and enhances interface navigation through multiple input modalities. The evaluation through technical assessments and user studies indicates that GamerAstra effectively enhances playability and delivers a more immersive gaming experience for BLV players. These findings also underscore potential avenues for advancing intelligent accessibility frameworks in the gaming domain.

Authors:Benjamin Watson, Neff Walker, Larry F Hodges
Title: Supra-threshold control of peripheral LOD
Abstract:
Level of detail (LOD) is widely used to control visual feedback in interactive applications. LOD control is typically based on perception at threshold - the conditions in which a stimulus first becomes perceivable. Yet most LOD manipulations are quite perceivable and occur well above threshold. Moreover, research shows that supra-threshold perception differs drastically from perception at threshold. In that case, should supra-threshold LOD control also differ from LOD control at threshold? In two experiments, we examine supra-threshold LOD control in the visual periphery and find that indeed, it should differ drastically from LOD control at threshold. Specifically, we find that LOD must support a task-dependent level of reliable perceptibility. Above that level, perceptibility of LOD control manipulations should be minimized, and detail contrast is a better predictor of perceptibility than detail size. Below that level, perceptibility must be maximized, and LOD should be improved as eccentricity rises or contrast drops. This directly contradicts prevailing threshold-based LOD control schemes, and strongly suggests a reexamination of LOD control for foveal display.

Authors:Sam Yu-Te Lee, Chenyang Ji, Shicheng Wen, Lifu Huang, Dongyu Liu, Kwan-Liu Ma
Title: VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents
Abstract:
Text analytics has traditionally required specialized knowledge in Natural Language Processing (NLP) or text analysis, which presents a barrier for entry-level analysts. Recent advances in large language models (LLMs) have changed the landscape of NLP by enabling more accessible and automated text analysis (e.g., topic detection, summarization, information extraction, etc.). We introduce VIDEE, a system that supports entry-level data analysts in conducting advanced text analytics with intelligent agents. VIDEE instantiates a human-agent collaboration workflow consisting of three stages: (1) Decomposition, which incorporates a human-in-the-loop Monte-Carlo Tree Search algorithm to support generative reasoning with human feedback, (2) Execution, which generates an executable text analytics pipeline, and (3) Evaluation, which integrates LLM-based evaluation and visualizations to support user validation of execution results. We conduct two quantitative experiments to evaluate VIDEE's effectiveness and analyze common agent errors. A user study involving participants with varying levels of NLP and text analytics experience -- from none to expert -- demonstrates the system's usability and reveals distinct user behavior patterns. The findings identify design implications for human-agent collaboration, validate the practical utility of VIDEE for non-expert users, and inform future improvements to intelligent text analytics systems.

Authors:Benjamin Watson, Neff Walker, Larry F Hodges
Title: Managing level of detail through head-tracked peripheral degradation: a model and resulting design principles
Abstract:
Previous work has demonstrated the utility of reductions in the level of detail (LOD) in the periphery of head-tracked, large field of view displays. This paper provides a psychophysically based model, centered around an eye/head movement tradeoff, that explains the effectiveness of peripheral degradation and suggests how peripherally degraded displays should be designed. An experiment evaluating the effect on search performance of the shape and area of the high detail central area (inset) in peripherally degraded displays was performed; results indicated that inset shape is not a significant factor in performance. Inset area, however, was significant: performance with displays subtending at least 30 degrees of horizontal and vertical angle was not significantly different from performance with an undegraded display. These results agreed with the proposed model.

Authors:Benjamin Watson, Neff Walker, Larry F Hodges, Martin Reddy
Title: An evaluation of level of detail degradation in head-mounted display peripheries
Abstract:
A paradigm for the design of systems that manage level of detail in virtual environments is proposed. As an example of the prototyping step in this paradigm, a user study was performed to evaluate the effectiveness of high detail insets used with head-mounted displays. Ten subjects were given a simple search task that required the location and identification of a single target object. All subjects used seven different displays (the independent variable), varying in inset size and peripheral detail, to perform this task. Frame rate, target location, subject input method, and order of display use were all controlled. Primary dependent measures were search time on trials with correct identification, and the percentage of all trials correctly identified. ANOVAs of the results showed that insetless, high detail displays did not lead to significantly different search times or accuracies than displays with insets. In fact, only the insetless, low detail display returned significantly different results. Further research is being performed to examine the effect of varying task complexity, inset size, and level of detail.

Authors:Stephanie Käs, Anton Burenko, Louis Markert, Onur Alp Culha, Dennis Mack, Timm Linder, Bastian Leibe
Title: How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction?
Abstract:
Gestures enable non-verbal human-robot communication, especially in noisy environments like agile production. Traditional deep learning-based gesture recognition relies on task-specific architectures using images, videos, or skeletal pose estimates as input. Meanwhile, Vision Foundation Models (VFMs) and Vision Language Models (VLMs) with their strong generalization abilities offer potential to reduce system complexity by replacing dedicated task-specific modules. This study investigates adapting such models for dynamic, full-body gesture recognition, comparing V-JEPA (a state-of-the-art VFM), Gemini Flash 2.0 (a multimodal VLM), and HD-GCN (a top-performing skeleton-based approach). We introduce NUGGET, a dataset tailored for human-robot communication in intralogistics environments, to evaluate the different gesture recognition approaches. In our experiments, HD-GCN achieves best performance, but V-JEPA comes close with a simple, task-specific classification head - thus paving a possible way towards reducing system complexity, by using it as a shared multi-task model. In contrast, Gemini struggles to differentiate gestures based solely on textual descriptions in the zero-shot setting, highlighting the need of further research on suitable input representations for gestures.

Authors:Momin N. Siddiqui, Roy Pea, Hari Subramonyam
Title: AI in the Writing Process: How Purposeful AI Support Fosters Student Writing
Abstract:
The ubiquity of technologies like ChatGPT has raised concerns about their impact on student writing, particularly regarding reduced learner agency and superficial engagement with content. While standalone chat-based LLMs often produce suboptimal writing outcomes, evidence suggests that purposefully designed AI writing support tools can enhance the writing process. This paper investigates how different AI support approaches affect writers' sense of agency and depth of knowledge transformation. Through a randomized control trial with 90 undergraduate students, we compare three conditions: (1) a chat-based LLM writing assistant, (2) an integrated AI writing tool to support diverse subprocesses, and (3) a standard writing interface (control). Our findings demonstrate that, among AI-supported conditions, students using the integrated AI writing tool exhibited greater agency over their writing process and engaged in deeper knowledge transformation overall. These results suggest that thoughtfully designed AI writing support targeting specific aspects of the writing process can help students maintain ownership of their work while facilitating improved engagement with content.

Authors:Adrien Coppens, Valérie Maquil
Title: Integrating AIs With Body Tracking Technology for Human Behaviour Analysis: Challenges and Opportunities
Abstract:
The automated analysis of human behaviour provides many opportunities for the creation of interactive systems and the post-experiment investigations for user studies. Commodity depth cameras offer reasonable body tracking accuracy at a low price point, without the need for users to wear or hold any extra equipment. The resulting systems typically perform body tracking through a dedicated machine learning model, but they can be enhanced with additional AI components providing extra capabilities. This leads to opportunities but also challenges, for example regarding the orchestration of such AI components and the engineering of the resulting tracking pipeline. In this paper, we discuss these elements, based on our experience with the creation of a remote collaboration system across distant wall-sized displays, that we built using existing and readily available building blocks, including AI-based recognition models.

Authors:Bhanuka Gamage, Nicola McDowell, Dijana Kovacic, Leona Holloway, Thanh-Toan Do, Nicholas Price, Arthur Lowery, Kim Marriott
Title: Smart Glasses for CVI: Co-Designing Extended Reality Solutions to Support Environmental Perception by People with Cerebral Visual Impairment
Abstract:
Cerebral Visual Impairment (CVI) is set to be the leading cause of vision impairment, yet remains underrepresented in assistive technology research. Unlike ocular conditions, CVI affects higher-order visual processing, impacting object recognition, facial perception, and attention in complex environments. This paper presents a co-design study with two adults with CVI investigating how smart glasses, i.e. head-mounted extended reality displays, can support understanding and interaction with the immediate environment. Guided by the Double Diamond design framework, we conducted a two-week diary study, two ideation workshops, and ten iterative development sessions using the Apple Vision Pro. Our findings demonstrate that smart glasses can meaningfully address key challenges in locating objects, reading text, recognising people, engaging in conversations, and managing sensory stress. With the rapid advancement of smart glasses and increasing recognition of CVI as a distinct form of vision impairment, this research addresses a timely and under-explored intersection of technology and need.

Authors:Nan Cao, Xiaoyu Qi, Chuer Chen, Xiaoke Yan
Title: CODS : A Theoretical Model for Computational Design Based on Design Space
Abstract:
We introduce CODS (Computational Optimization in Design Space), a theoretical model that frames computational design as a constrained optimization problem over a structured, multi-dimensional design space. Unlike existing methods that rely on handcrafted heuristics or domain-specific rules, CODS provides a generalizable and interpretable framework that supports diverse design tasks. Given a user requirement and a well-defined design space, CODS automatically derives soft and hard constraints using large language models through a structured prompt engineering pipeline. These constraints guide the optimization process to generate design solutions that are coherent, expressive, and aligned with user intent. We validate our approach across two domains, visualization design and knitwear generation, demonstrating superior performance in design quality, intent alignment, and user preference compared to existing LLM-based methods. CODS offers a unified foundation for scalable, controllable, and AI-powered design automation.

Authors:Shambhavi Bhushan, Danielle R Thomas, Conrad Borchers, Isha Raghuvanshi, Ralph Abboud, Erin Gatz, Shivang Gupta, Kenneth Koedinger
Title: Detecting LLM-Generated Short Answers and Effects on Learner Performance
Abstract:
The increasing availability of large language models (LLMs) has raised concerns about their potential misuse in online learning. While tools for detecting LLM-generated text exist and are widely used by researchers and educators, their reliability varies. Few studies have compared the accuracy of detection methods, defined criteria to identify content generated by LLMs, or evaluated the effect of LLM misuse on learner performance. In this study, we define LLM-generated text within open responses as those produced by any LLM without paraphrasing or refinement, as evaluated by human coders. We then fine-tune GPT-4o to detect LLM-generated responses and assess the impact on learning from LLM misuse. We find that our fine-tuned LLM outperforms the existing AI detection tool GPTZero, achieving an accuracy of 80% and an F1 score of 0.78, compared to GPTZero's accuracy of 70% and macro F1 score of 0.50, demonstrating superior performance in detecting LLM-generated responses. We also find that learners suspected of LLM misuse in the open response question were more than twice as likely to correctly answer the corresponding posttest MCQ, suggesting potential misuse across both question types and indicating a bypass of the learning process. We pave the way for future work by demonstrating a structured, code-based approach to improve LLM-generated response detection and propose using auxiliary statistical indicators such as unusually high assessment scores on related tasks, readability scores, and response duration. In support of open science, we contribute data and code to support the fine-tuning of similar models for similar use cases.

Authors:Francesco Chiossi, Julian Rasch, Robin Welsch, Albrecht Schmidt, Florian Michahelles
Title: Designing Intent: A Multimodal Framework for Human-Robot Cooperation in Industrial Workspaces
Abstract:
As robots enter collaborative workspaces, ensuring mutual understanding between human workers and robotic systems becomes a prerequisite for trust, safety, and efficiency. In this position paper, we draw on the cooperation scenario of the AIMotive project in which a human and a cobot jointly perform assembly tasks to argue for a structured approach to intent communication. Building on the Situation Awareness-based Agent Transparency (SAT) framework and the notion of task abstraction levels, we propose a multidimensional design space that maps intent content (SAT1, SAT3), planning horizon (operational to strategic), and modality (visual, auditory, haptic). We illustrate how this space can guide the design of multimodal communication strategies tailored to dynamic collaborative work contexts. With this paper, we lay the conceptual foundation for a future design toolkit aimed at supporting transparent human-robot interaction in the workplace. We highlight key open questions and design challenges, and propose a shared agenda for multimodal, adaptive, and trustworthy robotic collaboration in hybrid work environments.

Authors:Jiayue Melissa Shi, Dong Whi Yoo, Keran Wang, Violeta J. Rodriguez, Ravi Karkar, Koustuv Saha
Title: Mapping Caregiver Needs to AI Chatbot Design: Strengths and Gaps in Mental Health Support for Alzheimer's and Dementia Caregivers
Abstract:
Family caregivers of individuals with Alzheimer's Disease and Related Dementia (AD/ADRD) face significant emotional and logistical challenges that place them at heightened risk for stress, anxiety, and depression. Although recent advances in generative AI -- particularly large language models (LLMs) -- offer new opportunities to support mental health, little is known about how caregivers perceive and engage with such technologies. To address this gap, we developed Carey, a GPT-4o-based chatbot designed to provide informational and emotional support to AD/ADRD caregivers. Using Carey as a technology probe, we conducted semi-structured interviews with 16 family caregivers following scenario-driven interactions grounded in common caregiving stressors. Through inductive coding and reflexive thematic analysis, we surface a systemic understanding of caregiver needs and expectations across six themes -- on-demand information access, emotional support, safe space for disclosure, crisis management, personalization, and data privacy. For each of these themes, we also identified the nuanced tensions in the caregivers' desires and concerns. We present a mapping of caregiver needs, AI chatbot's strengths, gaps, and design recommendations. Our findings offer theoretical and practical insights to inform the design of proactive, trustworthy, and caregiver-centered AI systems that better support the evolving mental health needs of AD/ADRD caregivers.

Authors:Jiayue Melissa Shi, Keran Wang, Dong Whi Yoo, Ravi Karkar, Koustuv Saha
Title: Balancing Caregiving and Self-Care: Exploring Mental Health Needs of Alzheimer's and Dementia Caregivers
Abstract:
Alzheimer's Disease and Related Dementias (AD/ADRD) are progressive neurodegenerative conditions that impair memory, thought processes, and functioning. Family caregivers of individuals with AD/ADRD face significant mental health challenges due to long-term caregiving responsibilities. Yet, current support systems often overlook the evolving nature of their mental wellbeing needs. Our study examines caregivers' mental wellbeing concerns, focusing on the practices they adopt to manage the burden of caregiving and the technologies they use for support. Through semi-structured interviews with 25 family caregivers of individuals with AD/ADRD, we identified the key causes and effects of mental health challenges, and developed a temporal mapping of how caregivers' mental wellbeing evolves across three distinct stages of the caregiving journey. Additionally, our participants shared insights into improvements for existing mental health technologies, emphasizing the need for accessible, scalable, and personalized solutions that adapt to caregivers' changing needs over time. These findings offer a foundation for designing dynamic, stage-sensitive interventions that holistically support caregivers' mental wellbeing, benefiting both caregivers and care recipients.

Authors:Lan Gao, Oscar Chen, Rachel Lee, Nick Feamster, Chenhao Tan, Marshini Chetty
Title: "I Cannot Write This Because It Violates Our Content Policy": Understanding Content Moderation Policies and User Experiences in Generative AI Products
Abstract:
While recent research has focused on developing safeguards for generative AI (GAI) model-level content safety, little is known about how content moderation to prevent malicious content performs for end-users in real-world GAI products. To bridge this gap, we investigated content moderation policies and their enforcement in GAI online tools -- consumer-facing web-based GAI applications. We first analyzed content moderation policies of 14 GAI online tools. While these policies are comprehensive in outlining moderation practices, they usually lack details on practical implementations and are not specific about how users can aid in moderation or appeal moderation decisions. Next, we examined user-experienced content moderation successes and failures through Reddit discussions on GAI online tools. We found that although moderation systems succeeded in blocking malicious generations pervasively, users frequently experienced frustration in failures of both moderation systems and user support after moderation. Based on these findings, we suggest improvements for content moderation policy and user experiences in real-world GAI products.

Authors:Leon Janzen, Florentin Putz, Marc-André Kaufhold, Kolja Straub, Matthias Hollick
Title: The User Perspective on Island-Ready 6G Communication: A Survey of Future Smartphone Usage in Crisis-Struck Areas with Local Cellular Connectivity
Abstract:
Using smartphone apps during crises is well-established, proving critical for efficient crisis response. However, such apps become futile without an Internet connection, which is a common issue during crises. The ongoing 6G standardization explores the capability to provide local cellular connectivity for areas cut off from the Internet in crises. This paper introduces to the HCI community the concept of cellular island connectivity in isolated areas, promising a seamless transition from normal operation to island operation with local-only cellular connectivity. It presents findings from a survey (N = 857) among adult smartphone users from major German cities regarding their smartphone usage preferences in this model. Results show a shift in app demand, with users favoring general-purpose apps over dedicated crisis apps in specific scenarios. We prioritize smartphone services based on their criticality, distinguishing between apps essential for crisis response and those supporting routines. Our findings provide operators, developers, and authorities insights into making user-centric design decisions for implementing island-ready 6G communication.

Authors:Yi He, Yuqi Liu, Chenpu Li, Ruoyan Chen, Chuer Chen, Shengqi Dang, Nan Cao
Title: ChartBlender: An Interactive System for Authoring and Synchronizing Visualization Charts in Video
Abstract:
Embedding data visualizations in video can enhance the communication of complex information. However, this process is often labor-intensive, requiring designers to adjust visualizations frame by frame manually. In this work, we present ChartBlender, a novel system that streamlines this process by enabling users to create data visualizations, embed them seamlessly into video scenes, and automatically synchronize them with both camera motion and moving objects. Particularly, ChartBlender incorporates a tracking algorithm that supports both object and camera tracking, ensuring robust alignment of visualizations with dynamic video content. To maintain visual clarity and aesthetic coherence, we also explore the design space of video-suited visualizations and develop a library of customizable templates optimized for video embedding. We evaluate ChartBlender through two controlled experiments and expert interviews with five domain experts. Results show that our system enables accurate synchronization and accelerates the production of data-driven videos.

Authors:Ruiyan Zhu, Xi Cheng, Ke Liu, Brian Zhu, Daniel Jin, Neeraj Parihar, Zhoutian Xu, Oliver Gao
Title: SheetMind: An End-to-End LLM-Powered Multi-Agent Framework for Spreadsheet Automation
Abstract:
We present SheetMind, a modular multi-agent framework powered by large language models (LLMs) for spreadsheet automation via natural language instructions. The system comprises three specialized agents: a Manager Agent that decomposes complex user instructions into subtasks; an Action Agent that translates these into structured commands using a Backus-Naur Form (BNF) grammar; and a Reflection Agent that validates alignment between generated actions and the user's original intent. Integrated into Google Sheets via a Workspace extension, SheetMind supports real-time interaction without requiring scripting or formula knowledge. Experiments on benchmark datasets demonstrate an 80 percent success rate on single-step tasks and approximately 70 percent on multi-step instructions, outperforming ablated and baseline variants. Our results highlight the effectiveness of multi-agent decomposition and grammar-based execution for bridging natural language and spreadsheet functionalities.
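The grammar-based execution idea above can be sketched in a few lines. This is an illustrative sketch only: the command names, the grammar, and the `parse_command` helper are assumptions for exposition, not SheetMind's actual BNF. The point is that an agent's generated action is validated against a formal grammar before it ever touches the spreadsheet.

```python
# Hypothetical sketch of grammar-constrained command validation.
# Grammar (invented for illustration, not SheetMind's real one):
#   <command> ::= <verb> "(" <cell> ("," <value>)? ")"
#   <verb>    ::= "SET" | "CLEAR"
#   <cell>    ::= letter+ digit+        e.g. A1, BC12
import re

COMMAND = re.compile(r"^(SET|CLEAR)\(([A-Z]+[0-9]+)(?:,(.+))?\)$")

def parse_command(text: str):
    """Return (verb, cell, value) if text matches the grammar, else None."""
    m = COMMAND.match(text)
    if not m:
        return None
    verb, cell, value = m.groups()
    if verb == "SET" and value is None:      # SET requires a value
        return None
    if verb == "CLEAR" and value is not None:  # CLEAR takes no value
        return None
    return (verb, cell, value)

print(parse_command("SET(A1,42)"))   # ('SET', 'A1', '42')
print(parse_command("CLEAR(B2)"))    # ('CLEAR', 'B2', None)
print(parse_command("DROP(A1)"))     # None: verb not in the grammar
```

Rejecting anything outside the grammar (the `None` cases) is what lets an LLM agent generate actions freely while guaranteeing that only well-formed commands are executed.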

Authors:Avinash Baidya, Kamalika Das, Xiang Gao
Title: The Behavior Gap: Evaluating Zero-shot LLM Agents in Complex Task-Oriented Dialogs
Abstract:
Large Language Model (LLM)-based agents have significantly impacted Task-Oriented Dialog Systems (TODS) but continue to face notable performance challenges, especially in zero-shot scenarios. While prior work has noted this performance gap, the behavioral factors driving it remain under-explored. This study proposes a comprehensive evaluation framework to quantify the behavior gap between AI agents and human experts, focusing on discrepancies in dialog acts, tool usage, and knowledge utilization. Our findings reveal that this behavior gap is a critical factor negatively impacting the performance of LLM agents. Notably, as task complexity increases, the behavior gap widens (correlation: 0.963), leading to a degradation of agent performance on complex task-oriented dialogs. For the most complex task in our study, even the GPT-4o-based agent exhibits low alignment with human behavior, with low F1 scores for dialog acts (0.464), excessive and often misaligned tool usage with an F1 score of 0.139, and ineffective usage of external knowledge. Reducing such behavior gaps leads to significant performance improvement (24.3% on average). This study highlights the importance of comprehensive behavioral evaluations and improved alignment strategies to enhance the effectiveness of LLM-based TODS in handling complex tasks.

Authors:Chuer Chen, Xiaoke Yan, Xiaoyu Qi, Nan Cao
Title: IDEA: Augmenting Design Intelligence through Design Space Exploration
Abstract:
Design spaces serve as a conceptual framework that enables designers to explore feasible solutions through the selection and combination of design elements. However, effective decision-making remains heavily dependent on the designer's experience, and the absence of mathematical formalization prevents computational support for automated design processes. To bridge this gap, we introduce a structured representation that models design spaces with orthogonal dimensions and discrete selectable elements. Building on this model, we present IDEA, a decision-making framework for augmenting design intelligence through design space exploration to generate effective outcomes. Specifically, IDEA leverages large language models (LLMs) for constraint generation, incorporates a Monte Carlo Tree Search (MCTS) algorithm guided by these constraints to explore the design space efficiently, and instantiates abstract decisions into domain-specific implementations. We validate IDEA in two design scenarios: data-driven article composition and pictorial visualization generation, supported by example results, expert interviews, and a user study. The evaluation demonstrates IDEA's adaptability across domains and its capability to produce superior design outcomes.
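The structured representation above (orthogonal dimensions, discrete selectable elements, hard and soft constraints) can be sketched minimally. Note the substitution: IDEA uses MCTS to search efficiently, whereas this toy enumerates the whole space exhaustively for brevity; the dimension names, constraints, and scoring are all invented for illustration.

```python
# Minimal sketch of constrained search over a discrete design space.
# Dimensions, constraints, and scores are hypothetical examples.
from itertools import product

design_space = {
    "chart":   ["bar", "line", "scatter"],
    "palette": ["mono", "diverging"],
    "legend":  [True, False],
}

def hard_ok(d):
    # Hard constraint (invented): scatter plots must carry a legend.
    return not (d["chart"] == "scatter" and not d["legend"])

def soft_score(d):
    # Soft preferences (invented), one point each.
    return (d["palette"] == "mono") + (d["chart"] == "line")

def explore(space):
    """Enumerate all combinations, drop hard-constraint violations,
    return the design maximizing the soft-constraint score."""
    dims = list(space)
    candidates = [dict(zip(dims, combo)) for combo in product(*space.values())]
    feasible = [d for d in candidates if hard_ok(d)]
    return max(feasible, key=soft_score)

best = explore(design_space)
```

Exhaustive enumeration is fine for a toy space of 12 combinations; real design spaces grow multiplicatively with each dimension, which is why a guided search such as MCTS becomes necessary.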

Authors:Kiana Jafari Meimandi, Gabriela Aránguiz-Dias, Grace Ra Kim, Lana Saadeddin, Mykel J. Kochenderfer
Title: The Measurement Imbalance in Agentic AI Evaluation Undermines Industry Productivity Claims
Abstract:
As industry reports claim agentic AI systems deliver double-digit productivity gains and multi-trillion dollar economic potential, the validity of these claims has become critical for investment decisions, regulatory policy, and responsible technology adoption. However, this paper demonstrates that current evaluation practices for agentic AI systems exhibit a systemic imbalance that calls into question prevailing industry productivity claims. Our systematic review of 84 papers (2023--2025) reveals an evaluation imbalance where technical metrics dominate assessments (83%), while human-centered (30%), safety (53%), and economic assessments (30%) remain peripheral, with only 15% incorporating both technical and human dimensions. This measurement gap creates a fundamental disconnect between benchmark success and deployment value. We present evidence from healthcare, finance, and retail sectors where systems excelling on technical metrics failed in real-world implementation due to unmeasured human, temporal, and contextual factors. Our position is not against agentic AI's potential, but rather that current evaluation frameworks systematically privilege narrow technical metrics while neglecting dimensions critical to real-world success. We propose a balanced four-axis evaluation model and call on the community to lead this paradigm shift because benchmark-driven optimization shapes what we build. By redefining evaluation practices, we can better align industry claims with deployment realities and ensure responsible scaling of agentic systems in high-stakes domains.

Authors:Amr Gomaa, Simon Engel, Elena Meiser, Abdulrahman Mohamed Selim, Tobias Jungbluth, Aeneas Leon Sommer, Sarah Kohlmann, Michael Barz, Maurice Rekrut, Michael Feld, Daniel Sonntag, Antonio Krüger
Title: Your Interface, Your Control: Adapting Takeover Requests for Seamless Handover in Semi-Autonomous Vehicles
Abstract:
With the automotive industry transitioning towards conditionally automated driving, takeover warning systems are crucial for ensuring safe collaborative driving between users and semi-automated vehicles. However, previous work has focused on static warning systems that do not accommodate different driver states. Therefore, we propose an adaptive takeover warning system that is personalised to drivers, enhancing their experience and safety. We conducted two user studies investigating semi-autonomous driving scenarios in rural and urban environments while participants performed non-driving-related tasks such as text entry and visual search. We investigated the effects of varying time budgets and head-up versus head-down displays for takeover requests on drivers' situational awareness and mental state. Through our statistical and clustering analyses, we propose strategies for designing adaptable takeover systems, e.g., using longer time budgets and head-up displays for non-hazardous takeover events in high-complexity environments while using shorter time budgets and head-down displays for hazardous events in low-complexity environments.

Authors:Ziliang Zhang, Cong Liu, Hyoseung Kim
Title: Understanding and Mitigating Network Latency Effect on Teleoperated-Robot with Extended Reality
Abstract:
Robot teleoperation with extended reality (XR teleoperation) enables intuitive interaction by allowing remote robots to mimic user motions with real-time 3D feedback. However, existing systems face significant motion-to-motion (M2M) latency--the delay between the user's latest motion and the corresponding robot feedback--leading to high teleoperation error and mission completion time. This issue stems from the system's exclusive reliance on network communication, making it highly vulnerable to network degradation. To address these challenges, we introduce TeleXR, the first end-to-end, fully open-sourced XR teleoperation framework that decouples robot control and XR visualization from network dependencies. TeleXR leverages local sensing data to reconstruct delayed or missing information of the counterpart, thereby significantly reducing network-induced issues. This approach allows both the XR and robot to run concurrently with network transmission while maintaining high robot planning accuracy. TeleXR also features contention-aware scheduling to mitigate GPU contention and bandwidth-adaptive point cloud scaling to cope with limited bandwidth.

Authors:Hayoung Jung, Shravika Mittal, Ananya Aatreya, Navreet Kaur, Munmun De Choudhury, Tanushree Mitra
Title: MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform
Abstract:
Understanding the prevalence of misinformation in health topics online can inform public health policies and interventions. However, measuring such misinformation at scale remains a challenge, particularly for high-stakes but understudied topics like opioid-use disorder (OUD)--a leading cause of death in the U.S. We present the first large-scale study of OUD-related myths on YouTube, a widely-used platform for health information. With clinical experts, we validate 8 pervasive myths and release an expert-labeled video dataset. To scale labeling, we introduce MythTriage, an efficient triage pipeline that uses a lightweight model for routine cases and defers harder ones to a high-performing, but costlier, large language model (LLM). MythTriage achieves up to 0.86 macro F1-score while estimated to reduce annotation time and financial cost by over 76% compared to experts and full LLM labeling. We analyze 2.9K search results and 343K recommendations, uncovering how myths persist on YouTube and offering actionable insights for public health and platform moderation.
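The triage pattern described above, a cheap model for routine cases with deferral of hard ones to a costlier model, can be sketched as a confidence threshold. Everything here is an assumption for exposition (the function names, the toy models, the 0.8 threshold); MythTriage's actual routing criterion may differ.

```python
# Hypothetical sketch of confidence-based triage: a lightweight model
# labels routine items; low-confidence items are deferred to an
# expensive model (standing in for an LLM call).
from typing import Callable, List, Tuple

def triage(
    items: List[str],
    cheap: Callable[[str], Tuple[str, float]],   # returns (label, confidence)
    expensive: Callable[[str], str],             # costlier, more reliable
    threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    results = []
    for item in items:
        label, conf = cheap(item)
        if conf < threshold:
            label = expensive(item)   # deferred: pay for the expensive call
        results.append((item, label))
    return results

# Toy stand-ins for the two models (purely illustrative).
def toy_cheap(text: str) -> Tuple[str, float]:
    return ("myth", 0.95) if "cure" in text else ("unclear", 0.4)

def toy_expensive(text: str) -> str:
    return "not_myth"

labeled = triage(["a miracle cure", "dose tapering advice"],
                 toy_cheap, toy_expensive)
# First item is handled cheaply; the second falls below the threshold
# and is routed to the expensive model.
```

The cost saving the abstract reports comes from exactly this asymmetry: the expensive model is invoked only on the fraction of items the cheap model cannot label confidently.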

Authors:Anjali Singh, Zhitong Guan, Soo Young Rieh
Title: Enhancing Critical Thinking in Generative AI Search with Metacognitive Prompts
Abstract:
The growing use of Generative AI (GenAI) conversational search tools has raised concerns about their effects on people's metacognitive engagement, critical thinking, and learning. As people increasingly rely on GenAI to perform tasks such as analyzing and applying information, they may become less actively engaged in thinking and learning. This study examines whether metacognitive prompts - designed to encourage people to pause, reflect, assess their understanding, and consider multiple perspectives - can support critical thinking during GenAI-based search. We conducted a user study (N=40) with university students to investigate the impact of metacognitive prompts on their thought processes and search behaviors while searching with a GenAI tool. We found that these prompts led to more active engagement, prompting students to explore a broader range of topics and engage in deeper inquiry through follow-up queries. Students reported that the prompts were especially helpful for considering overlooked perspectives, promoting evaluation of AI responses, and identifying key takeaways. Additionally, the effectiveness of these prompts was influenced by students' metacognitive flexibility. Our findings highlight the potential of metacognitive prompts to foster critical thinking and provide insights for designing and implementing metacognitive support in human-AI interactions.

Authors:Bhanuka Gamage, Leona Holloway, Nicola McDowell, Thanh-Toan Do, Nicholas Price, Arthur Lowery, Kim Marriott
Title: Vision-Based Assistive Technologies for People with Cerebral Visual Impairment: A Review and Focus Study
Abstract:
Over the past decade, considerable research has investigated Vision-Based Assistive Technologies (VBAT) to support people with vision impairments to understand and interact with their immediate environment using machine learning, computer vision, image enhancement, and/or augmented/virtual reality. However, this has almost totally overlooked a growing demographic: people with Cerebral Visual Impairment (CVI). Unlike ocular vision impairments, CVI arises from damage to the brain's visual processing centres. Through a scoping review, this paper reveals a significant research gap in addressing the needs of this demographic. Three focus studies involving 7 participants with CVI explored the challenges, current strategies, and opportunities for VBAT. We also discussed the assistive technology needs of people with CVI compared with ocular low vision. Our findings highlight the opportunity for the Human-Computer Interaction and Assistive Technologies research community to explore and address this underrepresented domain, thereby enhancing the quality of life for people with CVI.

Authors:Bhanuka Gamage, Leona Holloway, Nicola McDowell, Thanh-Toan Do, Nicholas Seow Chiang Price, Arthur James Lowery, Kim Marriott
Title: Broadening Our View: Assistive Technology for Cerebral Visual Impairment
Abstract:
Over the past decade, considerable research has been directed towards assistive technologies to support people with vision impairments using machine learning, computer vision, image enhancement, and/or augmented/virtual reality. However, this has almost totally overlooked a growing demographic: people with Cerebral Visual Impairment (CVI). Unlike Ocular Vision Impairments (OVI), CVI arises from damage to the brain's visual processing centres. This paper introduces CVI and reveals a wide research gap in addressing the needs of this demographic. Through a scoping review, we identified 14 papers at the intersection of these technologies and CVI. Of these, only three papers described assistive technologies focused on people living with CVI, with the others focusing on diagnosis, understanding, simulation or rehabilitation. Our findings highlight the opportunity for the Human-Computer Interaction and Assistive Technologies research community to explore and address this underrepresented domain, thereby enhancing the quality of life for people with CVI.

Authors:Saharsh Barve, Andy Mao, Jiayue Melissa Shi, Prerna Juneja, Koustuv Saha
Title: Can we Debias Social Stereotypes in AI-Generated Images? Examining Text-to-Image Outputs and User Perceptions
Abstract:
Recent advances in generative AI have enabled visual content creation through text-to-image (T2I) generation. However, despite their creative potential, T2I models often replicate and amplify societal stereotypes -- particularly those related to gender, race, and culture -- raising important ethical concerns. This paper proposes a theory-driven bias detection rubric and a Social Stereotype Index (SSI) to systematically evaluate social biases in T2I outputs. We audited three major T2I model outputs -- DALL-E-3, Midjourney-6.1, and Stability AI Core -- using 100 queries across three categories -- geocultural, occupational, and adjectival. Our analysis reveals that initial outputs are prone to include stereotypical visual cues, including gendered professions, cultural markers, and Western beauty norms. To address this, we applied our rubric to conduct targeted prompt refinement using LLMs, which significantly reduced bias -- SSI dropped by 61% for geocultural, 69% for occupational, and 51% for adjectival queries. We complemented our quantitative analysis through a user study examining perceptions, awareness, and preferences around AI-generated biased imagery. Our findings reveal a key tension -- although prompt refinement can mitigate stereotypes, it can limit contextual alignment. Interestingly, users often perceived stereotypical images to be more aligned with their expectations. We discuss the need to balance ethical debiasing with contextual relevance and call for T2I systems that support global diversity and inclusivity while not compromising the reflection of real-world social complexity.

Authors:Tianwa Chen, Barbara Weber, Graeme Shanks, Gianluca Demartini, Marta Indulska, Shazia Sadiq
Title: How Do Experts Make Sense of Integrated Process Models?
Abstract:
A range of integrated modeling approaches have been developed to enable a holistic representation of business process logic together with all relevant business rules. These approaches address inherent problems with separate documentation of business process models and business rules. In this study, we explore how expert process workers make sense of the information provided through such integrated modeling approaches. To do so, we complement verbal protocol analysis with eye-tracking metrics to reveal nuanced user behaviours involved in the main phases of sensemaking, namely information foraging and information processing. By studying expert process workers engaged in tasks based on integrated modeling of business processes and rules, we provide insights that pave the way for a better understanding of sensemaking practices and improved development of business process and business rule integration approaches. Our research underscores the importance of offering personalized support mechanisms that increase the efficacy and efficiency of sensemaking practices for process knowledge workers.

Authors:Vaishali Dhanoa, Anton Wolter, Gabriela Molina León, Hans-Jörg Schulz, Niklas Elmqvist
Title: Agentic Visualization: Extracting Agent-based Design Patterns from Visualization Systems
Abstract:
Autonomous agents powered by Large Language Models are transforming AI, creating an imperative for the visualization field to embrace agentic frameworks. However, our field's focus on a human in the sensemaking loop raises critical questions about autonomy, delegation, and coordination for such "agentic visualization" that preserve human agency while amplifying analytical capabilities. This paper addresses these questions by reinterpreting existing visualization systems with semi-automated or fully automatic AI components through an agentic lens. Based on this analysis, we extract a collection of design patterns for agentic visualization, including agentic roles, communication and coordination. These patterns provide a foundation for future agentic visualization systems that effectively harness AI agents while maintaining human insight and control.

Authors:Arnav Verma, Kushin Mukherjee, Christopher Potts, Elisa Kreiss, Judith E. Fan
Title: CHART-6: Human-Centered Evaluation of Data Visualization Understanding in Vision-Language Models
Abstract:
Data visualizations are powerful tools for communicating patterns in quantitative data. Yet understanding any data visualization is no small feat -- succeeding requires jointly making sense of visual, numerical, and linguistic inputs arranged in a conventionalized format one has previously learned to parse. Recently developed vision-language models are, in principle, promising candidates for developing computational models of these cognitive operations. However, it is currently unclear to what degree these models emulate human behavior on tasks that involve reasoning about data visualizations. This gap reflects limitations in prior work that has evaluated data visualization understanding in artificial systems using measures that differ from those typically used to assess these abilities in humans. Here we evaluated eight vision-language models on six data visualization literacy assessments designed for humans and compared model responses to those of human participants. We found that these models performed worse than human participants on average, and this performance gap persisted even when using relatively lenient criteria to assess model performance. Moreover, while relative performance across items was somewhat correlated between models and humans, all models produced patterns of errors that were reliably distinct from those produced by human participants. Taken together, these findings suggest significant opportunities for further development of artificial systems that might serve as useful models of how humans reason about data visualizations. All code and data needed to reproduce these results are available at: https://osf.io/e25mu/?view_only=399daff5a14d4b16b09473cf19043f18.

Authors:Bradley Coles, Yahya Hmaiti, Joseph J. LaViola
Title: Exploring Perception-Based Techniques for Redirected Walking in VR: A Comprehensive Survey
Abstract:
We present a comprehensive survey of perception-based redirected walking (RDW) techniques in virtual reality (VR) and propose a taxonomy that serves as a framework for understanding and designing RDW algorithms. RDW enables users to explore virtual environments (VEs) larger than their physical space, addressing the constraints of real walking in limited home VR setups. Our review spans 232 papers, with 165 included in the final analysis. We categorize perception-based RDW techniques based on gains, gain application, target orientation calculation, and optional general enhancements, identifying key patterns and relationships. We present data on how current work aligns within this classification system and suggest how this data can guide future work into areas that are relatively underexplored. This taxonomy clarifies perception-based RDW techniques, guiding the design and application of RDW systems, and suggests future research directions to enhance VR user experience.

Authors:Yuan-Hao Jiang, Kezong Tang, Zi-Wei Chen, Yuang Wei, Tian-Yi Liu, Jiayi Wu
Title: MAS-KCL: Knowledge component graph structure learning with large language model-based agentic workflow
Abstract:
Knowledge components (KCs) are the fundamental units of knowledge in the field of education. A KC graph illustrates the relationships and dependencies between KCs. An accurate KC graph can assist educators in identifying the root causes of learners' poor performance on specific KCs, thereby enabling targeted instructional interventions. To achieve this, we have developed a KC graph structure learning algorithm, named MAS-KCL, which employs a multi-agent system driven by large language models for adaptive modification and optimization of the KC graph. Additionally, a bidirectional feedback mechanism is integrated into the algorithm, where AI agents leverage this mechanism to assess the value of edges within the KC graph and adjust the distribution of generation probabilities for different edges, thereby accelerating the efficiency of structure learning. We applied the proposed algorithm to 5 synthetic datasets and 4 real-world educational datasets, and experimental results validate its effectiveness in learning path recognition. By accurately identifying learners' learning paths, teachers are able to design more comprehensive learning plans, enabling learners to achieve their educational goals more effectively, thus promoting the sustainable development of education.

Authors:Shangqun Yu, Hochul Hwang, Trung M. Dang, Joydeep Biswas, Nicholas A. Giudice, Sunghoon Ivan Lee, Donghyun Kim
Title: Human-Centered Development of Guide Dog Robots: Quiet and Stable Locomotion Control
Abstract:
A quadruped robot is a promising system that can offer assistance comparable to that of dog guides due to its similar form factor. However, various challenges remain in making these robots a reliable option for blind and low-vision (BLV) individuals. Among these challenges, noise and jerky motion during walking are critical drawbacks of existing quadruped robots. While these issues have largely been overlooked in guide dog robot research, our interviews with guide dog handlers and trainers revealed that acoustic and physical disturbances can be particularly disruptive for BLV individuals, who rely heavily on environmental sounds for navigation. To address these issues, we developed a novel walking controller for slow stepping and smooth foot swing/contact while maintaining human walking speed, as well as robust and stable balance control. The controller integrates with a perception system to facilitate locomotion over non-flat terrains, such as stairs. Our controller was extensively tested on the Unitree Go1 robot and, when compared with other control methods, demonstrated significant noise reduction, producing half the noise of the default locomotion controller. In this study, we adopt a mixed-methods approach to evaluate its usability with BLV individuals. In our indoor walking experiments, participants compared our controller to the robot's default controller. Results demonstrated superior acceptance of our controller, highlighting its potential to improve the user experience of guide dog robots. Video demonstration (best viewed with audio) available at: https://youtu.be/8-pz_8Hqe6s.

Authors:Shuo Wang, Tong Ren, Nan Cheng, Rong Wang, Li Zhang
Title: Patient-Specific Dynamic Digital-Physical Twin for Coronary Intervention Training: An Integrated Mixed Reality Approach
Abstract:
Background and Objective: Precise preoperative planning and effective physician training for coronary interventions are increasingly important. Despite advances in medical imaging technologies, transforming static or limited dynamic imaging data into comprehensive dynamic cardiac models remains challenging. Existing training systems lack accurate simulation of cardiac physiological dynamics. This study develops a comprehensive dynamic cardiac model research framework based on 4D-CTA, integrating digital twin technology, computer vision, and physical model manufacturing to provide precise, personalized tools for interventional cardiology. Methods: Using 4D-CTA data from a 60-year-old female with three-vessel coronary stenosis, we segmented cardiac chambers and coronary arteries, constructed dynamic models, and implemented skeletal skinning weight computation to simulate vessel deformation across 20 cardiac phases. Transparent vascular physical models were manufactured using medical-grade silicone. We developed cardiac output analysis and virtual angiography systems, implemented guidewire 3D reconstruction using binocular stereo vision, and evaluated the system through angiography validation and CABG training applications. Results: Morphological consistency between virtual and real angiography reached 80.9%. Dice similarity coefficients for guidewire motion ranged from 0.741-0.812, with mean trajectory errors below 1.1 mm. The transparent model demonstrated advantages in CABG training, allowing direct visualization while simulating beating heart challenges. Conclusion: Our patient-specific digital-physical twin approach effectively reproduces both anatomical structures and dynamic characteristics of coronary vasculature, offering a dynamic environment with visual and tactile feedback valuable for education and clinical planning.

Authors:Akaash Kolluri, Renn Su, Farnaz Jahanbakhsh, Dora Zhao, Tiziano Piccardi, Michael S. Bernstein
Title: Alexandria: A Library of Pluralistic Values for Realtime Re-Ranking of Social Media Feeds
Abstract:
Social media feed ranking algorithms fail when they too narrowly focus on engagement as their objective. The literature has asserted a wide variety of values that these algorithms should account for as well -- ranging from well-being to productive discourse -- far more than can be encapsulated by a single topic or theory. In response, we present a "library of values" for social media algorithms: a pluralistic set of 78 values as articulated across the literature, implemented into LLM-powered content classifiers that can be installed individually or in combination for real-time re-ranking of social media feeds. We investigate this approach by developing a browser extension, Alexandria, that re-ranks the X/Twitter feed in real time based on the user's desired values. Through two user studies, both qualitative (N=12) and quantitative (N=257), we found that diverse user needs require a large library of values, enabling more nuanced preferences and greater user control. With this work, we argue that the values criticized as missing from social media ranking algorithms can be operationalized and deployed today through end-user tools.
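The re-ranking idea above (per-value classifier scores combined under user-chosen weights) can be sketched as a weighted-sum sort. This is a hypothetical simplification: in Alexandria the per-post scores come from LLM classifiers, whereas here they are given directly, and the function and field names are illustrative.

```python
# Sketch of value-based feed re-ranking: each installed "value" contributes
# a classifier score per post, and posts are reordered by the weighted sum
# of the scores the user has enabled. Scores here are hand-supplied toys.

def rerank(posts, value_weights):
    # posts: list of (post_id, {value_name: score in [0, 1]})
    # value_weights: {value_name: user-chosen weight}
    def total(item):
        _, scores = item
        return sum(w * scores.get(v, 0.0) for v, w in value_weights.items())
    return [pid for pid, _ in sorted(posts, key=total, reverse=True)]

feed = [
    ("a", {"well_being": 0.2, "discourse": 0.9}),
    ("b", {"well_being": 0.8, "discourse": 0.1}),
]
```

Because the weights live client-side, different users (or the same user on different days) can impose different orderings on an identical feed.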

Authors:Tin Trung Nguyen, Jiannan Xu, Phuong-Anh Nguyen-Le, Jonathan Lazar, Donald Braman, Hal Daumé, Zubin Jelveh
Title: Which Demographic Features Are Relevant for Individual Fairness Evaluation of U.S. Recidivism Risk Assessment Tools?
Abstract:
Despite its constitutional relevance, the technical "individual fairness" criterion has not been operationalized in U.S. state or federal statutes/regulations. We conduct a human subjects experiment to address this gap, evaluating which demographic features are relevant for individual fairness evaluation of recidivism risk assessment (RRA) tools. Our analyses conclude that the individual similarity function should consider age and sex, but it should ignore race.

Authors:Jennifer Haase, Paul H. P. Hanel, Sebastian Pokutta
Title: S-DAT: A Multilingual, GenAI-Driven Framework for Automated Divergent Thinking Assessment
Abstract:
This paper introduces S-DAT (Synthetic-Divergent Association Task), a scalable, multilingual framework for automated assessment of divergent thinking (DT) -- a core component of human creativity. Traditional creativity assessments are often labor-intensive, language-specific, and reliant on subjective human ratings, limiting their scalability and cross-cultural applicability. In contrast, S-DAT leverages large language models and advanced multilingual embeddings to compute semantic distance -- a language-agnostic proxy for DT. We evaluate S-DAT across eleven diverse languages, including English, Spanish, German, Russian, Hindi, and Japanese (Kanji, Hiragana, Katakana), demonstrating robust and consistent scoring across linguistic contexts. Unlike prior DAT approaches, S-DAT shows convergent validity with other DT measures and discriminant validity with convergent thinking. This cross-linguistic flexibility allows for more inclusive, global-scale creativity research, addressing key limitations of earlier approaches. S-DAT provides a powerful tool for fairer, more comprehensive evaluation of cognitive flexibility in diverse populations and can be freely accessed online: https://sdat.iol.zib.de/.
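The semantic-distance proxy described above follows the general DAT recipe: embed each word a person produces and average the pairwise distances, so that more unrelated word sets score higher. A minimal sketch, using toy 3-d vectors in place of the multilingual embeddings the paper uses; the 0-100 scaling and function names are assumptions, not the paper's exact scoring code.

```python
# Sketch of DAT-style scoring: divergent thinking is proxied by the mean
# pairwise semantic distance between a person's words. Toy 3-d vectors
# stand in for real multilingual embeddings.
from itertools import combinations
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def dat_score(embeddings):
    # Mean pairwise cosine distance over all word pairs, scaled to 0-100.
    pairs = list(combinations(embeddings, 2))
    return 100.0 * sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)

related = [[1, 0, 0], [0.9, 0.1, 0]]         # near-synonyms: low score
diverse = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # orthogonal words: high score
```

Because the embedding model, not the scoring formula, carries the language knowledge, swapping in a multilingual embedder is what makes the measure language-agnostic.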

Authors:Lucas McCullum, Pelagie Ami Agassi, Leo Anthony Celi, Daniel K. Ebner, Chrystinne Oliveira Fernandes, Rachel S. Hicklen, Mkliwa Koumbia, Lisa Soleymani Lehmann, David Restrepo
Title: Performance Gains of LLMs With Humans in a World of LLMs Versus Humans
Abstract:
Currently, a considerable research effort is devoted to comparing LLMs to groups of human experts, where the term "expert" is often ill-defined or variable, against a backdrop of constantly updating LLM releases. Without proper safeguards in place, LLMs threaten to harm the established structures for safe delivery of patient care, which have been carefully developed throughout history to keep the safety of the patient at the forefront. A key driver of LLM innovation is community research effort which, if it continues to operate under "humans versus LLMs" principles, will expedite this trend. Therefore, research efforts moving forward must focus on effectively characterizing the safe use of LLMs in clinical settings in ways that persist across the rapid development of novel LLM models. In this communication, we demonstrate that rather than comparing LLMs to humans, there is a need to develop strategies enabling humans to work efficiently with LLMs in an almost symbiotic manner.

Authors:Suleyman Ozdel, Johannes Meyer, Yasmeen Abdrabou, Enkelejda Kasneci
Title: User Identification with LFI-Based Eye Movement Data Using Time and Frequency Domain Features
Abstract:
Laser interferometry (LFI)-based eye-tracking systems provide an alternative to traditional camera-based solutions, offering improved privacy by eliminating the risk of direct visual identification. However, the high-frequency signals captured by LFI-based trackers may still contain biometric information that enables user identification. This study investigates user identification from raw high-frequency LFI-based eye movement data by analyzing features extracted from both the time and frequency domains. Using velocity and distance measurements without requiring direct gaze data, we develop a multi-class classification model to accurately distinguish between individuals across various activities. Our results demonstrate that even without direct visual cues, eye movement patterns exhibit sufficient uniqueness for user identification, achieving 93.14% accuracy and a 2.52% EER with 5-second windows across both static and dynamic tasks. Additionally, we analyze the impact of sampling rate and window size on model performance, providing insights into the feasibility of LFI-based biometric recognition. Our findings demonstrate the novel potential of LFI-based eye-tracking for user identification, highlighting both its promise for secure authentication and emerging privacy risks. This work paves the way for further research into high-frequency eye movement data.
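The time- and frequency-domain feature extraction described above can be sketched as windowing a 1-D velocity signal and computing simple statistics plus FFT-derived features per window. The 5-second window matches the paper's reported setting, but the specific features, function name, and classifier hand-off are illustrative assumptions.

```python
# Sketch of windowed feature extraction for a high-frequency 1-D
# eye-movement velocity signal, as one might feed a per-user classifier.
# The chosen features are illustrative, not the paper's exact set.
import numpy as np

def extract_features(signal, fs, window_s=5.0):
    win = int(fs * window_s)
    feats = []
    for start in range(0, len(signal) - win + 1, win):
        w = signal[start:start + win]
        spectrum = np.abs(np.fft.rfft(w))
        freqs = np.fft.rfftfreq(win, d=1.0 / fs)
        feats.append([
            w.mean(), w.std(),                 # time-domain statistics
            freqs[spectrum[1:].argmax() + 1],  # dominant non-DC frequency
            spectrum.sum(),                    # total spectral energy
        ])
    return np.array(feats)
```

Each row of the returned matrix is one window's feature vector; a multi-class model trained on such rows is what maps eye-movement dynamics to user identities.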

Authors:Songchen Zhou, Mark Armstrong, Giulia Barbareschi, Toshihiro Ajioka, Zheng Hu, Ryoichi Ando, Kentaro Yoshifuji, Masatane Muto, Kouta Minamizawa
Title: Augmented Body Communicator: Enhancing daily body expression for people with upper limb limitations through LLM and a robotic arm
Abstract:
Individuals with upper limb movement limitations face challenges in interacting with others. Although robotic arms are currently used primarily for functional tasks, there is considerable potential to explore ways to enhance users' body language capabilities during social interactions. This paper introduces an Augmented Body Communicator system that integrates robotic arms and a large language model. Through the incorporation of kinetic memory, disabled users and their supporters can collaboratively design actions for the robot arm. The LLM system then provides suggestions on the most suitable action based on contextual cues during interactions. The system underwent thorough user testing with six participants who have conditions affecting upper limb mobility. Results indicate that the system improves users' ability to express themselves. Based on our findings, we offer recommendations for developing robotic arms that support disabled individuals with body language capabilities and functional tasks.

Authors:Yiwen Zhang, Jianing Hao, Zhan Wang, Hongling Sheng, Wei Zeng
Title: Facilitating Video Story Interaction with Multi-Agent Collaborative System
Abstract:
Video story interaction enables viewers to engage with and explore narrative content for personalized experiences. However, existing methods are limited to user selection, specially designed narratives, and lack customization. To address this, we propose an interactive system based on user intent. Our system uses a Vision Language Model (VLM) to enable machines to understand video stories, combining Retrieval-Augmented Generation (RAG) and a Multi-Agent System (MAS) to create evolving characters and scene experiences. It includes three stages: 1) Video story processing, utilizing VLM and prior knowledge to simulate human understanding of stories across three modalities. 2) Multi-space chat, creating growth-oriented characters through MAS interactions based on user queries and story stages. 3) Scene customization, expanding and visualizing various story scenes mentioned in dialogue. Applied to the Harry Potter series, our study shows the system effectively portrays emergent character social behavior and growth, enhancing the interactive experience in the video story world.

Authors:Jiatao Li, Yanheng Li, Xiaojun Wan
Title: Analyzing Cognitive Differences Among Large Language Models through the Lens of Social Worldview
Abstract:
Large Language Models (LLMs) have become integral to daily life, widely adopted in communication, decision-making, and information retrieval, raising critical questions about how these systems implicitly form and express socio-cognitive attitudes or "worldviews". While existing research extensively addresses demographic and ethical biases, broader dimensions -- such as attitudes toward authority, equality, autonomy, and fate -- remain underexplored. In this paper, we introduce the Social Worldview Taxonomy (SWT), a structured framework grounded in Cultural Theory, operationalizing four canonical worldviews (Hierarchy, Egalitarianism, Individualism, Fatalism) into measurable sub-dimensions. Using SWT, we empirically identify distinct and interpretable cognitive profiles across 28 diverse LLMs. Further, inspired by Social Referencing Theory, we experimentally demonstrate that explicit social cues systematically shape these cognitive attitudes, revealing both general response patterns and nuanced model-specific variations. Our findings enhance the interpretability of LLMs by revealing implicit socio-cognitive biases and their responsiveness to social feedback, thus guiding the development of more transparent and socially responsible language technologies.

Authors:Yu Zhang, Xinyue Chen, Weili Zheng, Yuhan Guo, Guozheng Li, Siming Chen, Xiaoru Yuan
Title: VisTaxa: Developing a Taxonomy of Historical Visualizations
Abstract:
Historical visualizations are a rich resource for visualization research. While taxonomy is commonly used to structure and understand the design space of visualizations, existing taxonomies primarily focus on contemporary visualizations and largely overlook historical visualizations. To address this gap, we describe an empirical method for taxonomy development. We introduce a coding protocol and the VisTaxa system for taxonomy labeling and comparison. We demonstrate using our method to develop a historical visualization taxonomy by coding 400 images of historical visualizations. We analyze the coding result and reflect on the coding process. Our work is an initial step toward a systematic investigation of the design space of historical visualizations.

Authors:Katelyn Xiaoying Mei, Rock Yuren Pang, Alex Lyford, Lucy Lu Wang, Katharina Reinecke
Title: Passing the Buck to AI: How Individuals' Decision-Making Patterns Affect Reliance on AI
Abstract:
Psychological research has identified different patterns individuals have while making decisions, such as vigilance (making decisions after thorough information gathering), hypervigilance (rushed and anxious decision-making), and buckpassing (deferring decisions to others). We examine whether these decision-making patterns shape people's likelihood of seeking out or relying on AI. In an online experiment with 810 participants tasked with distinguishing food facts from myths, we found that a higher buckpassing tendency was positively correlated with both seeking out and relying on AI suggestions, while being negatively correlated with the time spent reading AI explanations. In contrast, the higher a participant tended towards vigilance, the more carefully they scrutinized the AI's information, as indicated by an increased time spent looking through the AI's explanations. These findings suggest that a person's decision-making pattern plays a significant role in their adoption and reliance on AI, which provides a new understanding of individual differences in AI-assisted decision-making.

Authors:Jaeyoon Song, Zahra Ashktorab, Thomas W. Malone
Title: Togedule: Scheduling Meetings with Large Language Models and Adaptive Representations of Group Availability
Abstract:
Scheduling is a perennial -- and often challenging -- problem for many groups. Existing tools are mostly static, showing an identical set of choices to everyone, regardless of the current status of attendees' inputs and preferences. In this paper, we propose Togedule, an adaptive scheduling tool that uses large language models to dynamically adjust the pool of choices and their presentation format. With the initial prototype, we conducted a formative study (N=10) and identified the potential benefits and risks of such an adaptive scheduling tool. Then, after enhancing the system, we conducted two controlled experiments, one each for attendees and organizers (total N=66). For each experiment, we compared scheduling with verbal messages, shared calendars, or Togedule. Results show that Togedule significantly reduces the cognitive load of attendees indicating their availability and improves the speed and quality of the decisions made by organizers.

Authors:Jaewook Lee, Filippo Aleotti, Diego Mazala, Guillermo Garcia-Hernando, Sara Vicente, Oliver James Johnston, Isabel Kraus-Liang, Jakub Powierza, Donghoon Shin, Jon E. Froehlich, Gabriel Brostow, Jessica Van Brummelen
Title: ImaginateAR: AI-Assisted In-Situ Authoring in Augmented Reality
Abstract:
While augmented reality (AR) enables new ways to play, tell stories, and explore ideas rooted in the physical world, authoring personalized AR content remains difficult for non-experts, often requiring professional tools and time. Prior systems have explored AI-driven XR design but typically rely on manually defined VR environments and fixed asset libraries, limiting creative flexibility and real-world relevance. We introduce ImaginateAR, the first mobile tool for AI-assisted AR authoring to combine offline scene understanding, fast 3D asset generation, and LLMs -- enabling users to create outdoor scenes through natural language interaction. For example, saying "a dragon enjoying a campfire" (P7) prompts the system to generate and arrange relevant assets, which can then be refined manually. Our technical evaluation shows that our custom pipelines produce more accurate outdoor scene graphs and generate 3D meshes faster than prior methods. A three-part user study (N=20) revealed preferred roles for AI, how users create in freeform use, and design implications for future AR authoring tools. ImaginateAR takes a step toward empowering anyone to create AR experiences anywhere -- simply by speaking their imagination.

Authors:Kyuha Jung, Gyuho Lee, Yuanhui Huang, Yunan Chen
Title: "I've talked to ChatGPT about my issues last night.": Examining Mental Health Conversations with Large Language Models through Reddit Analysis
Abstract:
We investigate the role of large language models (LLMs) in supporting mental health by analyzing Reddit posts and comments about mental health conversations with ChatGPT. Our findings reveal that users value ChatGPT as a safe, non-judgmental space, often favoring it over human support due to its accessibility, availability, and knowledgeable responses. ChatGPT provides a range of support, including actionable advice, emotional support, and validation, while helping users better understand their mental states. Additionally, we found that ChatGPT offers innovative support for individuals facing mental health challenges, such as assistance in navigating difficult conversations, preparing for therapy sessions, and exploring therapeutic interventions. However, users also voiced potential risks, including the spread of incorrect health advice, ChatGPT's overly validating nature, and privacy concerns. We discuss the implications of LLMs as tools for mental health support in both everyday health and clinical therapy settings and suggest strategies to mitigate risks in LLM-powered interactions.

Authors:Meisam J. Sekiavandi, Laurits Dixen, Jostein Fimland, Sree Keerthi Desu, Antonia-Bianca Zserai, Ye Sul Lee, Maria Barrett, Paolo Burelli
Title: Advancing Face-to-Face Emotion Communication: A Multimodal Dataset (AFFEC)
Abstract:
Emotion recognition has the potential to play a pivotal role in enhancing human-computer interaction by enabling systems to accurately interpret and respond to human affect. Yet, capturing emotions in face-to-face contexts remains challenging due to subtle nonverbal cues, variations in personal traits, and the real-time dynamics of genuine interactions. Existing emotion recognition datasets often rely on limited modalities or controlled conditions, thereby missing the richness and variability found in real-world scenarios. In this work, we introduce Advancing Face-to-Face Emotion Communication (AFFEC), a multimodal dataset designed to address these gaps. AFFEC encompasses 84 simulated emotional dialogues across six distinct emotions, recorded from 73 participants over more than 5,000 trials and annotated with more than 20,000 labels. It integrates electroencephalography (EEG), eye-tracking, galvanic skin response (GSR), facial videos, and Big Five personality assessments. Crucially, AFFEC explicitly distinguishes between felt emotions (the participant's internal affect) and perceived emotions (the observer's interpretation of the stimulus). Baseline analyses spanning unimodal features and straightforward multimodal fusion demonstrate that even minimal processing yields classification performance significantly above chance, especially for arousal. Incorporating personality traits further improves predictions of felt emotions, highlighting the importance of individual differences. By bridging controlled experimentation with more realistic face-to-face stimuli, AFFEC offers a unique resource for researchers aiming to develop context-sensitive, adaptive, and personalized emotion recognition models.

Authors:Dong Whi Yoo, Jiayue Melissa Shi, Violeta J. Rodriguez, Koustuv Saha
Title: AI Chatbots for Mental Health: Values and Harms from Lived Experiences of Depression
Abstract:
Recent advancements in LLMs enable chatbots to interact with individuals on a range of queries, including sensitive mental health contexts. Despite uncertainties about their effectiveness and reliability, the development of LLMs in these areas is growing, potentially leading to harms. To better identify and mitigate these harms, it is critical to understand how the values of people with lived experiences relate to the harms. In this study, we developed a technology probe, a GPT-4o based chatbot called Zenny, enabling participants to engage with depression self-management scenarios informed by previous research. We used Zenny to interview 17 individuals with lived experiences of depression. Our thematic analysis revealed key values: informational support, emotional support, personalization, privacy, and crisis management. This work explores the relationship between lived experience values, potential harms, and design recommendations for mental health AI chatbots, aiming to enhance self-management support while minimizing risks.

Authors:Andrew M. Bean, Rebecca Payne, Guy Parsons, Hannah Rose Kirk, Juan Ciro, Rafael Mosquera, Sara Hincapié Monsalve, Aruna S. Ekanayaka, Lionel Tarassenko, Luc Rocher, Adam Mahdi
Title: Clinical knowledge in LLMs does not translate to human interactions
Abstract:
Global healthcare providers are exploring use of large language models (LLMs) to provide medical advice to the public. LLMs now achieve nearly perfect scores on medical licensing exams, but this does not necessarily translate to accurate performance in real-world settings. We tested if LLMs can assist members of the public in identifying underlying conditions and choosing a course of action (disposition) in ten medical scenarios in a controlled study with 1,298 participants. Participants were randomly assigned to receive assistance from an LLM (GPT-4o, Llama 3, Command R+) or a source of their choice (control). Tested alone, LLMs complete the scenarios accurately, correctly identifying conditions in 94.9% of cases and disposition in 56.3% on average. However, participants using the same LLMs identified relevant conditions in less than 34.5% of cases and disposition in less than 44.2%, both no better than the control group. We identify user interactions as a challenge to the deployment of LLMs for medical advice. Standard benchmarks for medical knowledge and simulated patient interactions do not predict the failures we find with human participants. Moving forward, we recommend systematic human user testing to evaluate interactive capabilities prior to public deployments in healthcare.

Authors:Hauke Sandhaus, Angel Hsing-Chi Hwang, Wendy Ju, Qian Yang
Title: My Precious Crash Data: Barriers and Opportunities in Encouraging Autonomous Driving Companies to Share Safety-Critical Data
Abstract:
Safety-critical data, such as crash and near-crash records, are crucial to improving autonomous vehicle (AV) design and development. Sharing such data across AV companies, academic researchers, regulators, and the public can help make all AVs safer. However, AV companies rarely share safety-critical data externally. This paper aims to pinpoint why AV companies are reluctant to share safety-critical data, with an eye on how these barriers can inform new approaches to promote sharing. We interviewed twelve AV company employees who actively work with such data in their day-to-day work. Findings suggest two key, previously unknown barriers to data sharing: (1) Datasets inherently embed salient knowledge that is key to improving AV safety and are resource-intensive to curate. Therefore, data sharing, even within a company, is fraught with politics. (2) Interviewees believed AV safety knowledge is private knowledge that brings competitive edges to their companies, rather than public knowledge for social good. We discuss the implications of these findings for incentivizing and enabling safety-critical AV data sharing, specifically, implications for new approaches to (1) debating and stratifying public and private AV safety knowledge; (2) innovating data tools and data sharing pipelines that enable easier sharing of public AV safety data and knowledge; (3) offsetting costs of curating safety-critical data and incentivizing data sharing.

Authors:Chuer Chen, Yuqi Liu, Danqing Shi, Shixiong Cao, Nan Cao
Title: DataScout: Automatic Data Fact Retrieval for Statement Augmentation with an LLM-Based Agent
Abstract:
A data story typically integrates data facts from multiple perspectives and stances to construct a comprehensive and objective narrative. However, retrieving these facts demands time for data search and challenges the creator's analytical skills. In this work, we introduce DataScout, an interactive system that automatically performs reasoning and stance-based data facts retrieval to augment the user's statement. Particularly, DataScout leverages an LLM-based agent to construct a retrieval tree, enabling collaborative control of its expansion between users and the agent. The interface visualizes the retrieval tree as a mind map that makes it easy for users to intuitively steer the retrieval direction and effectively engage in reasoning and analysis. We evaluate the proposed system through case studies and in-depth expert interviews. Our evaluation demonstrates that DataScout can effectively retrieve multifaceted data facts from different stances, helping users verify their statements and enhance the credibility of their stories.

Authors:Chuer Chen, Shengqi Dang, Yuqi Liu, Nanxuan Zhao, Yang Shi, Nan Cao
Title: MV-Crafter: An Intelligent System for Music-guided Video Generation
Abstract:
Music videos, as a prevalent form of multimedia entertainment, deliver engaging audio-visual experiences to audiences and have gained immense popularity among singers and fans. Creators can express their interpretations of music naturally through visual elements. However, the creation process of a music video demands proficiency in script design, video shooting, and music-video synchronization, posing significant challenges for non-professionals. Previous work has designed automated music video generation frameworks; however, they suffer from complex input requirements and poor output quality. In response, we present MV-Crafter, a system capable of producing high-quality music videos with synchronized music-video rhythm and style. Our approach involves three technical modules that simulate the human creation process: the script generation module, video generation module, and music-video synchronization module. MV-Crafter leverages a large language model to generate scripts considering the musical semantics. To address the challenge of synchronizing short video clips with music of varying lengths, we propose a dynamic beat matching algorithm and visual envelope-induced warping method to ensure precise, monotonic music-video synchronization. Besides, we design a user-friendly interface to simplify the creation process with intuitive editing features. Extensive experiments have demonstrated that MV-Crafter provides an effective solution for improving the quality of generated music videos.

Authors:Chitralekha Gupta, Hanjun Wu, Praveen Sasikumar, Shreyas Sridhar, Priambudi Bagaskara, Suranga Nanayakkara
Title: Factually: Exploring Wearable Fact-Checking for Augmented Truth Discernment
Abstract:
Wearable devices are transforming human capabilities by seamlessly augmenting cognitive functions. In this position paper, we propose a voice-based, interactive learning companion designed to amplify and extend cognitive abilities through informal learning. Our vision is threefold: (1) to enable users to discover new knowledge on-the-go through contextual interactive quizzes, fostering critical thinking and mindfulness, (2) to proactively detect misinformation, empowering users to critically assess information in real time, and (3) to provide spoken language correction and prompting hints for second language learning and effective communication. As an initial step toward this vision, we present Factually - a proactive, wearable fact-checking system integrated into devices like smartwatches or rings. Factually discreetly alerts users to potential falsehoods via vibrotactile feedback, helping them assess information critically. We demonstrate its utility through three illustrative scenarios, highlighting its potential to extend cognitive abilities for real-time misinformation detection. Early qualitative feedback suggests that Factually can enhance users' fact-checking capabilities, offering both practical and experiential benefits.

Authors:Tianyu Zhang, Dongheng Zhang, Ruixu Geng, Xuecheng Xie, Shuai Yang, Yan Chen
Title: Lessons from Deploying Learning-based CSI Localization on a Large-Scale ISAC Platform
Abstract:
In recent years, Channel State Information (CSI), recognized for its fine-grained spatial characteristics, has attracted increasing attention in WiFi-based indoor localization. However, despite its potential, CSI-based approaches have yet to achieve the same level of deployment scale and commercialization as those based on Received Signal Strength Indicator (RSSI). A key limitation lies in the fact that most existing CSI-based systems are developed and evaluated in controlled, small-scale environments, limiting their generalizability. To bridge this gap, we explore the deployment of a large-scale CSI-based localization system involving over 400 Access Points (APs) in a real-world building under the Integrated Sensing and Communication (ISAC) paradigm. We highlight two critical yet often overlooked factors: the underutilization of unlabeled data and the inherent heterogeneity of CSI measurements. To address these challenges, we propose a novel CSI-based learning framework for WiFi localization, tailored for large-scale ISAC deployments on the server side. Specifically, we employ a novel graph-based structure to model heterogeneous CSI data and reduce redundancy. We further design a pretext pretraining task that incorporates spatial and temporal priors to effectively leverage large-scale unlabeled CSI data. Complementarily, we introduce a confidence-aware fine-tuning strategy to enhance the robustness of localization results. In a leave-one-smartphone-out experiment spanning five floors and 25,600 m², we achieve a median localization error of 2.17 meters and a floor accuracy of 99.49%. This performance corresponds to an 18.7% reduction in mean absolute error (MAE) compared to the best-performing baseline.

Authors:Zhecheng Wang, Jiaju Ma, Eitan Grinspun, Bryan Wang, Tovi Grossman
Title: Script2Screen: Supporting Dialogue Scriptwriting with Interactive Audiovisual Generation
Abstract:
Scriptwriting has traditionally been text-centric, a modality that only partially conveys the produced audiovisual experience. A formative study with professional writers informed us that connecting textual and audiovisual modalities can aid ideation and iteration, especially for writing dialogues. In this work, we present Script2Screen, an AI-assisted tool that integrates scriptwriting with audiovisual scene creation in a unified, synchronized workflow. Focusing on dialogues in scripts, Script2Screen generates expressive scenes with emotional speeches and animated characters through a novel text-to-audiovisual-scene pipeline. The user interface provides fine-grained controls, allowing writers to fine-tune audiovisual elements such as character gestures, speech emotions, and camera angles. A user study with both novice and professional writers from various domains demonstrated that Script2Screen's interactive audiovisual generation enhances the scriptwriting process, facilitating iterative refinement while complementing, rather than replacing, their creative efforts.

Authors:Linkun Liu, Jian Sun, Ye Tian
Title: Should Benevolent Deception be Allowed in EHMI? A Mechanism Explanation Based on Game Theory
Abstract:
The application of external human-machine interface (EHMI) on autonomous vehicles (AVs) facilitates information exchange. Existing research fails to consider the impact of the sequence of actions, as well as the effects of EHMI applications and deception, raising the question of whether benevolent, well-intentioned deception should be permitted (i.e., misleading statements that are intended to benefit both parties). In this study, we established a game-theory-based EHMI information disclosure framework for AVs. To accommodate benevolent deception, this framework divides the decision-making process into three stages, each addressing a key question: whether to disclose, when to disclose, and what type of intention information to disclose. The results show that theoretical advantages of deception exist in certain cases when AV expects to maximize the safety of the interaction. In 40 out of 484 cases (8.3%), safety can be enhanced through successful deception. Those successful deceptions fall into two categories: 1) In 28 of these cases, the straight-going AV expected the left-turning human-driven vehicle (HV) to yield, while HV exhibited lower speed and higher acceleration; 2) In 12 of these cases, AV expected HV to proceed first, while HV exhibited higher speed and lower acceleration. We also conducted a VR-based driving simulation experiment, and the results confirmed our conclusion. Additionally, we found that when participants had low trust in the EHMI, its use negatively impacted interaction efficiency instead. This exploratory study of behavioral mechanisms, grounded in specific hypotheses, informs future EHMI design and the ethical decision-making of autonomous driving systems.

Authors:Runlong Ye, Patrick Yung Kang Lee, Matthew Varona, Oliver Huang, Carolina Nobre
Title: ScholarMate: A Mixed-Initiative Tool for Qualitative Knowledge Work and Information Sensemaking
Abstract:
Synthesizing knowledge from large document collections is a critical yet increasingly complex aspect of qualitative research and knowledge work. While AI offers automation potential, effectively integrating it into human-centric sensemaking workflows remains challenging. We present ScholarMate, an interactive system designed to augment qualitative analysis by unifying AI assistance with human oversight. ScholarMate enables researchers to dynamically arrange and interact with text snippets on a non-linear canvas, leveraging AI for theme suggestions, multi-level summarization, and evidence-based theme naming, while ensuring transparency through traceability to source documents. Initial pilot studies indicated that users value this mixed-initiative approach, finding the balance between AI suggestions and direct manipulation crucial for maintaining interpretability and trust. We further demonstrate the system's capability through a case study analyzing 24 papers. By balancing automation with human control, ScholarMate enhances efficiency and supports interpretability, offering a valuable approach for productive human-AI collaboration in demanding sensemaking tasks common in knowledge work.

Authors:Mohammed Almutairi, Charles Chiang, Yuxin Bai, Diego Gomez-Zara
Title: tAIfa: Enhancing Team Effectiveness and Cohesion with AI-Generated Automated Feedback
Abstract:
Providing timely and actionable feedback is crucial for effective collaboration, learning, and coordination within teams. However, many teams face challenges in receiving feedback that aligns with their goals and promotes cohesion. We introduce tAIfa ("Team AI Feedback Assistant"), an AI agent that uses Large Language Models (LLMs) to provide personalized, automated feedback to teams and their members. tAIfa analyzes team interactions, identifies strengths and areas for improvement, and delivers targeted feedback based on communication patterns. We conducted a between-subjects study with 18 teams to test whether using tAIfa impacted their teamwork. Our findings show that tAIfa improved communication and contributions within the teams. This paper contributes to the Human-AI Interaction literature by presenting a computational framework that integrates LLMs to provide automated feedback, introducing tAIfa as a tool to enhance team engagement and cohesion, and providing insights into future AI applications to support team collaboration.

Authors:Maria-Teresa De Rosa Palmini, Eva Cetinic
Title: Exploring Language Patterns of Prompts in Text-to-Image Generation and Their Impact on Visual Diversity
Abstract:
Following the initial excitement, Text-to-Image (TTI) models are now being examined more critically. While much of the discourse has focused on biases and stereotypes embedded in large-scale training datasets, the sociotechnical dynamics of user interactions with these models remain underexplored. This study examines the linguistic and semantic choices users make when crafting prompts and how these choices influence the diversity of generated outputs. Analyzing over six million prompts from the Civiverse dataset on the CivitAI platform across seven months, we categorize users into three groups based on their levels of linguistic experimentation: consistent repeaters, occasional repeaters, and non-repeaters. Our findings reveal that as user participation grows over time, prompt language becomes increasingly homogenized through the adoption of popular community tags and descriptors, with repeated prompts comprising 40-50% of submissions. At the same time, semantic similarity and topic preferences remain relatively stable, emphasizing common subjects and surface aesthetics. Using Vendi scores to quantify visual diversity, we demonstrate a clear correlation between lexical similarity in prompts and the visual similarity of generated images, showing that linguistic repetition reinforces less diverse representations. These findings highlight the significant role of user-driven factors in shaping AI-generated imagery, beyond inherent model biases, and underscore the need for tools and practices that encourage greater linguistic and thematic experimentation within TTI systems to foster more inclusive and diverse AI-generated content.
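The Vendi score used above to quantify visual diversity has a standard closed form: the exponential of the Shannon entropy of the eigenvalues of a scaled similarity matrix, interpretable as the effective number of distinct items in a sample. A minimal sketch, assuming cosine similarity over image embeddings (the abstract does not state the paper's exact kernel choice):

```python
import numpy as np

def vendi_score(X):
    """Vendi score of a sample: exp of the Shannon entropy of the
    eigenvalues of K/n, where K is the cosine-similarity matrix over
    embeddings X (n x d). Ranges from 1 (all identical) to n (all
    mutually dissimilar)."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    K = X @ X.T                                       # cosine similarity, K_ii = 1
    lam = np.linalg.eigvalsh(K / X.shape[0])          # eigenvalues sum to 1
    lam = lam[lam > 1e-12]                            # drop numerical zeros
    return float(np.exp(-np.sum(lam * np.log(lam))))  # exp(entropy)
```

Three identical embeddings score 1.0; four mutually orthogonal embeddings score 4.0, matching the "effective number of distinct images" reading.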

Authors:Haoming Wang, Boyuan Yang, Xiangyu Yin, Wei Gao
Title: Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection
Abstract:
Personalization of Large Language Models (LLMs) is important in practical applications to accommodate the individual needs of different mobile users. Due to data privacy concerns, LLM personalization often needs to be done locally at the user's mobile device, but such on-device personalization is constrained by both the limitation of on-device compute power and the insufficiency of the user's personal data. In this paper, we address these constraints by fine-tuning an already personalized LLM with the user's personal data, and present XPerT, a new technique that ensures proper selection of such already personalized LLMs based on explainability of how they were fine-tuned. We implemented and evaluated XPerT on various smartphone models with mainstream LLMs, and experimental results show that XPerT reduces the computation costs of on-device LLM personalization by 83% and improves its data efficiency by 51%.

Authors:Wanfang Xu, Lixiang Zhao, Haiwen Song, Xinheng Song, Zhaolin Lu, Yu Liu, Min Chen, Eng Gee Lim, Lingyun Yu
Title: Mozualization: Crafting Music and Visual Representation with Multimodal AI
Abstract:
In this work, we introduce Mozualization, a music generation and editing tool that creates multi-style embedded music by integrating diverse inputs, such as keywords, images, and sound clips (e.g., segments from various pieces of music or even a playful cat's meow). Our work is inspired by the ways people express their emotions -- writing mood-descriptive poems or articles, creating drawings with warm or cool tones, or listening to sad or uplifting music. Building on this concept, we developed a tool that transforms these emotional expressions into a cohesive and expressive song, allowing users to seamlessly incorporate their unique preferences and inspirations. To evaluate the tool and, more importantly, gather insights for its improvement, we conducted a user study involving nine music enthusiasts. The study assessed user experience, engagement, and the impact of interacting with and listening to the generated music.

Authors:Karan Taneja, Anjali Singh, Ashok K. Goel
Title: Towards a Multimodal Document-grounded Conversational AI System for Education
Abstract:
Multimedia learning using text and images has been shown to improve learning outcomes compared to text-only instruction. But conversational AI systems in education predominantly rely on text-based interactions, while multimodal conversations for multimedia learning remain unexplored. Moreover, deploying conversational AI in learning contexts requires grounding in reliable sources and verifiability to create trust. We present MuDoC, a Multimodal Document-grounded Conversational AI system based on GPT-4o, that leverages both text and visuals from documents to generate responses interleaved with text and images. Its interface allows verification of AI-generated content through seamless navigation to the source. We compare MuDoC to a text-only system to explore differences in learner engagement, trust in the AI system, and their performance on problem-solving tasks. Our findings indicate that both visuals and verifiability of content enhance learner engagement and foster trust; however, no significant impact on performance was observed. We draw upon theories from cognitive and learning sciences to interpret the findings and derive implications, and outline future directions for the development of multimodal conversational AI systems in education.

Authors:Yawen Guo, Di Hu, Jiayuan Wang, Kai Zheng, Danielle Perret, Deepti Pandita, Steven Tam
Title: Ambient Listening in Clinical Practice: Evaluating EPIC Signal Data Before and After Implementation and Its Impact on Physician Workload
Abstract:
The widespread adoption of EHRs following the HITECH Act has increased the clinician documentation burden, contributing to burnout. Emerging technologies, such as ambient listening tools powered by generative AI, offer real-time, scribe-like documentation capabilities to reduce physician workload. This study evaluates the impact of ambient listening tools implemented at UCI Health by analyzing EPIC Signal data to assess changes in note length and time spent on notes. Results show significant reductions in note-taking time and an increase in note length, particularly during the first month post-implementation. Findings highlight the potential of AI-powered documentation tools to improve clinical efficiency. Future research should explore adoption barriers, long-term trends, and user experiences to enhance the scalability and sustainability of ambient listening technology in clinical practice.

Authors:Quentin Romero Lauro, Shreya Shankar, Sepanta Zeighami, Aditya Parameswaran
Title: RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines
Abstract:
Retrieval-augmented generation (RAG) pipelines have become the de facto approach for building AI assistants with access to external, domain-specific knowledge. Given a user query, RAG pipelines typically first retrieve (R) relevant information from external sources, before invoking a Large Language Model (LLM), augmented (A) with this information, to generate (G) responses. Modern RAG pipelines frequently chain multiple retrieval and generation components in arbitrary order. However, developing effective RAG pipelines is challenging because retrieval and generation components are intertwined, making it hard to identify which component(s) cause errors in the eventual output. The parameters with the greatest impact on output quality often require hours of pre-processing after each change, creating prohibitively slow feedback cycles. To address these challenges, we present RAGGY, a developer tool that combines a Python library of composable RAG primitives with an interactive interface for real-time debugging. We contribute the design and implementation of RAGGY, insights into expert debugging patterns through a qualitative study with 12 engineers, and design implications for future RAG tools that better align with developers' natural workflows.
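The retrieve-augment-generate chaining described above can be pictured as a few composable primitives. The following is a hypothetical toy sketch, not RAGGY's actual library API (which the abstract does not specify); the keyword retriever and echo-style generator are stand-ins for real retrieval and LLM components:

```python
from typing import Callable, List

# Type aliases for the two kinds of composable components.
Retriever = Callable[[str], List[str]]
Generator = Callable[[str], str]

def keyword_retriever(corpus: List[str]) -> Retriever:
    """Toy retriever (R): return documents sharing any term with the query."""
    def retrieve(query: str) -> List[str]:
        terms = set(query.lower().split())
        return [doc for doc in corpus if terms & set(doc.lower().split())]
    return retrieve

def augment(query: str, docs: List[str]) -> str:
    """Augmentation step (A): splice retrieved context into the prompt."""
    context = "\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

def rag_pipeline(retrieve: Retriever, generate: Generator, query: str) -> str:
    """One R -> A -> G pass; real pipelines may chain several such passes."""
    return generate(augment(query, retrieve(query)))
```

Because each stage is a plain callable, a debugger can intercept the intermediate `docs` and prompt to localize whether an error originated in retrieval or in generation, which is the separation the abstract identifies as hard.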

Authors:Jamie Lee, Kyuha Jung, Erin Gregg Newman, Emilie Chow, Yunan Chen
Title: Understanding Adolescents' Perceptions of Benefits and Risks in Health AI Technologies through Design Fiction
Abstract:
Despite the growing research on users' perceptions of health AI, adolescents' perspectives remain underexplored. This study explores adolescents' perceived benefits and risks of health AI technologies in clinical and personal health settings. Employing Design Fiction, we conducted interviews with 16 adolescents (aged 13-17) using four fictional design scenarios that represent current and future health AI technologies as probes. Our findings reveal that with a positive yet cautious attitude, adolescents envision unique benefits and risks specific to their age group. While health AI technologies were seen as valuable learning resources, they also raised concerns about confidentiality with their parents. Additionally, we identified several factors, such as severity of health conditions and previous experience with AI, influencing their perceptions of trust and privacy in health AI. We explore how these insights can inform the future design of health AI technologies to support learning, engagement, and trust as adolescents navigate their healthcare journey.

Authors:Soorya Ram Shimgekar, Violeta J. Rodriguez, Paul A. Bloom, Dong Whi Yoo, Koustuv Saha
Title: Interpersonal Theory of Suicide as a Lens to Examine Suicidal Ideation in Online Spaces
Abstract:
Suicide is a critical global public health issue, with millions experiencing suicidal ideation (SI) each year. Online spaces enable individuals to express SI and seek peer support. While prior research has revealed the potential of detecting SI using machine learning and natural language analysis, a key limitation is the lack of a theoretical framework to understand the underlying factors affecting high-risk suicidal intent. To bridge this gap, we adopted the Interpersonal Theory of Suicide (IPTS) as an analytic lens to analyze 59,607 posts from Reddit's r/SuicideWatch, categorizing them into SI dimensions (Loneliness, Lack of Reciprocal Love, Self Hate, and Liability) and risk factors (Thwarted Belongingness, Perceived Burdensomeness, and Acquired Capability of Suicide). We found that high-risk SI posts express planning and attempts, methods and tools, and weaknesses and pain. In addition, we examined the language of supportive responses through psycholinguistic and content analyses to find that individuals respond differently to different stages of SI posts. Finally, we explored the role of AI chatbots in providing effective supportive responses to suicidal ideation posts. We found that although AI improved structural coherence, expert evaluations highlight persistent shortcomings in providing dynamic, personalized, and deeply empathetic support. These findings underscore the need for careful reflection and deeper understanding in both the development and consideration of AI-driven interventions for effective mental health support.

Authors:Ivica Kostric, Krisztian Balog, Ujwal Gadiraju
Title: Should We Tailor the Talk? Understanding the Impact of Conversational Styles on Preference Elicitation in Conversational Recommender Systems
Abstract:
Conversational recommender systems (CRSs) provide users with an interactive means to express preferences and receive real-time personalized recommendations. The success of these systems is heavily influenced by the preference elicitation process. While existing research mainly focuses on what questions to ask during preference elicitation, there is a notable gap in understanding what role broader interaction patterns, including tone, pacing, and level of proactiveness, play in supporting users in completing a given task. This study investigates the impact of different conversational styles on preference elicitation, task performance, and user satisfaction with CRSs. We conducted a controlled experiment in the context of scientific literature recommendation, contrasting two distinct conversational styles: high involvement (fast-paced, direct, and proactive with frequent prompts) and high considerateness (polite and accommodating, prioritizing clarity and user comfort), alongside a flexible experimental condition where users could switch between the two. Our results indicate that adapting conversational strategies based on user expertise and allowing flexibility between styles can enhance both user satisfaction and the effectiveness of recommendations in CRSs. Overall, our findings hold important implications for the design of future CRSs.

Authors:Cynthia Zastudil, David H. Smith, Yusef Tohamy, Rayhona Nasimova, Gavin Montross, Stephen MacNeil
Title: Neurodiversity in Computing Education Research: A Systematic Literature Review
Abstract:
Ensuring equitable access to computing education for all students, including those with autism, dyslexia, or ADHD, is essential to developing a diverse and inclusive workforce. To understand the state of disability research in computing education, we conducted a systematic literature review of research on neurodiversity in computing education. Our search resulted in 1,943 total papers, which we filtered to 14 papers based on our inclusion criteria. Our mixed-methods approach analyzed research methods, participants, contribution types, and findings. The three main contribution types included empirical contributions based on user studies (57.1%), opinion contributions and position papers (50%), and survey contributions (21.4%). Interviews were the most common methodology (75% of empirical contributions). There were often inconsistencies in how research methods were described (e.g., number of participants and interview and survey materials). Our work shows that research on neurodivergence in computing education is still very preliminary. Most papers provided curricular recommendations that lacked empirical evidence to support those recommendations. Three areas of future work include investigating the impacts of active learning, increasing awareness and knowledge about neurodiverse students' experiences, and engaging neurodivergent students in the design of pedagogical materials and computing education research.

Authors:Vincent Freiberger, Arthur Fleig, Erik Buchmann
Title: Explainable AI in Usable Privacy and Security: Challenges and Opportunities
Abstract:
Large Language Models (LLMs) are increasingly being used to automate evaluations and to explain them. However, concerns about explanation quality, consistency, and hallucinations remain open research challenges, particularly in high-stakes contexts like privacy and security, where user trust and decision-making are at stake. In this paper, we investigate these issues in the context of PRISMe, an interactive privacy policy assessment tool that leverages LLMs to evaluate and explain website privacy policies. Based on a prior user study with 22 participants, we identify key concerns regarding LLM judgment transparency, consistency, and faithfulness, as well as variations in user preferences for explanation detail and engagement. We discuss potential strategies to mitigate these concerns, including structured evaluation criteria, uncertainty estimation, and retrieval-augmented generation (RAG). We identify a need for adaptive explanation strategies tailored to different user profiles for LLM-as-a-judge. Our goal is to showcase usable privacy and security as a promising application area for Human-Centered Explainable AI (HCXAI) to make an impact.

Authors:Jennifer Haase, Paul H. P. Hanel, Sebastian Pokutta
Title: Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability
Abstract:
Following the widespread adoption of ChatGPT in early 2023, numerous studies reported that large language models (LLMs) can match or even surpass human performance in creative tasks. However, it remains unclear whether LLMs have become more creative over time, and how consistent their creative output is. In this study, we evaluated 14 widely used LLMs -- including GPT-4, Claude, Llama, Grok, Mistral, and DeepSeek -- across two validated creativity assessments: the Divergent Association Task (DAT) and the Alternative Uses Task (AUT). Contrary to expectations, we found no evidence of increased creative performance over the past 18-24 months, with GPT-4 performing worse than in previous studies. For the more widely used AUT, all models performed on average better than the average human, with GPT-4o and o3-mini performing best. However, only 0.28% of LLM-generated responses reached the top 10% of human creativity benchmarks. Beyond inter-model differences, we document substantial intra-model variability: the same LLM, given the same prompt, can produce outputs ranging from below-average to original. This variability has important implications for both creativity research and practical applications. Ignoring such variability risks misjudging the creative potential of LLMs, either inflating or underestimating their capabilities. The choice of prompts affected LLMs differently. Our findings underscore the need for more nuanced evaluation frameworks and highlight the importance of model selection, prompt design, and repeated assessment when using Generative AI (GenAI) tools in creative contexts.

Authors:Mingda Han, Huanqi Yang, Wenhao Li, Weitao Xu, Xiuzhen Cheng, Prasant Mohapatra, Pengfei Hu
Title: RF Sensing Security and Malicious Exploitation: A Comprehensive Survey
Abstract:
Radio Frequency (RF) sensing technologies have experienced significant growth due to the widespread adoption of RF devices and the Internet of Things (IoT). These technologies enable numerous applications across healthcare, smart homes, industrial automation, and human-computer interaction. However, the non-intrusive and ubiquitous nature of RF sensing, combined with its environmental sensitivity and data dependency, makes these systems inherently vulnerable not only as attack targets, but also as powerful attack vectors. This survey presents a comprehensive analysis of RF sensing security, covering both system-level vulnerabilities (such as signal spoofing, adversarial perturbations, and model poisoning) and the misuse of sensing capabilities for attacks like cross-boundary surveillance, side-channel inference, and semantic privacy breaches. We propose unified threat models to structure these attack vectors and further conduct task-specific vulnerability assessments across key RF sensing applications, identifying their unique attack surfaces and risk profiles. In addition, we systematically review defense strategies across system layers and threat-specific scenarios, incorporating both active and passive paradigms to provide a structured and practical view of protection mechanisms. Compared to prior surveys, our work distinguishes itself by offering a multi-dimensional classification framework based on task type, threat vector, and sensing modality, and by providing fine-grained, scenario-driven analysis that bridges theoretical models and real-world implications. This survey aims to serve as a comprehensive reference for researchers and practitioners seeking to understand, evaluate, and secure the evolving landscape of RF sensing technologies.

Authors:Kuang Yuan, Yifeng Wang, Xiyuxing Zhang, Chengyi Shen, Swarun Kumar, Justin Chan
Title: SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures
Abstract:
Imagine placing your smartphone on a table in a noisy restaurant and clearly capturing the voices of friends seated around you, or recording a lecturer's voice with clarity in a reverberant auditorium. We introduce SonicSieve, the first intelligent directional speech extraction system for smartphones using a bio-inspired acoustic microstructure. Our passive design embeds directional cues onto incoming speech without any additional electronics. It attaches to the in-line mic of low-cost wired earphones that plug into smartphones. We present an end-to-end neural network that processes the raw audio mixtures in real-time on mobile devices. Our results show that SonicSieve achieves a signal quality improvement of 5.0 dB when focusing on a 30° angular region. Additionally, the performance of our system based on only two microphones exceeds that of conventional 5-microphone arrays.

Authors:Indu Panigrahi, Sunnie S. Y. Kim, Amna Liaqat, Rohan Jinturkar, Olga Russakovsky, Ruth Fong, Parastoo Abtahi
Title: Interactivity x Explainability: Toward Understanding How Interactivity Can Improve Computer Vision Explanations
Abstract:
Explanations for computer vision models are important tools for interpreting how the underlying models work. However, they are often presented in static formats, which pose challenges for users, including information overload, a gap between semantic and pixel-level information, and limited opportunities for exploration. We investigate interactivity as a mechanism for tackling these issues in three common explanation types: heatmap-based, concept-based, and prototype-based explanations. We conducted a study (N=24), using a bird identification task, involving participants with diverse technical and domain expertise. We found that while interactivity enhances user control, facilitates rapid convergence to relevant information, and allows users to expand their understanding of the model and explanation, it also introduces new challenges. To address these, we provide design recommendations for interactive computer vision explanations, including carefully selected default views, independent input controls, and constrained output spaces.

Authors:Roberta Mota, Ehud Sharlin, Usman Alim
Title: Designing Reality-Based VR Interfaces for Geological Uncertainty
Abstract:
Inherent uncertainty in geological data acquisition leads to the generation of large ensembles of equiprobable 3D reservoir models. Running computationally costly numerical flow simulations across such a vast solution space is infeasible. A more suitable approach is to carefully select a small number of geological models that reasonably capture the overall variability of the ensemble. Identifying these representative models is a critical task that enables the oil and gas industry to generate cost-effective production forecasts. Our work leverages virtual reality (VR) to provide engineers with a system for conducting geological uncertainty analysis, enabling them to perform inherently spatial tasks using an associative 3D interaction space. We present our VR system through the lens of the reality-based interaction paradigm, designing 3D interfaces that enable familiar physical interactions inspired by real-world analogies, such as gesture-based operations and view-dependent lenses. We also report an evaluation conducted with 12 reservoir engineers from an industry partner. Our findings offer insights into the benefits, pitfalls, and opportunities for refining our system design. We catalog our results into a set of design recommendations intended to guide researchers and developers of immersive interfaces, in reservoir engineering and broader application domains.

Authors:Avinash Agarwal, Mayashankar Kumar, Manisha J. Nene
Title: Enhancements for Developing a Comprehensive AI Fairness Assessment Standard
Abstract:
As AI systems increasingly influence critical sectors like telecommunications, finance, healthcare, and public services, ensuring fairness in decision-making is essential to prevent biased or unjust outcomes that disproportionately affect vulnerable entities or result in adverse impacts. This need is particularly pressing as the industry approaches the 6G era, where AI will drive complex functions like autonomous network management and hyper-personalized services. The TEC Standard for Fairness Assessment and Rating of AI Systems provides guidelines for evaluating fairness in AI, focusing primarily on tabular data and supervised learning models. However, as AI applications diversify, this standard requires enhancement to strengthen its impact and broaden its applicability. This paper proposes an expansion of the TEC Standard to include fairness assessments for images, unstructured text, and generative AI, including large language models, ensuring a more comprehensive approach that keeps pace with evolving AI technologies. By incorporating these dimensions, the enhanced framework will promote responsible and trustworthy AI deployment across various sectors.

Authors:Till Aust, Julian Kaduk, Heiko Hamann
Title: Classifying Subjective Time Perception in a Multi-robot Control Scenario Using Eye-tracking Information
Abstract:
As automation and mobile robotics reshape work environments, rising expectations for productivity increase cognitive demands on human operators, leading to potential stress and cognitive overload. Accurately assessing an operator's mental state is critical for maintaining performance and well-being. We use subjective time perception, which can be altered by stress and cognitive load, as a sensitive, low-latency indicator of well-being and cognitive strain. Distortions in time perception can affect decision-making, reaction times, and overall task effectiveness, making it a valuable metric for adaptive human-swarm interaction systems. We study how human physiological signals can be used to estimate a person's subjective time perception, using a human-swarm interaction scenario as an example. A human operator needs to guide and control a swarm of small mobile robots. We obtain eye-tracking data that is classified for subjective time perception based on questionnaire data. Our results show that we successfully estimate a person's time perception from eye-tracking data. The approach can profit from individual-based pretraining using only 30 seconds of data. In future work, we aim for robots that respond to human operator needs by automatically classifying physiological data in a closed control loop.

Authors:Stephen Brade, Sam Anderson, Rithesh Kumar, Zeyu Jin, Anh Truong
Title: SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation
Abstract:
Novice content creators often invest significant time recording expressive speech for social media videos. While recent advancements in text-to-speech (TTS) technology can generate highly realistic speech in various languages and accents, many struggle with unintuitive or overly granular TTS interfaces. We propose simplifying TTS generation by allowing users to specify high-level context alongside their script. Our Wizard-of-Oz system, SpeakEasy, leverages user-provided context to inform and influence TTS output, enabling iterative refinement with high-level feedback. This approach was informed by two 8-subject formative studies: one examining content creators' experiences with TTS, and the other drawing on effective strategies from voice actors. Our evaluation shows that participants using SpeakEasy were more successful in generating performances matching their personal standards, without requiring significantly more effort than leading industry interfaces.

Authors:Andrea Esposito, Miriana Calvano, Antonio Curci, Francesco Greco, Rosa Lanzilotti, Antonio Piccinno
Title: Explanation-Driven Interventions for Artificial Intelligence Model Customization: Empowering End-Users to Tailor Black-Box AI in Rhinocytology
Abstract:
The integration of Artificial Intelligence (AI) in modern society is transforming how individuals perform tasks. In high-risk domains, ensuring human control over AI systems remains a key design challenge. This article presents a novel End-User Development (EUD) approach for black-box AI models, enabling users to edit explanations and influence future predictions through targeted interventions. By combining explainability, user control, and model adaptability, the proposed method advances Human-Centered AI (HCAI), promoting a symbiotic relationship between humans and adaptive, user-tailored AI systems.

Authors:Jacob Belga, Richard Skarbez, Yahya Hmaiti, Eric J. Chen, Ryan P. McMahan, Joseph J. LaViola
Title: The Fidelity-based Presence Scale (FPS): Modeling the Effects of Fidelity on Sense of Presence
Abstract:
Within the virtual reality (VR) research community, there have been several efforts to develop questionnaires with the aim of better understanding the sense of presence. Despite having numerous surveys, the community does not have a questionnaire that informs which components of a VR application contributed to the sense of presence. Furthermore, previous literature notes the absence of consensus on which questionnaire or questions should be used. Therefore, we conducted a Delphi study, engaging presence experts to establish a consensus on the most important presence questions and their respective verbiage. We then conducted a validation study with an exploratory factor analysis (EFA). The efforts between our two studies led to the creation of the Fidelity-based Presence Scale (FPS). With our consensus-driven approach and fidelity-based factoring, we hope the FPS will enable better communication within the research community and yield important future results regarding the relationship between VR system fidelity and presence.

Authors:Siddharth Srikanth, Varun Bhatt, Boshen Zhang, Werner Hager, Charles Michael Lewis, Katia P. Sycara, Aaquib Tabrez, Stefanos Nikolaidis
Title: Algorithmic Prompt Generation for Diverse Human-like Teaming and Communication with Large Language Models
Abstract:
Understanding how humans collaborate and communicate in teams is essential for improving human-agent teaming and AI-assisted decision-making. However, relying solely on data from large-scale user studies is impractical due to logistical, ethical, and practical constraints, necessitating synthetic models of multiple diverse human behaviors. Recently, agents powered by Large Language Models (LLMs) have been shown to emulate human-like behavior in social settings. However, obtaining a large set of diverse behaviors requires manual effort in the form of designing prompts. On the other hand, Quality Diversity (QD) optimization has been shown to be capable of generating diverse Reinforcement Learning (RL) agent behavior. In this work, we combine QD optimization with LLM-powered agents to iteratively search for prompts that generate diverse team behavior in a long-horizon, multi-step collaborative environment. We first show, through a human-subjects experiment (n=54 participants), that humans exhibit diverse coordination and communication behavior in this domain. We then show that our approach can effectively replicate trends from human teaming data and also capture behaviors that are not easily observed without collecting large amounts of data. Our findings highlight the combination of QD and LLM-powered agents as an effective tool for studying teaming and communication strategies in multi-agent collaboration.

Authors:Markus Langer, Veronika Lazar, Kevin Baum
Title: On the Complexities of Testing for Compliance with Human Oversight Requirements in AI Regulation
Abstract:
Human oversight requirements are a core component of the European AI Act and in AI governance. In this paper, we highlight key challenges in testing for compliance with these requirements. A central difficulty lies in balancing simple, but potentially ineffective checklist-based approaches with resource-intensive and context-sensitive empirical testing of the effectiveness of human oversight of AI. Questions regarding when to update compliance testing, the context-dependent nature of human oversight requirements, and difficult-to-operationalize standards further complicate compliance testing. We argue that these challenges illustrate broader challenges in the future of sociotechnical AI governance, i.e., a future that shifts from ensuring good technological products to ensuring good sociotechnical systems.

Authors:John Chen, Alexandros Lotsos, Grace Wang, Lexie Zhao, Bruce Sherin, Uri Wilensky, Michael Horn
Title: Processes Matter: How ML/GAI Approaches Could Support Open Qualitative Coding of Online Discourse Datasets
Abstract:
Open coding, a key inductive step in qualitative research, discovers and constructs concepts from human datasets. However, capturing extensive and nuanced aspects or "coding moments" can be challenging, especially with large discourse datasets. While some studies explore machine learning (ML)/Generative AI (GAI)'s potential for open coding, few evaluation studies exist. We compare open coding results by five recently published ML/GAI approaches and four human coders, using a dataset of online chat messages around mobile learning software. Our systematic analysis reveals ML/GAI approaches' strengths and weaknesses, uncovering the complementary potential between humans and AI. Line-by-line AI approaches effectively identify content-based codes, while humans excel in interpreting conversational dynamics. We discuss how embedded analytical processes could shape the results of ML/GAI approaches. Instead of replacing humans in open coding, researchers should integrate AI with and according to their analytical processes, e.g., as parallel co-coders.

Authors:Petr Vanc, Karla Stepanova
Title: TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication
Abstract:
As human-robot collaboration advances, natural and flexible communication methods are essential for effective robot control. Traditional methods relying on a single modality or rigid rules struggle with noisy or misaligned data as well as with object descriptions that do not perfectly fit the predefined object names (e.g. 'Pick that red object'). We introduce TransforMerger, a transformer-based reasoning model that infers a structured action command for robotic manipulation based on fused voice and gesture inputs. Our approach merges multimodal data into a single unified sentence, which is then processed by the language model. We employ probabilistic embeddings to handle uncertainty and we integrate contextual scene understanding to resolve ambiguous references (e.g., gestures pointing to multiple objects or vague verbal cues like "this"). We evaluate TransforMerger in simulated and real-world experiments, demonstrating its robustness to noise, misalignment, and missing information. Our results show that TransforMerger outperforms deterministic baselines, especially in scenarios requiring more contextual knowledge, enabling more robust and flexible human-robot communication. Code and datasets are available at: http://imitrob.ciirc.cvut.cz/publications/transformerger.

Authors:Hamed Rahimi, Jeanne Cattoni, Meriem Beghili, Mouad Abrini, Mahdi Khoramshahi, Maribel Pino, Mohamed Chetouani
Title: Reasoning LLMs for User-Aware Multimodal Conversational Agents
Abstract:
Personalization in social robotics is critical for fostering effective human-robot interactions, yet systems often face the cold start problem, where initial user preferences or characteristics are unavailable. This paper proposes a novel framework called USER-LLM R1 for a user-aware conversational agent that addresses this challenge through dynamic user profiling and model initiation. Our approach integrates chain-of-thought (CoT) reasoning models to iteratively infer user preferences and vision-language models (VLMs) to initialize user profiles from multimodal inputs, enabling personalized interactions from the first encounter. Leveraging a Retrieval-Augmented Generation (RAG) architecture, the system dynamically refines user representations within an inherent CoT process, ensuring contextually relevant and adaptive responses. Evaluations on the ElderlyTech-VQA Bench demonstrate significant improvements in ROUGE-1 (+23.2%), ROUGE-2 (+0.6%), and ROUGE-L (+8%) F1 scores over state-of-the-art baselines, with ablation studies underscoring the impact of reasoning model size on performance. Human evaluations further validate the framework's efficacy, particularly for elderly users, where tailored responses enhance engagement and trust. Ethical considerations, including privacy preservation and bias mitigation, are rigorously discussed and addressed to ensure responsible deployment.

Authors:Hanxi Fang, Supawit Chockchowwat, Hari Sundaram, Yongjoo Park
Title: Large-scale Evaluation of Notebook Checkpointing with AI Agents
Abstract:
Saving, or checkpointing, intermediate results during interactive data exploration can potentially boost user productivity. However, existing studies on this topic are limited, as they primarily rely on small-scale experiments with human participants - a fundamental constraint of human subject studies. To address this limitation, we employ AI agents to simulate a large number of complex data exploration scenarios, including revisiting past states and branching into new exploration paths. This strategy enables us to accurately assess the impact of checkpointing while closely mimicking the behavior of real-world data practitioners. Our evaluation results, involving more than 1,000 exploration paths and 2,848 executed code blocks, show that a checkpointing framework for computational notebooks can indeed enhance productivity by minimizing unnecessary code re-executions and redundant variables or code.

Authors:Hanxi Fang, Supawit Chockchowwat, Hari Sundaram, Yongjoo Park
Title: Enhancing Computational Notebooks with Code+Data Space Versioning
Abstract:
There is a gap between how people explore data and how Jupyter-like computational notebooks are designed. People explore data nonlinearly, using execution undos, branching, and/or complete reverts, whereas notebooks are designed for sequential exploration. Recent works like ForkIt are still insufficient to support these multiple modes of nonlinear exploration in a unified way. In this work, we address the challenge by introducing two-dimensional code+data space versioning for computational notebooks and verifying its effectiveness using our prototype system, Kishuboard, which integrates with Jupyter. By adjusting code and data knobs, users of Kishuboard can intuitively manage the state of computational notebooks in a flexible way, thereby achieving both execution rollbacks and checkouts across complex multi-branch exploration history. Moreover, this two-dimensional versioning mechanism can easily be presented along with a friendly one-dimensional history. Human subject studies indicate that Kishuboard significantly enhances user productivity in various data science tasks.

Authors:Muntasir Hoq, Jessica Vandenberg, Shuyin Jiao, Seung Lee, Bradford Mott, Narges Norouzi, James Lester, Bita Akram
Title: Facilitating Instructors-LLM Collaboration for Problem Design in Introductory Programming Classrooms
Abstract:
Advancements in Large Language Models (LLMs), such as ChatGPT, offer significant opportunities to enhance instructional support in introductory programming courses. While extensive research has explored the effectiveness of LLMs in supporting student learning, limited studies have examined how these models can assist instructors in designing instructional activities. This work investigates how instructors' expertise in effective activity design can be integrated with LLMs' ability to generate novel and targeted programming problems, facilitating more effective activity creation for programming classrooms. To achieve this, we employ a participatory design approach to develop an instructor-authoring tool that incorporates LLM support, fostering collaboration between instructors and AI in generating programming exercises. This tool also allows instructors to specify common student mistakes and misconceptions, which informs the adaptive feedback generation process. We conduct case studies with three instructors, analyzing how they use our system to design programming problems for their introductory courses. Through these case studies, we assess instructors' perceptions of the usefulness and limitations of LLMs in authoring problem statements for instructional purposes. Additionally, we compare the efficiency, quality, effectiveness, and coverage of designed activities when instructors create problems with and without structured LLM prompting guidelines. Our findings provide insights into the potential of LLMs in enhancing instructor workflows and improving programming education and provide guidelines for designing effective AI-assisted problem-authoring interfaces.

Authors:Nicholas Clark, Hua Shen, Bill Howe, Tanushree Mitra
Title: Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery
Abstract:
LLMs increasingly serve as tools for knowledge acquisition, yet users cannot effectively specify how they want information presented. When users request that LLMs "cite reputable sources," "express appropriate uncertainty," or "include multiple perspectives," they discover that current interfaces provide no structured way to articulate these preferences. The result is prompt-sharing folklore: community-specific prompts copied and passed along through trust relationships rather than adopted on the basis of measured efficacy. We propose the Epistemic Alignment Framework, a set of ten challenges in knowledge transmission derived from the philosophical literature of epistemology, concerning issues such as evidence quality assessment and calibration of testimonial reliance. The framework serves as a structured intermediary between user needs and system capabilities, creating a common vocabulary to bridge the gap between what users want and what systems deliver. Through a thematic analysis of custom prompts and personalization strategies shared on online communities where these issues are actively discussed, we find users develop elaborate workarounds to address each of the challenges. We then apply our framework to two prominent model providers, OpenAI and Anthropic, through content analysis of their documented policies and product features. Our analysis shows that while these providers have partially addressed the challenges we identified, they fail to establish adequate mechanisms for specifying epistemic preferences, lack transparency about how preferences are implemented, and offer no verification tools to confirm whether preferences were followed. For AI developers, the Epistemic Alignment Framework offers concrete guidance for supporting diverse approaches to knowledge; for users, it works toward information delivery that aligns with their specific needs rather than defaulting to one-size-fits-all approaches.

Authors:Mahjabin Nahar, Eun-Ju Lee, Jin Won Park, Dongwon Lee
Title: Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations
Abstract:
While we increasingly rely on large language models (LLMs) for various tasks, these models are known to produce inaccurate content or 'hallucinations' with potentially disastrous consequences. The recent integration of web search results into LLMs prompts the question of whether people utilize them to verify the generated content, thereby accurately detecting hallucinations. An online experiment (N=560) investigated how the provision of search results, either static (i.e., fixed search results provided by LLM) or dynamic (i.e., participant-led searches), affects participants' perceived accuracy of LLM-generated content (i.e., genuine, minor hallucination, major hallucination), self-confidence in accuracy ratings, as well as their overall evaluation of the LLM, as compared to the control condition (i.e., no search results). Results showed that participants in both static and dynamic conditions (vs. control) rated hallucinated content to be less accurate and perceived the LLM more negatively. However, those in the dynamic condition rated genuine content as more accurate and demonstrated greater overall self-confidence in their assessments than those in the static search or control conditions. We highlighted practical implications of incorporating web search functionality into LLMs in real-world contexts.

Authors:Fanjun Bu, Kerstin Fischer, Wendy Ju
Title: Making Sense of Robots in Public Spaces: A Study of Trash Barrel Robots
Abstract:
In this work, we analyze video data and interviews from a public deployment of two trash barrel robots in a large public space to better understand the sensemaking activities people perform when they encounter robots in public spaces. Based on an analysis of 274 human-robot interactions and interviews with N=65 individuals or groups, we discovered that people were responding not only to the robots or their behavior, but also to the general idea of deploying robots as trashcans, and the larger social implications of that idea. They wanted to understand details about the deployment because having that knowledge would change how they interact with the robot. Based on our data and analysis, we have provided implications for design that may be topics for future human-robot design researchers who are exploring robots for public space deployment. Furthermore, our work offers a practical example of analyzing field data to make sense of robots in public spaces.

Authors:Wazeer Zulfikar, Treyden Chiaravalloti, Jocelyn Shen, Rosalind Picard, Pattie Maes
Title: Resonance: Drawing from Memories to Imagine Positive Futures through AI-Augmented Journaling
Abstract:
People inherently use experiences of their past while imagining their future, a capability that plays a crucial role in mental health. Resonance is an AI-powered journaling tool designed to augment this ability by offering AI-generated, action-oriented suggestions for future activities based on the user's own past memories. Suggestions are offered when a new memory is logged and are followed by a prompt for the user to imagine carrying out the suggestion. In a two-week randomized controlled study (N=55), we found that using Resonance significantly improved mental health outcomes, reducing the users' PHQ8 scores, a measure of current depression, and increasing their daily positive affect, particularly when they would likely act on the suggestion. Notably, the effectiveness of the suggestions was higher when they were personal, novel, and referenced the user's logged memories. Finally, through open-ended feedback, we discuss the factors that encouraged or hindered the use of the tool.

Authors:Cameron R. Jones, Benjamin K. Bergen
Title: Large Language Models Pass the Turing Test
Abstract:
We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5-minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time -- not significantly more or less often than the humans they were being compared to -- while baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21% respectively). The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test. The results have implications for debates about what kind of intelligence is exhibited by Large Language Models (LLMs), and the social and economic impacts these systems are likely to have.
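As a rough illustration (not the paper's actual statistical analysis), an exact binomial tail test shows why a 73% "judged human" rate is significantly above the 50% chance level of a well-calibrated interrogator; the trial count of 100 used here is hypothetical:

```python
from math import comb

def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact one-sided P(X >= successes) under Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))

# Hypothetical counts: 73 of 100 interrogators judge the system to be human.
p_value = binomial_tail(73, 100)  # chance level for a two-choice judgment is 0.5
print(f"P(X >= 73 | n=100, p=0.5) = {p_value:.2e}")
```

With these assumed counts the tail probability is on the order of 10^-6, far below any conventional significance threshold, which is the intuition behind "significantly more often than chance."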

Authors:Kanako Esaki, Tadayuki Matsumura, Yang Shao, Hiroyuki Mizuno
Title: e-person Architecture and Framework for Human-AI Co-adventure Relationship
Abstract:
This paper proposes the e-person architecture for constructing AI ethics in a unified and incremental way. The e-person architecture takes the reduction of uncertainty through collaborative cognition and action with others as a unified basis for ethics. By classifying and defining uncertainty along two axes - (1) first, second, and third person perspectives, and (2) the difficulty of inference based on the depth of information - we support the unified and incremental development of AI ethics. In addition, we propose the e-person framework based on the free energy principle, which considers the reduction of uncertainty as a unifying principle of brain function, with the aim of implementing the e-person architecture, and we show our previous works and future challenges based on the proposed framework.

Authors:Berken Utku Demirel, Adnan Harun Dogan, Juliete Rossie, Max Moebus, Christian Holz
Title: Beyond Subjectivity: Continuous Cybersickness Detection Using EEG-based Multitaper Spectrum Estimation
Abstract:
Virtual reality (VR) presents immersive opportunities across many applications, yet the inherent risk of developing cybersickness during interaction can severely reduce enjoyment and platform adoption. Cybersickness is marked by symptoms such as dizziness and nausea, which previous work primarily assessed via subjective post-immersion questionnaires and motion-restricted controlled setups. In this paper, we investigate the \emph{dynamic nature} of cybersickness while users experience and freely interact in VR. We propose a novel method to \emph{continuously} identify and quantitatively gauge cybersickness levels from users' \emph{passively monitored} electroencephalography (EEG) and head motion signals. Our method estimates multitaper spectrums from EEG, integrating specialized EEG processing techniques to counter motion artifacts, and, thus, tracks cybersickness levels in real-time. Unlike previous approaches, our method requires no user-specific calibration or personalization for detecting cybersickness. Our work addresses the considerable challenge of reproducibility and subjectivity in cybersickness research.

Authors:Yueye Wang, Wenyi Hu, Keyao Zhou, Chi Liu, Jian Zhang, Zhuoting Zhu, Sanil Joseph, Qiuxia Yin, Lixia Luo, Xiaotong Han, Mingguang He, Lei Zhang
Title: What is the role of human decisions in a world of artificial intelligence: an economic evaluation of human-AI collaboration in diabetic retinopathy screening
Abstract:
As artificial intelligence (AI) has been increasingly integrated into the medical field, the role of humans may become unclear. While numerous studies highlight AI's potential, how humans and AI collaborate to maximize the combined clinical benefits remains unexplored. In this work, we analyze 270 screening scenarios from a health-economic perspective in a national diabetic retinopathy screening program, involving eight human-AI collaborative strategies and traditional manual screening. We find that annual copilot human-AI screening in the 20-79 age group, with referral decisions made when both humans and AI agree, is the most cost-effective strategy for human-AI collaboration. The 'copilot' strategy brings health benefits equivalent to USD 4.64 million per 100,000 population compared to manual screening. These findings demonstrate that even in settings where AI is highly mature and efficient, human involvement remains essential to ensuring both health and economic benefits. Our findings highlight the need to optimize human-AI collaboration strategies for AI implementation into healthcare systems.
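The referral rule behind the 'copilot' strategy (refer only when human grader and AI agree) can be sketched as a simple conjunction; the function name and toy cohort below are illustrative, not from the paper:

```python
def copilot_referral(human_flags: bool, ai_flags: bool) -> bool:
    """Refer a patient only when the human grader and the AI both flag
    referable disease -- the agreement rule of the 'copilot' strategy."""
    return human_flags and ai_flags

# Toy screening cohort: (human decision, AI decision) per patient.
decisions = [(True, True), (True, False), (False, True), (False, False)]
referrals = [copilot_referral(h, a) for h, a in decisions]
print(referrals)  # only the patient flagged by both is referred
```

Requiring agreement trades sensitivity for specificity: fewer false-positive referrals, which is one plausible driver of the strategy's cost-effectiveness.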

Authors:Ko Watanabe, Yuki Matsuda, Yugo Nakamura, Yutaka Arakawa, Shoya Ishimaru
Title: TrackThinkDashboard: Understanding Student Self-Regulated Learning in Programming Study
Abstract:
In programming education, fostering self-regulated learning (SRL) skills is essential for both students and teachers. This paper introduces TrackThinkDashboard, an application designed to visualize the learning workflow by integrating web browsing and programming logs into one unified view. The system aims to (1) help students monitor and reflect on their problem-solving processes, identify knowledge gaps, and cultivate effective SRL strategies; and (2) enable teachers to identify at-risk learners more effectively and provide targeted, data-driven guidance. We conducted a study with 33 participants (32 male, 1 female) from Japanese universities, including individuals with and without prior programming experience, to explore differences in web browsing and coding patterns. The dashboards revealed multiple learning approaches, such as trial-and-error and trial-and-search methods, and highlighted how domain knowledge influenced the overall activity flow. We discuss how this visualization tool can be used continuously or in one-off experiments, consider associated privacy implications, and explore opportunities for expanding data sources to gain richer behavioral insights.

Authors:Rida Qadri, Mark Diaz, Ding Wang, Michael Madaio
Title: The Case for "Thick Evaluations" of Cultural Representation in AI
Abstract:
Generative AI image models have been increasingly evaluated for their (in)ability to represent non-Western cultures. We argue that these evaluations operate through reductive ideals of representation, abstracted from how people define their own representation and neglecting the inherently interpretive and contextual nature of cultural representation. In contrast to these 'thin' evaluations, we introduce the idea of 'thick evaluations': a more granular, situated, and discursive measurement framework for evaluating representations of social worlds in AI images, steeped in communities' own understandings of representation. We develop this evaluation framework through workshops in South Asia, by studying the 'thick' ways in which people interpret and assign meaning to images of their own cultures. We introduce practices for thicker evaluations of representation that expand the understanding of representation underpinning AI evaluations and, by co-constructing metrics with communities, bring measurement in line with the experiences of communities on the ground.

Authors:Ananya Ipsita, Ramesh Kaki, Ziyi Liu, Mayank Patel, Runlin Duan, Lakshmi Deshpande, Lin-Ping Yuan, Victoria Lowell, Ashok Maharaj, Kylie Peppler, Steven Feiner, Karthik Ramani
Title: Virtual Reality in Manufacturing Education: A Scoping Review Indicating State-of-the-Art, Benefits, and Challenges Across Domains, Levels, and Entities
Abstract:
To address the shortage of a skilled workforce in the U.S. manufacturing industry, immersive Virtual Reality (VR)-based training solutions hold promising potential. To effectively utilize VR to meet workforce demands, it is important to understand the role of VR in manufacturing education. Therefore, we conduct a scoping review in the field. As a first step, we used a 5W1H (What, Where, Who, When, Why, How) formula as a problem-solving approach to define a comprehensive taxonomy that can consider the role of VR from all relevant possibilities. Our taxonomy categorizes VR applications across three key aspects: (1) Domains, (2) Levels, and (3) Entities. Using a systematic literature search and analysis, we reviewed 108 research articles to find the current state, benefits, challenges, and future opportunities of VR in the field. It was found that VR has been explored in a variety of areas and provides numerous benefits to learners. Despite these benefits, its adoption in manufacturing education is limited. This review discusses the identified barriers and provides actionable insights to address them. These insights can enable the widespread usage of immersive technology to nurture and develop a workforce equipped with the skills required to excel in the evolving landscape of manufacturing.

Authors:Buse Carik, Victoria Izaac, Xiaohan Ding, Angela Scarpa, Eugenia Rho
Title: Reimagining Support: Exploring Autistic Individuals' Visions for AI in Coping with Negative Self-Talk
Abstract:
Autistic individuals often experience negative self-talk (NST), leading to increased anxiety and depression. While therapy is recommended, it presents challenges for many autistic individuals. Meanwhile, a growing number are turning to large language models (LLMs) for mental health support. To understand how autistic individuals perceive AI's role in coping with NST, we surveyed 200 autistic adults and interviewed practitioners. We also analyzed LLM responses to participants' hypothetical prompts about their NST. Our findings show that participants view LLMs as useful for managing NST by identifying and reframing negative thoughts. Both participants and practitioners recognize AI's potential to support therapy and emotional expression. Participants also expressed concerns about LLMs' understanding of neurodivergent thought patterns, particularly due to the neurotypical bias of LLMs. Practitioners critiqued LLMs' responses as overly wordy, vague, and overwhelming. This study contributes to the growing research on AI-assisted mental health support, with specific insights for supporting the autistic community.

Authors:Steve Benford, Eike Schneiders, Juan Pablo Martinez Avila, Praminda Caleb-Solly, Patrick Robert Brundell, Simon Castle-Green, Feng Zhou, Rachael Garrett, Kristina Höök, Sarah Whatley, Kate Marsh, Paul Tennent
Title: Somatic Safety: An Embodied Approach Towards Safe Human-Robot Interaction
Abstract:
As robots enter the messy human world, the vital matter of safety takes on a fresh complexion, with physical contact becoming inevitable and even desirable. We report on an artistic exploration of how dancers, working as part of a multidisciplinary team, engaged in contact improvisation exercises to explore the opportunities and challenges of dancing with cobots. We reveal how they employed their honed bodily senses and physical skills to engage with the robots aesthetically and yet safely, interleaving improvised physical manipulations with reflections to grow their knowledge of how the robots behaved and felt. We introduce somatic safety, a holistic mind-body approach in which safety is learned, felt and enacted through bodily contact with robots in addition to being reasoned about. We conclude that robots need to be better designed for people to hold them and might recognise tacit safety cues among people. We propose that safety should be learned through iterative bodily experience interleaved with reflection.

Authors:Yash Vekaria, Aurelio Loris Canino, Jonathan Levitsky, Alex Ciechonski, Patricia Callejo, Anna Maria Mandalari, Zubair Shafiq
Title: Big Help or Big Brother? Auditing Tracking, Profiling, and Personalization in Generative AI Assistants
Abstract:
Generative AI (GenAI) browser assistants integrate powerful capabilities of GenAI in web browsers to provide rich experiences such as question answering, content summarization, and agentic navigation. These assistants, available today as browser extensions, can not only track detailed browsing activity such as search and click data, but can also autonomously perform tasks such as filling forms, raising significant privacy concerns. It is crucial to understand the design and operation of GenAI browser extensions, including how they collect, store, process, and share user data. To this end, we study their ability to profile users and personalize their responses based on explicit or inferred demographic attributes and interests of users. We perform network traffic analysis and use a novel prompting framework to audit tracking, profiling, and personalization by the ten most popular GenAI browser assistant extensions. We find that instead of relying on local in-browser models, these assistants largely depend on server-side APIs, which can be auto-invoked without explicit user interaction. When invoked, they collect and share webpage content, often the full HTML DOM and sometimes even the user's form inputs, with their first-party servers. Some assistants also share identifiers and user prompts with third-party trackers such as Google Analytics. The collection and sharing continues even if a webpage contains sensitive information such as health or personal information such as name or SSN entered in a web form. We find that several GenAI browser assistants infer demographic attributes such as age, gender, income, and interests and use this profile--which carries across browsing contexts--to personalize responses. In summary, our work shows that GenAI browser assistants can and do collect personal and sensitive information for profiling and personalization with little to no safeguards.

Authors:Elisabeth Menendez, Michael Gienger, Santiago Martínez, Carlos Balaguer, Anna Belardinelli
Title: SemanticScanpath: Combining Gaze and Speech for Situated Human-Robot Interaction Using LLMs
Abstract:
Large Language Models (LLMs) have substantially improved the conversational capabilities of social robots. Nevertheless, for an intuitive and fluent human-robot interaction, robots should be able to ground the conversation by relating ambiguous or underspecified spoken utterances to the current physical situation and to the intents expressed non-verbally by the user, for example by using referential gaze. Here we propose a representation integrating speech and gaze to enable LLMs to obtain higher situated awareness and correctly resolve ambiguous requests. Our approach relies on a text-based semantic translation of the scanpath produced by the user along with the verbal requests and demonstrates the LLM's capability to reason about gaze behavior, robustly ignoring spurious glances or irrelevant objects. We validate the system across multiple tasks and two scenarios, showing its generality and accuracy, and demonstrate its implementation on a robotic platform, closing the loop from request interpretation to execution.

Authors:Dimitris Tsirmpas, Ion Androutsopoulos, John Pavlopoulos
Title: Scalable Evaluation of Online Facilitation Strategies via Synthetic Simulation of Discussions
Abstract:
Limited large-scale evaluations exist for facilitation strategies of online discussions due to significant costs associated with human involvement. An effective solution is synthetic discussion simulations using Large Language Models (LLMs) to create initial pilot experiments. We propose design principles based on existing methodologies for synthetic discussion generation. Based on these principles, we propose a simple, generalizable, LLM-driven methodology to prototype the development of LLM facilitators by generating synthetic data without human involvement, and which surpasses current baselines. We use our methodology to test whether current Social Science strategies for facilitation can improve the performance of LLM facilitators. We find that, while LLM facilitators significantly improve synthetic discussions, there is no evidence that the application of these strategies leads to further improvements in discussion quality. In an effort to aid research in the field of facilitation, we release a large, publicly available dataset containing LLM-generated and LLM-annotated discussions using multiple open-source models. This dataset can be used for LLM facilitator finetuning as well as behavioral analysis of current out-of-the-box LLMs in the task. We also release an open-source python framework that efficiently implements our methodology at great scale.
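The core simulation loop described here (LLM personas exchanging posts while an LLM facilitator interjects) can be sketched in plain Python; the `stub_llm` function and turn structure below are illustrative assumptions, not the paper's actual methodology:

```python
import random

def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (the paper uses open-source models)."""
    return random.choice(["I agree.", "I see it differently.", "Could you elaborate?"])

def simulate_discussion(personas, facilitator_prompt, turns=6):
    """Generate a synthetic discussion: personas take turns posting, and the
    facilitator interjects after every participant turn."""
    transcript = []
    for t in range(turns):
        speaker = personas[t % len(personas)]
        transcript.append((speaker, stub_llm(f"{speaker} responds to: {transcript[-1:]}")))
        transcript.append(("facilitator", stub_llm(facilitator_prompt)))
    return transcript

log = simulate_discussion(["alice", "bob"], "Keep the discussion civil and on-topic.")
print(len(log))  # 12 utterances: 6 participant turns, each followed by facilitation
```

Swapping `stub_llm` for a real model call (and annotating the transcript with a second LLM pass) yields the kind of fully synthetic pilot data the paper advocates for evaluating facilitation strategies without human participants.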

Authors:David Porfirio, Mark Roberts, Laura M. Hiatt
Title: Uncertainty Expression for Human-Robot Task Communication
Abstract:
An underlying assumption of many existing approaches to human-robot task communication is that the robot possesses a sufficient amount of environmental domain knowledge, including the locations of task-critical objects. This assumption is unrealistic if the locations of known objects change or have not yet been discovered by the robot. In this work, our key insight is that in many scenarios, robot end users possess more scene insight than the robot and need ways to express it. Presently, there is a lack of research on how solutions for collecting end-user scene insight should be designed. We thereby created an Uncertainty Expression System (UES) to investigate how best to elicit end-user scene insight. The UES allows end users to convey their knowledge of object uncertainty using either: (1) a precision interface that allows meticulous expression of scene insight; (2) a painting interface by which users create a heat map of possible object locations; and (3) a ranking interface by which end users express object locations via an ordered list. We then conducted a user study to compare the effectiveness of these approaches based on the accuracy of scene insight conveyed to the robot, the efficiency at which end users are able to express this scene insight, and both usability and task load. Results indicate that the ranking interface is more user-friendly and efficient than the precision interface, and that the painting interface is the least accurate.
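To make the ranking interface concrete: an ordered list of candidate locations must ultimately become a belief the robot can act on. One minimal way to do that (a reciprocal-rank weighting, which is our assumption here, not the paper's model) is:

```python
def rank_to_distribution(ranked_locations):
    """Turn an ordered list of candidate object locations (most to least
    likely) into a normalised probability estimate via reciprocal-rank
    weights. The weighting scheme is illustrative, not the UES's model."""
    weights = [1 / (i + 1) for i in range(len(ranked_locations))]
    total = sum(weights)
    return {loc: w / total for loc, w in zip(ranked_locations, weights)}

dist = rank_to_distribution(["kitchen table", "counter", "sink"])
print(dist)  # highest mass on the top-ranked location
```

The painting interface, by contrast, lets users specify such a distribution directly as a spatial heat map rather than deriving it from an ordering.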

Authors:Milin Patel, Rolf Jung, Yasin Cakir
Title: Simulation-based Testing of Foreseeable Misuse by the Driver applicable for Highly Automated Driving
Abstract:
With Highly Automated Driving (HAD), the driver can engage in non-driving-related tasks. In the event of a system failure, the driver is expected to reasonably regain control of the Automated Vehicle (AV). Incorrect system understanding may provoke misuse by the driver and can lead to vehicle-level hazards. ISO 21448, referred to as the standard for Safety of the Intended Functionality (SOTIF), defines misuse as usage of the system by the driver in a way not intended by the system manufacturer. Foreseeable Misuse (FM) implies anticipated system misuse based on the best knowledge about the system design and the driver behaviour. This is the underlying motivation to propose simulation-based testing of FM. The key challenge is to perform simulation-based testing of a SOTIF-related misuse scenario. Transverse Guidance Assist System (TGAS) is modelled for HAD. In the context of this publication, TGAS is referred to as the "system," and the driver is the human operator of the system. This publication focuses on implementing the Driver-Vehicle Interface (DVI) that permits the interactions between the driver and the system. The implementation and testing of a derived misuse scenario using the driving simulator ensure reasonable usage of the system by supporting the driver with unambiguous information on system functions and states so that the driver can conveniently perceive, comprehend, and act upon the information.

Authors:Wenhui Tan, Boyuan Li, Chuhao Jin, Wenbing Huang, Xiting Wang, Ruihua Song
Title: Think-Then-React: Towards Unconstrained Human Action-to-Reaction Generation
Abstract:
Modeling human-like action-to-reaction generation has significant real-world applications, like human-robot interaction and games. Despite recent advancements in single-person motion generation, it is still challenging to well handle action-to-reaction generation, due to the difficulty of directly predicting reaction from action sequence without prompts, and the absence of a unified representation that effectively encodes multi-person motion. To address these challenges, we introduce Think-Then-React (TTR), a large language-model-based framework designed to generate human-like reactions. First, with our fine-grained multimodal training strategy, TTR is capable of unifying two processes during inference: a thinking process that explicitly infers action intentions and reasons about the corresponding reaction description, which serves as a semantic prompt, and a reacting process that predicts reactions based on input action and the inferred semantic prompts. Second, to effectively represent multi-person motion in language models, we propose a unified motion tokenizer by decoupling egocentric pose and absolute space features, which effectively represents action and reaction motion with the same encoding. Extensive experiments demonstrate that TTR outperforms existing baselines, achieving significant improvements in evaluation metrics, such as reducing FID from 3.988 to 1.942.

Authors:Giulio Antonio Abbo, Maria Jose Pinto-Bernal, Martijn Catrycke, Tony Belpaeme
Title: Fast Multi-Party Open-Ended Conversation with a Social Robot
Abstract:
This paper presents the implementation and evaluation of a conversational agent designed for multi-party open-ended interactions. Leveraging state-of-the-art technologies such as voice direction of arrival, voice recognition, face tracking, and large language models, the system aims to facilitate natural and intuitive human-robot conversations. Deployed on the Furhat robot, the system was tested with 30 participants engaging in open-ended group conversations and then in two overlapping discussions. Quantitative metrics, such as latencies and recognition accuracy, along with qualitative measures from user questionnaires, were collected to assess performance. The results highlight the system's effectiveness in managing multi-party interactions, though improvements are needed in response relevance and latency. This study contributes valuable insights for advancing human-robot interaction, particularly in enhancing the naturalness and engagement in group conversations.

Authors:Steve Benford, Rachael Garrett, Christine Li, Paul Tennent, Claudia Núñez-Pacheco, Ayse Kucukyilmaz, Vasiliki Tsaknaki, Kristina Höök, Praminda Caleb-Solly, Joe Marshall, Eike Schneiders, Kristina Popova, Jude Afana
Title: Tangles: Unpacking Extended Collision Experiences with Soma Trajectories
Abstract:
We reappraise the idea of colliding with robots, moving from a position that tries to avoid or mitigate collisions to one that considers them an important facet of human interaction. We report on a soma design workshop that explored how our bodies could collide with telepresence robots, mobility aids, and a quadruped robot. Based on our findings, we employed soma trajectories to analyse collisions as extended experiences that negotiate key transitions of consent, preparation, launch, contact, ripple, sting, untangle, debris and reflect. We then employed these ideas to analyse two collision experiences, an accidental collision between a person and a drone, and the deliberate design of a robot to play with cats, revealing how real-world collisions involve the complex and ongoing entanglement of soma trajectories. We discuss how viewing collisions as entangled trajectories, or tangles, can be used analytically, as a design approach, and as a lens to broach ethical complexity.

Authors:Anjana Arunkumar, Lace Padilla, Chris Bryan
Title: Lost in Translation: How Does Bilingualism Shape Reader Preferences for Annotated Charts?
Abstract:
Visualizations are powerful tools for conveying information but often rely on accompanying text for essential context and guidance. This study investigates the impact of annotation patterns on reader preferences and comprehension accuracy among multilingual populations, addressing a gap in visualization research. We conducted experiments with two groups fluent in English and either Tamil (n = 557) or Arabic (n = 539) across six visualization types, each varying in annotation volume and semantic content. Full-text annotations yielded the highest comprehension accuracy across all languages, while preferences diverged: English readers favored highly annotated charts, whereas Tamil/Arabic readers preferred full-text or minimally annotated versions. Semantic variations in annotations (L1-L4) did not significantly affect comprehension, demonstrating the robustness of text comprehension across languages. English annotations were generally preferred, with a tendency to think technically in English linked to greater aversion to non-English annotations, though this diminished among participants who regularly switched languages internally. Non-English annotations incorporating visual or external knowledge were less favored, particularly in titles. Our findings highlight cultural and educational factors influencing perceptions of visual information, underscoring the need for inclusive annotation practices for diverse linguistic audiences. All data and materials are available at: https://osf.io/ckdb4/.

Authors:Wen Gu, Zhaoxing Li, Jan Buermann, Jim Dilkes, Dimitris Michailidis, Shinobu Hasegawa, Vahid Yazdanpanah, Sebastian Stein
Title: Facilitating Automated Online Consensus Building through Parallel Thinking
Abstract:
Consensus building is inherently challenging due to the diverse opinions held by stakeholders. Effective facilitation is crucial to support the consensus building process and enable efficient group decision making. However, the effectiveness of facilitation is often constrained by human factors such as limited experience and scalability. In this research, we propose a Parallel Thinking-based Facilitation Agent (PTFA) that facilitates online, text-based consensus building processes. The PTFA automatically collects textual posts and leverages large language models (LLMs) to perform all of the six distinct roles of the well-established Six Thinking Hats technique in parallel thinking. To illustrate the potential of PTFA, a pilot study was carried out and PTFA's ability in idea generation, emotional probing, and deeper analysis of ideas was demonstrated. Furthermore, a comprehensive dataset that contains not only the conversational content among the participants but also between the participants and the agent is constructed for future study.
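PTFA's dispatch of the six Six Thinking Hats roles in parallel can be sketched with a thread pool and a stub in place of the LLM calls; the hat prompts and `stub_llm` below are illustrative assumptions, not PTFA's actual prompts:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative one-line prompts for the six Six Thinking Hats roles.
HATS = {
    "white":  "State only facts and figures about the proposal.",
    "red":    "Express gut feelings and emotions about it.",
    "black":  "Point out risks and weaknesses.",
    "yellow": "Highlight benefits and value.",
    "green":  "Suggest creative alternatives.",
    "blue":   "Summarise and manage the thinking process.",
}

def stub_llm(role_prompt: str, post: str) -> str:
    """Placeholder for a real LLM call."""
    return f"[{role_prompt.split()[0]}] response to: {post}"

def ptfa_round(post: str) -> dict:
    """Run all six hat roles in parallel on one collected textual post."""
    with ThreadPoolExecutor(max_workers=6) as pool:
        futures = {hat: pool.submit(stub_llm, prompt, post)
                   for hat, prompt in HATS.items()}
        return {hat: f.result() for hat, f in futures.items()}

replies = ptfa_round("We should adopt a four-day work week.")
print(sorted(replies))  # one reply per hat
```

Running the roles concurrently rather than sequentially is what makes the facilitation "parallel thinking": every perspective is applied to the same post at once, and the blue-hat output can then be used to steer the next round.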

Authors:Roham Koohestani, Maliheh Izadi
Title: HyperSeq: A Hyper-Adaptive Representation for Predictive Sequencing of States
Abstract:
In the rapidly evolving world of software development, the surge in developers' reliance on AI-driven tools has transformed Integrated Development Environments into powerhouses of advanced features. This transformation, while boosting developers' productivity to unprecedented levels, comes with a catch: increased hardware demands for software development. Moreover, the significant economic and environmental toll of using these sophisticated models necessitates mechanisms that reduce unnecessary computational burdens. We propose HyperSeq - Hyper-Adaptive Representation for Predictive Sequencing of States - a novel, resource-efficient approach designed to model developers' cognitive states. HyperSeq facilitates precise action sequencing and enables real-time learning of user behavior. Our preliminary results show how HyperSeq excels in forecasting action sequences, achieving prediction accuracies beyond 70%. Notably, the model's online-learning capability allows it to substantially enhance its predictive accuracy in a majority of cases and, given sufficient iterations for adaptation, to improve at forecasting the user's next actions. Ultimately, our objective is to harness these predictions to refine and elevate the user experience dynamically within the IDE.

Authors:Melik Ozolcer, Tongze Zhang, Sang Won Bae
Title: Predicting Volleyball Season Performance Using Pre-Season Wearable Data and Machine Learning
Abstract:
Predicting performance outcomes has the potential to transform training approaches, inform coaching strategies, and deepen our understanding of the factors that contribute to athletic success. Traditional non-automated data analysis in sports are often difficult to scale. To address this gap, this study analyzes factors influencing athletic performance by leveraging passively collected sensor data from smartwatches and ecological momentary assessments (EMA). The study aims to differentiate between 14 collegiate volleyball players who go on to perform well or poorly, using data collected prior to the beginning of the season. This is achieved through an integrated feature set creation approach. The model, validated using leave-one-subject-out cross-validation, achieved promising predictive performance (F1 score = 0.75). Importantly, by utilizing data collected before the season starts, our approach offers an opportunity for players predicted to perform poorly to improve their projected outcomes through targeted interventions by virtue of daily model predictions. The findings from this study not only demonstrate the potential of machine learning in sports performance prediction but also shed light on key features along with subjective psycho-physiological states that are predictive of, or associated with, athletic success.
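The leave-one-subject-out cross-validation and F1 scoring described above can be sketched in plain Python; the threshold "model" and toy subject data below are illustrative assumptions, not the study's actual features or classifier:

```python
def f1_score(y_true, y_pred):
    """Binary F1 from scratch: harmonic mean of precision and recall."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def loso_cv(subjects, fit, predict):
    """Leave-one-subject-out: train on all subjects but one, test on the
    held-out subject, and score the pooled predictions."""
    y_true, y_pred = [], []
    for held_out in subjects:
        train = [s for s in subjects if s is not held_out]
        model = fit(train)
        y_true.append(held_out["label"])
        y_pred.append(predict(model, held_out))
    return f1_score(y_true, y_pred)

# Toy stand-in: one scalar feature per subject, mean-threshold classifier.
subjects = [{"x": x, "label": int(x > 0.5)} for x in (0.1, 0.2, 0.6, 0.8, 0.9, 0.3)]
fit = lambda train: sum(s["x"] for s in train) / len(train)
predict = lambda thr, s: int(s["x"] > thr)
print(round(loso_cv(subjects, fit, predict), 2))  # 1.0 on this separable toy set
```

With only 14 players, LOSO-CV uses the data maximally while ensuring no subject's data leaks between training and testing, which is why it is the natural validation choice here.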

Authors:Yanwei Huang, Yan Miao, Di Weng, Adam Perer, Yingcai Wu
Title: StructVizor: Interactive Profiling of Semi-Structured Textual Data
Abstract:
Data profiling plays a critical role in understanding the structure of complex datasets and supporting numerous downstream tasks, such as social media analytics and financial fraud detection. While existing research predominantly focuses on structured data formats, a substantial portion of semi-structured textual data still requires ad-hoc and arduous manual profiling to extract and comprehend its internal structures. In this work, we propose StructVizor, an interactive profiling system that facilitates sensemaking and transformation of semi-structured textual data. Our tool mainly addresses two challenges: a) extracting and visualizing the diverse structural patterns within data, such as how information is organized or related, and b) enabling users to efficiently perform various wrangling operations on textual data. Through automatic data parsing and structure mining, StructVizor enables visual analytics of structural patterns, while incorporating novel interactions to enable profile-based data wrangling. A comparative user study involving 12 participants demonstrates the system's usability and its effectiveness in supporting exploratory data analysis and transformation tasks.

Authors:Tongze Zhang, Tammy Chung, Anind Dey, Sang Won Bae
Title: AXAI-CDSS: An Affective Explainable AI-Driven Clinical Decision Support System for Cannabis Use
Abstract:
As cannabis use has increased in recent years, researchers have come to rely on sophisticated machine learning models to predict cannabis use behavior and its impact on health. However, many artificial intelligence (AI) models lack transparency and interpretability due to their opaque nature, limiting their trust and adoption in real-world medical applications, such as clinical decision support systems (CDSS). To address this issue, this paper enhances algorithm explainability underlying CDSS by integrating multiple Explainable Artificial Intelligence (XAI) methods and applying causal inference techniques to clarify the model's predictive decisions under various scenarios. By providing deeper interpretability of the XAI outputs using Large Language Models (LLMs), we provide users with more personalized and accessible insights to overcome the challenges posed by AI's "black box" nature. Our system dynamically adjusts feedback based on user queries and emotional states, combining text-based sentiment analysis with real-time facial emotion recognition to ensure responses are empathetic, context-adaptive, and user-centered. This approach bridges the gap between the learning demands of interpretability and the need for intuitive understanding, enabling non-technical users such as clinicians and clinical researchers to interact effectively with AI models. Ultimately, this approach improves usability, enhances perceived trustworthiness, and increases the impact of CDSS in healthcare applications.

Authors:Jack West, Bengisu Cagiltay, Shirley Zhang, Jingjie Li, Kassem Fawaz, Suman Banerjee
Title: "Impressively Scary:" Exploring User Perceptions and Reactions to Unraveling Machine Learning Models in Social Media Applications
Abstract:
Machine learning models deployed locally on social media applications power features such as face filters that read faces in real time, and they expose sensitive attributes to the apps. However, the deployment of machine learning models, e.g., when, where, and how they are used, in social media applications is opaque to users. We aim to address this inconsistency and investigate how social media users' perceptions and behaviors change once they are exposed to these models. We conducted user studies (N=21) and found that participants were unaware of both what the models output and when the models were used in Instagram and TikTok, two major social media platforms. In response to being exposed to the models' functionality, we observed long-term behavior changes in 8 participants. Our analysis uncovers the challenges and opportunities in providing transparency for machine learning models that interact with local user data.

Authors:Vincent Freiberger, Arthur Fleig, Erik Buchmann
Title: "You don't need a university degree to comprehend data protection this way": LLM-Powered Interactive Privacy Policy Assessment
Abstract:
Protecting online privacy requires users to engage with and comprehend website privacy policies, but many policies are difficult and tedious to read. We present the first qualitative user study on Large Language Model (LLM)-driven privacy policy assessment. To this end, we build and evaluate an LLM-based privacy policy assessment browser extension, which helps users understand the essence of a lengthy, complex privacy policy while browsing. The tool integrates a dashboard and an LLM chat. In our qualitative user study (N=22), we evaluate usability, understandability of the information our tool provides, and its impacts on awareness. While providing a comprehensible quick overview and a chat for in-depth discussion improves privacy awareness, users note issues with building trust in the tool. From our insights, we derive important design implications to guide future policy analysis tools.

Authors:Husne Ara Rubaiyeat, Njayou Youssouf, Md Kamrul Hasan, Hasan Mahmud
Title: BdSLW401: Transformer-Based Word-Level Bangla Sign Language Recognition Using Relative Quantization Encoding (RQE)
Abstract:
Sign language recognition (SLR) for low-resource languages like Bangla suffers from signer variability, viewpoint variations, and limited annotated datasets. In this paper, we present BdSLW401, a large-scale, multi-view, word-level Bangla Sign Language (BdSL) dataset with 401 signs and 102,176 video samples from 18 signers in front and lateral views. To improve transformer-based SLR, we introduce Relative Quantization Encoding (RQE), a structured embedding approach that anchors landmarks to physiological reference points and quantizes motion trajectories. RQE improves attention allocation by decreasing spatial variability, yielding a 44.3% WER reduction on WLASL100, a 21.0% reduction on SignBD-200, and significant gains on BdSLW60 and SignBD-90. However, fixed quantization becomes insufficient on large-scale datasets (e.g., WLASL2000), indicating the need for adaptive encoding strategies. Further, RQE-SF, an extended variant that stabilizes shoulder landmarks, improves pose consistency at the cost of small trade-offs in lateral-view recognition. Attention maps show that RQE improves model interpretability by focusing on the major articulatory features (fingers, wrists) and the most distinctive frames rather than global pose changes. By introducing BdSLW401 and demonstrating the effectiveness of RQE-enhanced structured embeddings, this work advances transformer-based SLR for low-resource languages and sets a benchmark for future research in this area.
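The core idea of RQE, anchoring landmarks to a reference point and discretizing the offsets, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, the single-anchor choice, and the bin parameters are assumptions.

```python
import numpy as np

def relative_quantization_encoding(landmarks, anchor_idx=0, num_bins=16, value_range=1.0):
    """Sketch of an RQE-style encoding: express each landmark relative to a
    physiological anchor point, then quantize the offsets into discrete bins.

    landmarks: array of shape (frames, points, 2) with normalized x/y coordinates.
    Returns integer bin indices of the same shape.
    """
    anchor = landmarks[:, anchor_idx:anchor_idx + 1, :]  # reference point per frame
    relative = landmarks - anchor                        # anchor-relative offsets
    clipped = np.clip(relative, -value_range, value_range)
    # Map [-value_range, value_range] onto integer bins 0 .. num_bins-1
    bins = np.floor((clipped + value_range) / (2 * value_range) * (num_bins - 1e-9))
    return bins.astype(np.int64)
```

Because every offset is taken relative to the anchor, global translation of the signer leaves the encoding unchanged, which is one way such an encoding can reduce spatial variability.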

Authors:Nikita Soni, Pranav Chitale, Khushboo Singh, Niranjan Balasubramanian, H. Andrew Schwartz
Title: Evaluation of LLMs-based Hidden States as Author Representations for Psychological Human-Centered NLP Tasks
Abstract:
Like most of NLP, models for human-centered NLP tasks -- tasks attempting to assess author-level information -- predominantly use representations derived from the hidden states of Transformer-based LLMs. However, which component of the LM is used for the representation varies widely. Moreover, there is a need for Human Language Models (HuLMs) that implicitly model the author and provide a user-level hidden state. Here, we systematically evaluate different ways of representing documents and users using different LM and HuLM architectures to predict task outcomes as both dynamically changing states and averaged trait-like user-level attributes of valence, arousal, empathy, and distress. We find that representing documents as an average of the token hidden states generally performs the best. Further, while a user-level hidden state itself is rarely the best representation, we find that its inclusion in the model strengthens the token or document embeddings used to derive document- and user-level representations, resulting in the best performance.
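The document representation this abstract finds strongest, the average of token hidden states, amounts to masked mean pooling. A minimal sketch (the function names and the NumPy-array interface are assumptions; the paper works with Transformer hidden states):

```python
import numpy as np

def mean_pool_hidden_states(hidden_states, attention_mask):
    """Average the hidden states of real (non-padding) tokens.

    hidden_states: (tokens, dim) array; attention_mask: (tokens,) of 0/1.
    """
    mask = attention_mask[:, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=0)
    count = mask.sum()
    return summed / np.maximum(count, 1.0)  # guard against an all-padding input

def user_representation(document_vectors):
    """A user-level representation as the mean of that user's document vectors."""
    return np.mean(document_vectors, axis=0)
```

The padding mask matters: naively averaging over all positions would pull the document vector toward the padding token's hidden state.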

Authors:Shiwali Mohan, Aaron H. Mininger, James R. Kirk, John E. Laird
Title: Acquiring Grounded Representations of Words with Situated Interactive Instruction
Abstract:
We present an approach for acquiring grounded representations of words from mixed-initiative, situated interactions with a human instructor. The work focuses on the acquisition of diverse types of knowledge including perceptual, semantic, and procedural knowledge along with learning grounded meanings. Interactive learning allows the agent to control its learning by requesting instructions about unknown concepts, making learning efficient. Our approach has been instantiated in Soar and has been evaluated on a table-top robotic arm capable of manipulating small objects.

Authors:Jackie Chan, Fred Choi, Koustuv Saha, Eshwar Chandrasekharan
Title: Examining Algorithmic Curation on Social Media: An Empirical Audit of Reddit's r/popular Feed
Abstract:
Platforms are increasingly relying on algorithms to curate the content within users' social media feeds. However, the growing prominence of proprietary, algorithmically curated feeds has concealed what factors influence the presentation of content on social media feeds and how that presentation affects user behavior. This lack of transparency can be detrimental to users, from reducing users' agency over their content consumption to the propagation of misinformation and toxic content. To uncover details about how these feeds operate and influence user behavior, we conduct an empirical audit of Reddit's algorithmically curated trending feed called r/popular. Using 10K r/popular posts collected by taking snapshots of the feed over 11 months, we find that recent comments help a post remain on r/popular longer and climb the feed. We also find that posts below rank 80 correspond to a sharp decline in activity compared to posts above. When examining the effects of having a higher proportion of undesired behavior -- i.e., moderator-removed and toxic comments -- we find no significant evidence that it helps posts stay on r/popular for longer. Although posts closer to the top receive more undesired comments, we find this increase to coincide with a broader increase in overall engagement -- rather than indicating a disproportionate effect on undesired activity. The relationships between algorithmic rank and engagement highlight the extent to which algorithms employed by social media platforms essentially determine which content is prioritized and which is not. We conclude by discussing how content creators, consumers, and moderators on social media platforms can benefit from empirical audits aimed at improving transparency in algorithmically curated feeds.

Authors:Yaman Yu, Bektur Ryskeldiev, Ayaka Tsutsui, Matthew Gillingham, Yang Wang
Title: LLM-Driven Optimization of HTML Structure to Support Screen Reader Navigation
Abstract:
Online interactions and e-commerce are commonplace among BLV users. Despite the implementation of web accessibility standards, many e-commerce platforms continue to present challenges to screen reader users, particularly in areas like webpage navigation and information retrieval. We investigate the difficulties encountered by screen reader users during online shopping experiences. We conducted a formative study with BLV users and designed a web browser plugin that uses GenAI to restructure webpage content in real time. Our approach improved the header hierarchy and provided correct labeling for essential information. We evaluated the effectiveness of this solution using an automated accessibility tool and through user interviews. Our results show that the revised webpages generated by our system offer significant improvements over the original webpages regarding screen reader navigation experience. Based on our findings, we discuss its potential usage as both a user and developer tool that can significantly enhance screen reader accessibility of webpages.

Authors:Devansh Saxena, Zoe Kahn, Erina Seh-Young Moon, Lauren M. Chambers, Corey Jackson, Min Kyung Lee, Motahhare Eslami, Shion Guha, Sheena Erete, Lilly Irani, Deirdre Mulligan, John Zimmerman
Title: Emerging Practices in Participatory AI Design in Public Sector Innovation
Abstract:
Local and federal agencies are rapidly adopting AI systems to augment or automate critical decisions, efficiently use resources, and improve public service delivery. AI systems are being used to support tasks associated with urban planning, security, surveillance, energy, and critical infrastructure, and to support decisions that directly affect citizens and their ability to access essential services. Local governments act as the governance tier closest to citizens and must play a critical role in upholding democratic values and building community trust, especially as it relates to smart city initiatives that seek to transform public services through the adoption of AI. Community-centered and participatory approaches have been central to ensuring the appropriate adoption of technology; however, AI innovation introduces new challenges in this context because participatory AI design methods require more robust formulation and face higher standards for implementation in the public sector compared to the private sector. This requires us to reassess traditional methods used in this space as well as develop new resources and methods. This workshop will explore emerging practices in participatory algorithm design - or the use of public participation and community engagement - in the scoping, design, adoption, and implementation of public sector algorithms.

Authors:Yuexi Chen, Yimin Xiao, Kazi Tasnim Zinat, Naomi Yamashita, Ge Gao, Zhicheng Liu
Title: Comparing Native and Non-native English Speakers' Behaviors in Collaborative Writing through Visual Analytics
Abstract:
Understanding collaborative writing dynamics between native speakers (NS) and non-native speakers (NNS) is critical for enhancing collaboration quality and team inclusivity. In this paper, we partnered with communication researchers to develop visual analytics solutions for comparing NS and NNS behaviors in 162 writing sessions across 27 teams. The primary challenges in analyzing writing behaviors are data complexity and the uncertainties introduced by automated methods. In response, we present COALA, a novel visual analytics tool that improves model interpretability by displaying uncertainties in author clusters, generating behavior summaries using large language models, and visualizing writing-related actions at multiple granularities. We validated the effectiveness of COALA through user studies with domain experts (N=2+2) and researchers with relevant experience (N=8). We present the insights discovered by participants using COALA, suggest features for future AI-assisted collaborative writing tools, and discuss the broader implications for analyzing collaborative processes beyond writing.

Authors:Jarod Lévy, Mingfang Zhang, Svetlana Pinet, Jérémy Rapin, Hubert Banville, Stéphane d'Ascoli, Jean-Rémi King
Title: Brain-to-Text Decoding: A Non-invasive Approach via Typing
Abstract:
Modern neuroprostheses can now restore communication in patients who have lost the ability to speak or move. However, these invasive devices entail risks inherent to neurosurgery. Here, we introduce a non-invasive method to decode the production of sentences from brain activity and demonstrate its efficacy in a cohort of 35 healthy volunteers. For this, we present Brain2Qwerty, a new deep learning architecture trained to decode sentences from either electro- (EEG) or magneto-encephalography (MEG), while participants typed briefly memorized sentences on a QWERTY keyboard. With MEG, Brain2Qwerty reaches, on average, a character-error-rate (CER) of 32% and substantially outperforms EEG (CER: 67%). For the best participants, the model achieves a CER of 19%, and can perfectly decode a variety of sentences outside of the training set. While error analyses suggest that decoding depends on motor processes, the analysis of typographical errors suggests that it also involves higher-level cognitive factors. Overall, these results narrow the gap between invasive and non-invasive methods and thus open the path for developing safe brain-computer interfaces for non-communicating patients.
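The character error rate (CER) reported here is conventionally the Levenshtein edit distance between the decoded and reference character sequences, normalized by the reference length. A minimal sketch of that standard metric (the function name is an assumption):

```python
def character_error_rate(reference, hypothesis):
    """CER = minimum number of character insertions, deletions, and
    substitutions to turn `hypothesis` into `reference`, divided by
    the reference length. Uses a single-row dynamic-programming table."""
    m, n = len(reference), len(hypothesis)
    dp = list(range(n + 1))  # edit distances for the empty reference prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,      # deletion from the hypothesis
                        dp[j - 1] + 1,  # insertion into the hypothesis
                        prev + (reference[i - 1] != hypothesis[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, 1)
```

On this definition a CER of 32% means roughly one character in three must be corrected, which gives a concrete sense of the MEG-vs-EEG gap (32% vs. 67%) reported above.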

Authors:José Manuel Alcalde-Llergo, Pilar Aparicio-Martínez, Andrea Zingoni, Sara Pinzi, Enrique Yeguas-Bolívar
Title: Fostering Inclusion: A Virtual Reality Experience to Raise Awareness of Dyslexia-Related Barriers in University Settings
Abstract:
This work introduces the design, implementation, and validation of a virtual reality (VR) experience aimed at promoting the inclusion of individuals with dyslexia in university settings. Unlike traditional awareness methods, this immersive approach offers a novel way to foster empathy by allowing participants to experience firsthand the challenges faced by students with dyslexia. Specifically, the experience raises awareness by exposing non-dyslexic individuals to the difficulties commonly encountered by dyslexic students. In the virtual environment, participants explore a virtual campus with multiple buildings, navigating between them while completing tasks and simultaneously encountering barriers that simulate some of the challenges faced by individuals with dyslexia. These barriers include reading signs with shifting letters, following directional arrows that may point incorrectly, and dealing with a lack of assistance. The campus is a comprehensive model featuring both indoor and outdoor spaces and supporting various modes of locomotion. To validate the experience, more than 30 non-dyslexic participants from the university environment, mainly professors and students, evaluated it through ad hoc satisfaction surveys. The results indicated heightened awareness of the barriers encountered by students with dyslexia, with participants deeming the experience a valuable tool for increasing visibility and fostering understanding of dyslexic students.

Authors:Sangwook Lee, Adnan Abbas, Yan Chen, Sang Won Lee
Title: CHOIR: Chat-based Helper for Organizational Intelligence Repository
Abstract:
Modern organizations frequently rely on chat-based platforms (e.g., Slack, Microsoft Teams, and Discord) for day-to-day communication and decision-making. As conversations evolve, organizational knowledge can get buried, prompting repeated searches and discussions. While maintaining shared documents, such as Wiki articles for the organization, offers a partial solution, it requires manual and timely efforts to keep it up to date, and it may not effectively preserve the social and contextual aspect of prior discussions. Moreover, reaching a consensus on document updates with relevant stakeholders can be time-consuming and complex. To address these challenges, we introduce CHOIR (Chat-based Helper for Organizational Intelligence Repository), a chatbot that integrates seamlessly with chat platforms. CHOIR automatically identifies and proposes edits to related documents, initiates discussions with relevant team members, and preserves contextual revision histories. By embedding knowledge management directly into chat environments and leveraging LLMs, CHOIR simplifies manual updates and supports consensus-driven editing based on maintained context with revision histories. We plan to design, deploy, and evaluate CHOIR in the context of maintaining an organizational memory for a research lab. We describe the chatbot's motivation, design, and early implementation to show how CHOIR streamlines collaborative document management.

Authors:Julian Speith, Steffen Becker, Timo Speith, Markus Weber, Yixin Zou, Asia Biega, Christof Paar
Title: "Make the Voodoo Box Go Bleep Bloop:" Exploring End Users' Understanding and Information Needs Regarding Microchips
Abstract:
Microchips are fundamental components of modern electronic devices, yet they remain opaque to the users who rely on them daily. This opacity, compounded by the complexity of global supply chains and the concealment of proprietary information, raises significant security, trust, and accountability issues. We investigate end users' understanding of microchips, exploring their perceptions of the societal implications and information needs regarding these essential technologies. Through an online survey with 250 participants, we found that while our participants were aware of some microchip applications, they lacked awareness of the broader security, societal, and economic implications. While our participants unanimously desired more information on microchips, their specific information needs were shaped by various factors such as the microchip's application environment and one's affinity for technology interaction. Our findings underscore the necessity for improving end users' awareness and understanding of microchips, and we provide possible directions to pursue this end.

Authors:Riya Sahni, Lydia B. Chilton
Title: Beyond Training: Social Dynamics of AI Adoption in Industry
Abstract:
While organizations continue to invest in AI tools like M365 Copilot, little is known about how individual employees engage with these technologies once deployed. This study examines M365 Copilot adoption behaviors among a group of 10 experienced users across many industries in the United States. Findings reveal a strong preference for informal learning methods over structured training. Even though 9 out of 10 participants acknowledged that formal training for Copilot tools would be useful, 7 out of 10 stated that they ignored the Copilot onboarding videos provided to them, citing reasons such as time constraints, preference for self-guided learning, or reliance on external resources like ChatGPT. No participants used formal training as their primary learning method. Instead, experiential learning (trial and error, 8 participants) and social learning (peer discussions, 6 participants) emerged as dominant learning strategies. We discuss opportunities for promoting social learning of AI tools in the workplace.

Authors:Anjali Singh, Karan Taneja, Zhitong Guan, Avijit Ghosh
Title: Protecting Human Cognition in the Age of AI
Abstract:
The rapid adoption of Generative AI (GenAI) is significantly reshaping human cognition, influencing how we engage with information, think, reason, and learn. This paper synthesizes existing literature on GenAI's effects on different aspects of human cognition. Drawing on Krathwohl's revised Bloom's Taxonomy and Dewey's conceptualization of reflective thought, we examine the mechanisms through which GenAI is affecting the development of different cognitive abilities. We focus on novices, such as students, who may lack both domain knowledge and an understanding of effective human-AI interaction. Accordingly, we provide implications for rethinking and designing educational experiences that foster critical thinking and deeper cognitive engagement.

Authors:Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ziang Xiao, Ming Yin
Title: From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis
Abstract:
AI-assisted decision making is becoming increasingly prevalent, yet individuals often fail to utilize AI-based decision aids appropriately, especially when AI explanations are absent, potentially because they do not reflect critically on the AI's decision recommendations. Large language models (LLMs), with their exceptional conversational and analytical capabilities, present great opportunities to enhance AI-assisted decision making in the absence of AI explanations by providing natural-language analysis of the AI's decision recommendation, e.g., how each feature of a decision-making task might contribute to the AI recommendation. In this paper, via a randomized experiment, we first show that presenting LLM-powered analysis of each task feature, either sequentially or concurrently, does not significantly improve people's AI-assisted decision performance. To enable decision makers to better leverage LLM-powered analysis, we then propose an algorithmic framework to characterize the effects of LLM-powered analysis on human decisions and to dynamically decide which analysis to present. Our evaluation with human subjects shows that this approach effectively improves decision makers' appropriate reliance on AI in AI-assisted decision making.

Authors:Momin Siddiqui, Roy Pea, Hari Subramonyam
Title: Script&Shift: A Layered Interface Paradigm for Integrating Content Development and Rhetorical Strategy with LLM Writing Assistants
Abstract:
Good writing is a dynamic process of knowledge transformation, where writers refine and evolve ideas through planning, translating, and reviewing. Generative AI-powered writing tools can enhance this process but may also disrupt the natural flow of writing, such as when using LLMs for complex tasks like restructuring content across different sections or creating smooth transitions. We introduce Script&Shift, a layered interface paradigm designed to minimize these disruptions by aligning writing intents with LLM capabilities to support diverse content development and rhetorical strategies. By bridging envisioning, semantic, and articulatory distances, Script&Shift's interactions allow writers to leverage LLMs for various content development tasks (scripting) and experiment with diverse organization strategies while tailoring their writing for different audiences (shifting). This approach preserves creative control while encouraging divergent and iterative writing. Our evaluation shows that Script&Shift enables writers to creatively and efficiently incorporate LLMs while preserving a natural flow of composition.

Authors:Hamed Rahimi, Adil Bahaj, Mouad Abrini, Mahdi Khoramshahi, Mounir Ghogho, Mohamed Chetouani
Title: USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions
Abstract:
The integration of vision-language models into robotic systems constitutes a significant advancement in enabling machines to interact with their surroundings in a more intuitive manner. While VLMs offer rich multimodal reasoning, existing approaches lack user-specific adaptability, often relying on generic interaction paradigms that fail to account for individual behavioral, contextual, or socio-emotional nuances. When customization is attempted, ethical concerns arise from unmitigated biases in user data, risking exclusion or unfair treatment. To address these dual challenges, we propose User-VLM 360°, a holistic framework integrating multimodal user modeling with bias-aware optimization. Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360° socio-emotive interaction datasets annotated with demographic, emotion, and relational metadata. Evaluations across eight benchmarks demonstrate state-of-the-art results: +35.3% F1 in personalized VQA, +47.5% F1 in facial features understanding, 15% bias reduction, and 30X speedup over baselines. Ablation studies confirm component efficacy, and deployment on the Pepper robot validates real-time adaptability across diverse users. We open-source parameter-efficient 3B/10B models and an ethical verification framework for responsible adaptation.

Authors:Dom CP Marticorena, Zeyu Lu, Chris Wissmann, Yash Agarwal, David Garrison, John M Zempel, Dennis L Barbour
Title: Immersive virtual games: winners for deep cognitive assessment
Abstract:
Studies of human cognition often rely on brief, controlled tasks that emphasize group-level effects but poorly capture individual variability. A suite of minigames on the novel PixelDOPA platform was designed to overcome these limitations by embedding classic cognitive tasks in a 3D virtual environment with continuous behavior logging. Four minigames explore constructs overlapping with NIH Toolbox tasks: processing speed, rule shifting, inhibitory control, and working memory. In a clinical sample of 60 participants outside a controlled lab setting, large correlations (r=0.42-0.93) were found between PixelDOPA tasks and their NIH Toolbox counterparts, despite differences in stimuli and task structures. Process-informed metrics (e.g., gaze-based response times) improved task convergence and data quality. Test-retest analyses showed high reliability (ICC=0.52-0.83) for all minigames. Beyond endpoint metrics, movement and gaze trajectories revealed stable, idiosyncratic gameplay strategy profiles, with unsupervised clustering differentiating participants by navigational and viewing behaviors. These trajectory-based features showed lower within-person variability than between-person variability, facilitating participant identification across sessions. Game-based tasks can therefore retain the psychometric rigor of standard cognitive assessments while providing insights into dynamic, individual-specific behaviors. By using an engaging, customizable game engine, comprehensive behavioral tracking can boost the power to detect individual differences without sacrificing group-level inference. This possibility reveals a path toward cognitive measures that are both robust and ecologically valid, even in less-than-ideal data collection settings.
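The test-retest reliability reported as ICC can be illustrated with the one-way random-effects form, ICC(1,1), which compares between-subject to within-subject variance across sessions. Whether the paper uses this exact ICC variant is an assumption, as is the function name.

```python
import numpy as np

def icc_1_1(scores):
    """One-way random-effects ICC(1,1) for test-retest reliability.

    scores: (subjects, sessions) matrix of repeated measurements.
    ICC(1,1) = (MS_between - MS_within) / (MS_between + (k-1) * MS_within).
    """
    n, k = scores.shape
    grand = scores.mean()
    subject_means = scores.mean(axis=1)
    ms_between = k * ((subject_means - grand) ** 2).sum() / (n - 1)
    ms_within = ((scores - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

An ICC near 1 means participants keep their rank order across sessions (scores are stable person-level traits); values near or below 0 mean session-to-session noise swamps individual differences.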

Authors:Alexander Beiser, Susana Hahn, Torsten Schaub
Title: ASP-driven User-interaction with Clinguin
Abstract:
We present clinguin, a system for ASP-driven user interface design. Clinguin streamlines the development of user interfaces for ASP developers by letting them build interactive prototypes directly in ASP, eliminating the need for separate frontend languages. To this end, clinguin uses a few dedicated predicates to define user interfaces and the treatment of user-triggered events. This simple design greatly facilitates the specification of user interactions with an ASP system, in our case clingo.

Authors:Erina Seh-Young Moon, Devansh Saxena, Dipto Das, Shion Guha
Title: The Datafication of Care in Public Homelessness Services
Abstract:
Homelessness systems in North America adopt coordinated data-driven approaches to efficiently match support services to clients based on their assessed needs and available resources. AI tools are increasingly being implemented to allocate resources, reduce costs, and predict risks in this space. In this study, we conducted an ethnographic case study of the City of Toronto's homelessness system's data practices across different critical points. We show how the City's data practices offer standardized processes for client care, but frontline workers also engage in heuristic decision-making to navigate uncertainties, client resistance to sharing information, and resource constraints. From these findings, we show the temporality of client data, which constrains the validity of predictive AI models. Additionally, we highlight how the City adopts an iterative and holistic client assessment approach that contrasts with commonly used risk assessment tools in homelessness services, providing future directions for designing holistic decision-making tools for homelessness.

Authors:Hy Dang, Yuwen Lu, Jason Spicer, Tamara Kay, Di Yang, Yang Yang, Jay Brockman, Meng Jiang, Toby Jia-Jun Li
Title: Uncovering Disparities in Rideshare Drivers Earning and Work Patterns: A Case Study of Chicago
Abstract:
Ride-sharing services are revolutionizing urban mobility while simultaneously raising significant concerns regarding fairness and driver equity. This study employs the Chicago Trip Network Provider dataset to investigate disparities in ride-sharing earnings between 2018 and 2023. Our analysis reveals marked temporal shifts, including an earnings surge in early 2021 followed by fluctuations and a decline in inflation-adjusted income, as well as pronounced spatial disparities, with drivers in Central and airport regions earning substantially more than those in peripheral areas. Recognizing the limitations of trip-level data, we introduce a novel trip-driver assignment algorithm to reconstruct plausible daily work patterns, uncovering distinct driver clusters with varied earning profiles. Notably, drivers operating during late-evening and overnight hours secure higher per-trip and hourly rates, while emerging groups in low-demand regions face significant earnings deficits. Our findings call for more transparent pricing models and a re-examination of platform design to promote equitable driver outcomes.

Authors:Duan Li, Xinyuan Guo, Xinhuan Shu, Lanxi Xiao, Lingyun Yu, Shixia Liu
Title: RouteFlow: Trajectory-Aware Animated Transitions
Abstract:
Animating objects' movements is widely used to facilitate tracking changes and observing both the global trend and local hotspots where objects converge or diverge. Existing methods, however, often obscure critical local hotspots by only considering the start and end positions of objects' trajectories. To address this gap, we propose RouteFlow, a trajectory-aware animated transition method that effectively balances the global trend and local hotspots while minimizing occlusion. RouteFlow is inspired by a real-world bus route analogy: objects are regarded as passengers traveling together, with local hotspots representing bus stops where these passengers get on and off. Based on this analogy, animation paths are generated like bus routes, with the object layout generated similarly to seat allocation according to their destinations. Compared with state-of-the-art methods, RouteFlow better facilitates identifying the global trend and locating local hotspots while performing comparably in tracking objects' movements.

Authors:Huanchen Wang, Tianrun Qiu, Jiaping Li, Zhicong Lu, Yuxin Ma
Title: HarmonyCut: Supporting Creative Chinese Paper-cutting Design with Form and Connotation Harmony
Abstract:
Chinese paper-cutting, an Intangible Cultural Heritage (ICH), faces challenges from the erosion of traditional culture due to the prevalence of realism alongside limited public access to cultural elements. While generative AI can enhance paper-cutting design with its extensive knowledge base and efficient production capabilities, it often struggles to align content with cultural meaning due to users' and models' lack of comprehensive paper-cutting knowledge. To address these issues, we conducted a formative study (N=7) to identify the workflow and design space, including four core factors (Function, Subject Matter, Style, and Method of Expression) and a key element (Pattern). We then developed HarmonyCut, a generative AI-based tool that translates abstract intentions into creative and structured ideas. This tool facilitates the exploration of suggested related content (knowledge, works, and patterns), enabling users to select, combine, and adjust elements for creative paper-cutting design. A user study (N=16) and an expert evaluation (N=3) demonstrated that HarmonyCut effectively provided relevant knowledge, aiding the ideation of diverse paper-cutting designs and maintaining design quality within the design space to ensure alignment between form and cultural connotation.

Authors:Ko Watanabe, Nico Förster, Shoya Ishimaru
Title: SensPS: Sensing Personal Space Comfortable Distance between Human-Human Using Multimodal Sensors
Abstract:
Personal space, also known as peripersonal space, is crucial in human social interaction, influencing comfort, communication, and social stress. Estimating and respecting personal space is essential for enhancing human-computer interaction (HCI) and smart environments. Personal space preferences vary due to individual traits, cultural background, and contextual factors. Advanced multimodal sensing technologies, including eye-tracking and wristband sensors, offer opportunities to develop adaptive systems that dynamically adjust to user comfort levels. Integrating physiological and behavioral data enables a deeper understanding of spatial interactions. This study develops a sensor-based model to estimate comfortable personal space and identifies key features influencing spatial preferences. Our findings show that multimodal sensors, particularly eye-tracking and physiological wristband data, can effectively predict personal space preferences, with eye-tracking data playing a more significant role. An experimental study involving controlled human interactions demonstrates that a Transformer-based model achieves the highest predictive accuracy (F1 score: 0.87) for estimating personal space. Eye-tracking features, such as gaze point and pupil diameter, emerge as the most significant predictors, while physiological signals from wristband sensors contribute marginally. These results highlight the potential for AI-driven personalization of social space in adaptive environments, suggesting that multimodal sensing can be leveraged to develop intelligent systems that optimize spatial arrangements in workplaces, educational institutions, and public settings. Future work should explore larger datasets, real-world applications, and additional physiological markers to enhance model robustness.

Authors:Runlin Duan, Shao-Kang Hsia, Yuzhao Chen, Yichen Hu, Ming Yin, Karthik Ramani
Title: Investigating Creativity in Humans and Generative AI Through Circles Exercises
Abstract:
Generative AI (GenAI) is transforming the creativity process. However, as presented in this paper, GenAI encounters "narrow creativity" barriers. We observe that both humans and GenAI focus on limited subsets of the design space. We investigate this phenomenon using the "Circles Exercise," a creativity test widely used to examine the creativity of humans. Quantitative analysis reveals that humans tend to generate familiar, high-frequency ideas, while GenAI produces a larger volume of incremental innovations at a low cost. However, similar to humans, it struggles to significantly expand creative boundaries. Moreover, advanced prompting strategies, such as Chain-of-Thought (CoT) prompting, mitigate narrow creativity issues but still fall short of substantially broadening the creative scope of humans and GenAI. These findings underscore both the challenges and opportunities for advancing GenAI-powered human creativity support tools.

Authors:Steven A. Lehr, Ketan S. Saichandran, Eddie Harmon-Jones, Nykko Vitali, Mahzarin R. Banaji
Title: Kernels of Selfhood: GPT-4o shows humanlike patterns of cognitive consistency moderated by free choice
Abstract:
Large Language Models (LLMs) show emergent patterns that mimic human cognition. We explore whether they also mirror other, less deliberative human psychological processes. Drawing upon classical theories of cognitive consistency, two preregistered studies tested whether GPT-4o changed its attitudes toward Vladimir Putin in the direction of a positive or negative essay it wrote about the Russian leader. Indeed, GPT displayed patterns of attitude change mimicking cognitive consistency effects in humans. Even more remarkably, the degree of change increased sharply when the LLM was offered an illusion of choice about which essay (positive or negative) to write. This result suggests that GPT-4o manifests a functional analog of humanlike selfhood, although how faithfully the chatbot's behavior reflects the mechanisms of human attitude change remains to be understood.

Authors:Peinuan Qin, Chi-Lan Yang, Jingshu Li, Jing Wen, Yi-Chieh Lee
Title: Timing Matters: How Using LLMs at Different Timings Influences Writers' Perceptions and Ideation Outcomes in AI-Assisted Ideation
Abstract:
Large Language Models (LLMs) have been widely used to support ideation in the writing process. However, whether generating ideas with the help of LLMs leads to idea fixation or idea expansion is unclear. This study examines how different timings of LLM usage - either at the beginning or after independent ideation - affect people's perceptions and ideation outcomes in a writing task. In a controlled experiment with 60 participants, we found that using LLMs from the beginning reduced the number of original ideas and lowered creative self-efficacy and self-credit, mediated by changes in autonomy and ownership. We discuss the challenges and opportunities associated with using LLMs to assist in idea generation. We propose delaying the use of LLMs to support ideation while considering users' self-efficacy, autonomy, and ownership of the ideation outcomes.

Authors:Sam Yu-Te Lee, Cheng-Wei Hung, Mei-Hua Yuan, Kwan-Liu Ma
Title: Visual Text Mining with Progressive Taxonomy Construction for Environmental Studies
Abstract:
Environmental experts have developed the DPSIR (Driver, Pressure, State, Impact, Response) framework to systematically study and communicate key relationships between society and the environment. Using this framework requires experts to construct a DPSIR taxonomy from a corpus, annotate the documents, and identify DPSIR variables and relationships, which is laborious and inflexible. Automating it with conventional text mining faces technical challenges, primarily because the taxonomy often begins with abstract definitions, which experts progressively refine and contextualize as they annotate the corpus. In response, we develop GreenMine, a system that supports interactive text mining with prompt engineering. The system implements a prompting pipeline consisting of three simple and evaluable subtasks. In each subtask, the DPSIR taxonomy can be defined in natural language and iteratively refined as experts analyze the corpus. To help users evaluate the taxonomy, we introduce an uncertainty score based on response consistency. Then, we design a radial uncertainty chart that visualizes uncertainties and corpus topics, which supports interleaved evaluation and exploration. Using the system, experts can progressively construct the DPSIR taxonomy and annotate the corpus with LLMs. Using real-world interview transcripts, we present a case study to demonstrate the capability of the system in supporting interactive mining of DPSIR relationships, and an expert review in the form of collaborative discussion to understand the potential and limitations of the system. We discuss the lessons learned from developing the system and future opportunities for supporting interactive text mining in knowledge-intensive tasks for other application scenarios.
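A consistency-based uncertainty score of the kind the abstract describes can be sketched as follows. This is one plausible formulation (1 minus the majority-label share over repeated LLM annotations), not necessarily the exact formula GreenMine uses.

```python
from collections import Counter

def consistency_uncertainty(responses):
    """Uncertainty from response consistency: query the LLM several times
    for the same annotation and score 1 - (share of the majority label).
    0.0 means fully consistent; values near 1 mean the model keeps
    changing its answer across repeated queries."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    majority = counts.most_common(1)[0][1]
    return 1.0 - majority / len(responses)
```

For example, four identical "Driver" labels score 0.0, while two "Driver", one "Pressure", and one "State" score 0.5, flagging that annotation for expert review.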

Authors:Han Zhang, Rotem Shalev-Arkushin, Vasileios Baltatzis, Connor Gillis, Gierad Laput, Raja Kushalnagar, Lorna Quandt, Leah Findlater, Abdelkareem Bedri, Colin Lea
Title: Towards AI-driven Sign Language Generation with Non-manual Markers
Abstract:
Sign languages are essential for the Deaf and Hard-of-Hearing (DHH) community. Sign language generation systems have the potential to support communication by translating from written languages, such as English, into signed videos. However, current systems often fail to meet user needs due to poor translation of grammatical structures, the absence of facial cues and body language, and insufficient visual and motion fidelity. We address these challenges by building on recent advances in LLMs and video generation models to translate English sentences into natural-looking AI ASL signers. The text component of our model extracts information for manual and non-manual components of ASL, which are used to synthesize skeletal pose sequences and corresponding video frames. Our findings from a user study with 30 DHH participants and thorough technical evaluations demonstrate significant progress and identify critical areas necessary to meet user needs.

Authors:Chimdi Chikezie, Pannapat Chenpaiseng, Puja Agarwal, Sadia Afroz, Bhavika Madhwani, Rudrajit Choudhuri, Andrew Anderson, Prisha Velhal, Patricia Morreale, Christopher Bogart, Anita Sarma, Margaret Burnett
Title: Measuring SES-related traits relating to technology usage: Two validated surveys
Abstract:
Software producers are now recognizing the importance of improving their products' suitability for diverse populations, but little attention has been given to measurements to shed light on products' suitability to individuals below the median socioeconomic status (SES) -- who, by definition, make up half the population. To enable software practitioners to attend to both lower- and higher-SES individuals, this paper provides two new surveys that together facilitate measuring how well a software product serves socioeconomically diverse populations. The first survey (SES-Subjective) is who-oriented: it measures who their potential or current users are in terms of their subjective SES (perceptions of their SES). The second survey (SES-Facets) is why-oriented: it collects individuals' values for an evidence-based set of facet values (individual traits) that (1) statistically differ by SES and (2) affect how an individual works and problem-solves with software products. Our empirical validations with deployments at University A and University B (464 and 522 responses, respectively) showed that both surveys are reliable. Further, our results statistically agree with both ground truth data on respondents' socioeconomic statuses and with predictions from foundational literature. Finally, we explain how the pair of surveys is uniquely actionable by software practitioners, such as in requirements gathering, debugging, quality assurance activities, maintenance activities, and fulfilling legal reporting requirements such as those being drafted by various governments for AI-powered software.

Authors:Junlong Chen, Rosella P. Galindo Esparza, Vanja Garaj, Per Ola Kristensson, John Dudley
Title: EnVisionVR: A Scene Interpretation Tool for Visual Accessibility in Virtual Reality
Abstract:
Effective visual accessibility in Virtual Reality (VR) is crucial for Blind and Low Vision (BLV) users. However, designing visual accessibility systems is challenging due to the complexity of 3D VR environments and the need for techniques that can be easily retrofitted into existing applications. While prior work has studied how to enhance or translate visual information, the advancement of Vision Language Models (VLMs) provides an exciting opportunity to advance the scene interpretation capability of current systems. This paper presents EnVisionVR, an accessibility tool for VR scene interpretation. Through a formative study of usability barriers, we confirmed the lack of visual accessibility features as a key barrier for BLV users of VR content and applications. In response, we designed and developed EnVisionVR, a novel visual accessibility system leveraging a VLM, voice input and multimodal feedback for scene interpretation and virtual object interaction in VR. An evaluation with 12 BLV users demonstrated that EnVisionVR significantly improved their ability to locate virtual objects, effectively supporting scene understanding and object interaction.

Authors:Danqing Shi, Yujun Zhu, Francisco Erivaldo Fernandes Junior, Shumin Zhai, Antti Oulasvirta
Title: Simulating Errors in Touchscreen Typing
Abstract:
Empirical evidence shows that typing on touchscreen devices is prone to errors and that correcting them poses a major detriment to users' performance. Design of text entry systems that better serve users, across their broad capability range, necessitates understanding the cognitive mechanisms that underpin these errors. However, prior models of typing cover only motor slips. The paper reports on extending the scope of computational modeling of typing to cover the cognitive mechanisms behind the three main types of error: slips (inaccurate execution), lapses (forgetting), and mistakes (incorrect knowledge). Given a phrase, a keyboard, and user parameters, Typoist simulates eye and finger movements while making human-like insertion, omission, substitution, and transposition errors. Its main technical contribution is the formulation of a supervisory control problem wherein the controller allocates cognitive resources to detect and fix errors generated by the various mechanisms. The model generates predictions of typing performance that can inform the design of better text entry systems.
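The four surface error types Typoist produces can be illustrated with a deliberately simplified injector. This sketch applies a flat per-character error rate, whereas the actual model derives errors from cognitive mechanisms (slips, lapses, mistakes) via supervisory control; it is a toy for intuition only.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def inject_errors(phrase, rng, p=0.1):
    """Inject insertion, omission, substitution, and transposition errors
    into a phrase at a flat per-character rate p (illustrative only)."""
    chars = list(phrase)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < p:
            kind = rng.choice(["insert", "omit", "substitute", "transpose"])
            if kind == "insert":
                out.append(rng.choice(ALPHABET))  # extra character
                out.append(chars[i])
            elif kind == "omit":
                pass  # drop the character
            elif kind == "substitute":
                out.append(rng.choice(ALPHABET))  # wrong character
            elif kind == "transpose" and i + 1 < len(chars):
                out.append(chars[i + 1])  # swap adjacent characters
                out.append(chars[i])
                i += 1
            else:
                out.append(chars[i])  # transpose at end: no-op
        else:
            out.append(chars[i])
        i += 1
    return "".join(out)
```

With `p=0.0` the phrase passes through unchanged; raising `p` yields increasingly garbled output, against which an error-correction policy could be evaluated.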

Authors:Natasha Maniar, Samantha W. T. Chan, Wazeer Zulfikar, Scott Ren, Christine Xu, Pattie Maes
Title: MemPal: Leveraging Multimodal AI and LLMs for Voice-Activated Object Retrieval in Homes of Older Adults
Abstract:
Older adults have increasing difficulty with retrospective memory, hindering their ability to perform daily activities and placing stress on the caregivers who ensure their wellbeing. Recent developments in Artificial Intelligence (AI) and large context-aware multimodal models offer an opportunity to create memory support systems that assist older adults with common issues like object finding. This paper discusses the development of an AI-based, wearable memory assistant, MemPal, that helps older adults with a common problem, finding lost objects at home, and presents results from tests of the system in older adults' own homes. Using visual context from a wearable camera, the multimodal LLM system creates a real-time automated text diary of the person's activities for memory support purposes, offering object retrieval assistance using a voice-based interface. The system is designed to support additional use cases like context-based proactive safety reminders and recall of past actions. We report on a quantitative and qualitative study with N=15 older adults within their own homes that showed improved performance of object finding with audio-based assistance compared to no aid and positive overall user perceptions on the designed system. We discuss further applications of MemPal's design as a multi-purpose memory aid and future design guidelines to adapt memory assistants to older adults' unique needs.

Authors:Gaole He, Gianluca Demartini, Ujwal Gadiraju
Title: Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant
Abstract:
Since the explosion in popularity of ChatGPT, large language models (LLMs) have continued to impact our everyday lives. Equipped with external tools that are designed for a specific purpose (e.g., for flight booking or an alarm clock), LLM agents exercise an increasing capability to assist humans in their daily work. Although LLM agents have shown a promising blueprint as daily assistants, there is a limited understanding of how they can provide daily assistance based on planning and sequential decision making capabilities. We draw inspiration from recent work that has highlighted the value of 'LLM-modulo' setups in conjunction with humans-in-the-loop for planning tasks. We conducted an empirical study (N = 248) of LLM agents as daily assistants in six commonly occurring tasks with different levels of risk typically associated with them (e.g., flight ticket booking and credit card payments). To ensure user agency and control over the LLM agent, we adopted LLM agents in a plan-then-execute manner, wherein the agents conducted step-wise planning and step-by-step execution in a simulation environment. We analyzed how user involvement at each stage affects their trust and collaborative team performance. Our findings demonstrate that LLM agents can be a double-edged sword -- (1) they can work well when a high-quality plan and necessary user involvement in execution are available, and (2) users can easily mistrust the LLM agents with plans that seem plausible. We synthesized key insights for using LLM agents as daily assistants to calibrate user trust and achieve better overall task outcomes. Our work has important implications for the future design of daily assistants and human-AI collaboration with LLM agents.

Authors:Anastasiia Birillo, Ilya Vlasov, Katsiaryna Dzialets, Hieke Keuning, Timofey Bryksin
Title: In-IDE Programming Courses: Learning Software Development in a Real-World Setting
Abstract:
While learning programming languages is crucial for software engineers, mastering the necessary tools is equally important. To facilitate this, JetBrains recently released the JetBrains Academy plugin, which customizes the IDE for learners, allowing tutors to create courses entirely within the IDE. In this work, we provide the first exploratory study of this learning format. We carried out eight one-hour interviews with students and developers who completed at least one course using the plugin, inquiring about their experience with the format, the used IDE features, and the current shortcomings. Our results indicate that learning inside the IDE is overall welcomed by the learners, allowing them to study in a more realistic setting, using features such as debugging and code analysis, which are crucial for real software development. With the collected results and the analysis of the current drawbacks, we aim to contribute to teaching students more practical skills.

Authors:Avinash Agarwal, Manisha J. Nene
Title: Standardised schema and taxonomy for AI incident databases in critical digital infrastructure
Abstract:
The rapid deployment of Artificial Intelligence (AI) in critical digital infrastructure introduces significant risks, necessitating a robust framework for systematically collecting AI incident data to prevent future incidents. Existing databases lack the granularity as well as the standardized structure required for consistent data collection and analysis, impeding effective incident management. This work proposes a standardized schema and taxonomy for AI incident databases, addressing these challenges by enabling detailed and structured documentation of AI incidents across sectors. Key contributions include developing a unified schema, introducing new fields such as incident severity, causes, and harms caused, and proposing a taxonomy for classifying AI incidents in critical digital infrastructure. The proposed solution facilitates more effective incident data collection and analysis, thus supporting evidence-based policymaking, enhancing industry safety measures, and promoting transparency. This work lays the foundation for a coordinated global response to AI incidents, ensuring trust, safety, and accountability in using AI across regions.

Authors:Naijun Zheng, Xucheng Wan, Kai Liu, Zhou Huan
Title: SCDiar: a streaming diarization system based on speaker change detection and speech recognition
Abstract:
In hours-long meeting scenarios, real-time speech streams often struggle to achieve accurate speaker diarization, commonly leading to speaker identification and speaker count errors. To address this challenge, we propose SCDiar, a system that operates on speech segments, split at the token level by a speaker change detection (SCD) module. Building on these segments, we introduce several enhancements to efficiently select the best available segment for each speaker. These improvements lead to significant gains across various benchmarks. Notably, on real-world meeting data involving more than ten participants, SCDiar outperforms previous systems by up to 53.6% in accuracy, substantially narrowing the performance gap between online and offline systems.
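The token-level segmentation step can be sketched minimally: given a recognized token stream and the SCD module's change flags, split the stream into speaker-homogeneous segments. This covers only the splitting step; segment selection and speaker assignment are separate components of SCDiar, and the flag representation here is an assumption.

```python
def split_at_speaker_changes(tokens, change_flags):
    """Split a recognized token stream into speaker-homogeneous segments
    at positions an SCD module flags as speaker changes.

    tokens: list of recognized tokens
    change_flags: list of bools, True where token i starts a new speaker
    """
    segments = []
    current = []
    for token, is_change in zip(tokens, change_flags):
        if is_change and current:
            segments.append(current)  # close the previous speaker's segment
            current = []
        current.append(token)
    if current:
        segments.append(current)  # flush the final segment
    return segments
```

Downstream, each resulting segment can be scored so that the best available segment per speaker feeds the embedding extractor.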

Authors:Runze Cai, Nuwan Janaka, Hyeongcheol Kim, Yang Chen, Shengdong Zhao, Yun Huang, David Hsu
Title: AiGet: Transforming Everyday Moments into Hidden Knowledge Discovery with AI Assistance on Smart Glasses
Abstract:
Unlike the free exploration of childhood, the demands of daily life reduce our motivation to explore our surroundings, leading to missed opportunities for informal learning. Traditional tools for knowledge acquisition are reactive, relying on user initiative and limiting their ability to uncover hidden interests. Through formative studies, we introduce AiGet, a proactive AI assistant integrated with AR smart glasses, designed to seamlessly embed informal learning into low-demand daily activities (e.g., casual walking and shopping). AiGet analyzes real-time user gaze patterns, environmental context, and user profiles, leveraging large language models to deliver personalized, context-aware knowledge with low disruption to primary tasks. In-lab evaluations and real-world testing, including continued use over multiple days, demonstrate AiGet's effectiveness in uncovering overlooked yet surprising interests, enhancing primary task enjoyment, reviving curiosity, and deepening connections with the environment. We further propose design guidelines for AI-assisted informal learning, focused on transforming everyday moments into enriching learning experiences.

Authors:Pascal J. Sager, Benjamin Meyer, Peng Yan, Rebekka von Wartburg-Kottler, Layan Etaiwi, Aref Enayati, Gabriel Nobel, Ahmed Abdulkadir, Benjamin F. Grewe, Thilo Stadelmann
Title: A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions
Abstract:
Agents for computer use (ACUs) are an emerging class of systems capable of executing complex tasks on digital devices - such as desktops, mobile phones, and web platforms - given instructions in natural language. These agents can automate tasks by controlling software via low-level actions like mouse clicks and touchscreen gestures. However, despite rapid progress, ACUs are not yet mature for everyday use. In this survey, we investigate the state-of-the-art, trends, and research gaps in the development of practical ACUs. We provide a comprehensive review of the ACU landscape, introducing a unifying taxonomy spanning three dimensions: (I) the domain perspective, characterizing agent operating contexts; (II) the interaction perspective, describing observation modalities (e.g., screenshots, HTML) and action modalities (e.g., mouse, keyboard, code execution); and (III) the agent perspective, detailing how agents perceive, reason, and learn. We review 87 ACUs and 33 datasets across foundation model-based and classical approaches through this taxonomy. Our analysis identifies six major research gaps: insufficient generalization, inefficient learning, limited planning, low task complexity in benchmarks, non-standardized evaluation, and a disconnect between research and practical conditions. To address these gaps, we advocate for: (a) vision-based observations and low-level control to enhance generalization; (b) adaptive learning beyond static prompting; (c) effective planning and reasoning methods and models; (d) benchmarks that reflect real-world task complexity; (e) standardized evaluation based on task success; (f) aligning agent design with real-world deployment constraints. Together, our taxonomy and analysis establish a foundation for advancing ACU research toward general-purpose agents for robust and scalable computer use.

Authors:Vincent Freiberger, Arthur Fleig, Erik Buchmann
Title: PRISMe: A Novel LLM-Powered Tool for Interactive Privacy Policy Assessment
Abstract:
Protecting online privacy requires users to engage with and comprehend website privacy policies, but many policies are difficult and tedious to read. We present PRISMe (Privacy Risk Information Scanner for Me), a novel Large Language Model (LLM)-driven privacy policy assessment tool, which helps users to understand the essence of a lengthy, complex privacy policy while browsing. The tool, a browser extension, integrates a dashboard and an LLM chat. One major contribution is the first rigorous evaluation of such a tool. In a mixed-methods user study (N=22), we evaluate PRISMe's efficiency, usability, understandability of the provided information, and impacts on awareness. While our tool improves privacy awareness by providing a comprehensible quick overview and a quality chat for in-depth discussion, users note issues with consistency and building trust in the tool. From our insights, we derive important design implications to guide future policy analysis tools.

Authors:Sanjeev Nahulanthran, Leimin Tian, Dana Kulić, Mor Vered
Title: Explaining Facial Expression Recognition
Abstract:
Facial expression recognition (FER) has emerged as a promising approach to the development of emotion-aware intelligent agents and systems. However, key challenges remain in utilizing FER in real-world contexts, including ensuring user understanding and establishing a suitable level of user trust. We developed a novel explanation method utilizing Facial Action Units (FAUs) to explain the output of a FER model through both textual and visual modalities. We conducted an empirical user study evaluating user understanding and trust, comparing our approach to state-of-the-art eXplainable AI (XAI) methods. Our results indicate that both combined visual-and-textual and textual-only FAU-based explanations led to better user understanding of the FER model. We also show that all modalities of FAU-based explanation improved users' appropriate trust in the FER model.

Authors:Hua Shen, Nicholas Clark, Tanushree Mitra
Title: Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
Abstract:
Existing research primarily evaluates the values of LLMs by examining their stated inclinations towards specific values. However, the "Value-Action Gap," a phenomenon rooted in environmental and social psychology, reveals discrepancies between individuals' stated values and their actions in real-world contexts. To what extent do LLMs exhibit a similar gap between their stated values and their actions informed by those values? This study introduces ValueActionLens, an evaluation framework to assess the alignment between LLMs' stated values and their value-informed actions. The framework encompasses the generation of a dataset comprising 14.8k value-informed actions across twelve cultures and eleven social topics, and two tasks to evaluate how well LLMs' stated value inclinations and value-informed actions align across three different alignment measures. Extensive experiments reveal that the alignment between LLMs' stated values and actions is sub-optimal, varying significantly across scenarios and models. Analysis of misaligned results identifies potential harms from certain value-action gaps. To predict the value-action gaps, we also uncover that leveraging reasoned explanations improves performance. These findings underscore the risks of relying solely on the LLMs' stated values to predict their behaviors and emphasize the importance of context-aware evaluations of LLM values and value-action gaps.
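One simple alignment measure in the spirit of the abstract can be sketched as an exact agreement rate. The paper uses three alignment measures; this stand-in, along with the scenario/value labels, is illustrative only.

```python
def value_action_alignment(stated, acted):
    """Fraction of scenarios where the value a model says it holds
    matches the value implied by its chosen action.

    stated, acted: dicts mapping scenario id -> value label
    """
    shared = stated.keys() & acted.keys()  # scenarios scored on both sides
    if not shared:
        raise ValueError("no overlapping scenarios")
    agree = sum(1 for s in shared if stated[s] == acted[s])
    return agree / len(shared)
```

A score well below 1.0 across many scenarios is the kind of value-action gap the study reports, motivating evaluations of actions rather than stated values alone.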

Authors:Xinyu Zhang, Tyler Estro, Geoff Kuenning, Erez Zadok, Klaus Mueller
Title: Into the Void: Mapping the Unseen Gaps in High Dimensional Data
Abstract:
We present a comprehensive pipeline, augmented by a visual analytics system named "GapMiner", that is aimed at exploring and exploiting untapped opportunities within the empty areas of high-dimensional datasets. Our approach begins with an initial dataset and then uses a novel Empty Space Search Algorithm (ESA) to identify the center points of these uncharted voids, which are regarded as reservoirs containing potentially valuable novel configurations. Initially, this process is guided by user interactions facilitated by GapMiner. GapMiner visualizes the Empty Space Configurations (ESC) identified by the search within the context of the data, enabling domain experts to explore and adjust ESCs using a linked parallel-coordinate display. These interactions enhance the dataset and contribute to the iterative training of a connected deep neural network (DNN). As the DNN trains, it gradually assumes the task of identifying high-potential ESCs, diminishing the need for direct user involvement. Ultimately, once the DNN achieves adequate accuracy, it autonomously guides the exploration of optimal configurations by predicting performance and refining configurations, using a combination of gradient ascent and improved empty-space searches. Domain users were actively engaged throughout the development of our system. Our findings demonstrate that our methodology consistently produces substantially superior novel configurations compared to conventional randomization-based methods. We illustrate the effectiveness of our method through several case studies addressing various objectives, including parameter optimization, adversarial learning, and reinforcement learning.
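The idea of locating void centers can be illustrated with a sampling-based sketch: draw candidate points in the bounding box and keep the one farthest from its nearest data point. This is a generic empty-space search for intuition, not the authors' ESA.

```python
import math
import random

def empty_space_center(data, bounds, n_candidates=2000, seed=0):
    """Find an approximate center of the largest empty region by random
    sampling: the candidate whose nearest data point is farthest away.

    data: list of points (tuples); bounds: list of (lo, hi) per dimension
    Returns (best_candidate, distance_to_nearest_data_point).
    """
    rng = random.Random(seed)
    best, best_dist = None, -1.0
    for _ in range(n_candidates):
        cand = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        nearest = min(math.dist(cand, p) for p in data)  # gap radius at cand
        if nearest > best_dist:
            best, best_dist = cand, nearest
    return best, best_dist
```

With data at (0, 0) and (1, 1) in the unit square, the search converges toward the empty corners (1, 0) or (0, 1), where the nearest-neighbor distance approaches 1.0.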

Authors:Ru Wang, Ruijia Chen, Anqiao Erica Cai, Zhiyuan Li, Sanbrita Mondal, Yuhang Zhao
Title: Characterizing Visual Intents for People with Low Vision through Eye Tracking
Abstract:
Accessing visual information is crucial yet challenging for people with low vision due to visual conditions like low visual acuity and limited visual fields. However, unlike blind people, low vision people have and prefer using their functional vision in daily tasks. Gaze patterns thus become an important indicator to uncover their visual challenges and intents, inspiring more adaptive visual support. We seek to deeply understand low vision users' gaze behaviors in different image-viewing tasks, characterizing typical visual intents and the unique gaze patterns exhibited by people with different low vision conditions. We conducted a retrospective think-aloud study using eye tracking with 20 low vision participants and 20 sighted controls. Participants completed various image-viewing tasks and watched the playback of their gaze trajectories to reflect on their visual experiences. Based on the study, we derived a visual intent taxonomy with five visual intents characterized by participants' gaze behaviors. We demonstrated the difference between low vision and sighted participants' gaze behaviors and how visual ability affected low vision participants' gaze patterns across visual intents. Our findings underscore the importance of combining visual ability information, visual context, and eye tracking data in visual intent recognition, setting up a foundation for intent-aware assistive technologies for low vision people.

Authors:Farhana Shahid, Mona Elswah, Aditya Vashistha
Title: Think Outside the Data: Colonial Biases and Systemic Issues in Automated Moderation Pipelines for Low-Resource Languages
Abstract:
Most social media users come from the Global South, where harmful content usually appears in local languages. Yet, AI-driven moderation systems struggle with low-resource languages spoken in these regions. Through semi-structured interviews with 22 AI experts working on harmful content detection in four low-resource languages -- Tamil (South Asia), Swahili (East Africa), Maghrebi Arabic (North Africa), and Quechua (South America) -- we examine systemic issues in building automated moderation tools for these languages. Our findings reveal that beyond data scarcity, socio-political factors such as tech companies' monopoly on user data and lack of investment in moderation for low-profit Global South markets exacerbate historic inequities. Even if more data were available, the English-centric and data-intensive design of language models and preprocessing techniques overlooks the need to design for morphologically complex, linguistically diverse, and code-mixed languages. We argue these limitations are not just technical gaps caused by "data scarcity" but reflect structural inequities, rooted in colonial suppression of non-Western languages. We discuss multi-stakeholder approaches to strengthen local research capacity, democratize data access, and support language-aware solutions to improve automated moderation for low-resource languages.

Authors:Félix Buendía, Joaquín Gayoso-Cabada, José-Luis Sierra
Title: Generation of reusable learning objects from digital medical collections: An analysis based on the MASMDOA framework
Abstract:
Learning Objects represent a widespread approach to structuring instructional materials in a large variety of educational contexts. The main aim of this work consists of analyzing from a qualitative point of view the process of generating reusable learning objects (RLOs) followed by Clavy, a tool that can be used to retrieve data from multiple medical knowledge sources and reconfigure such sources in diverse multimedia-based structures and organizations. From these organizations, Clavy is able to generate learning objects which can be adapted to various instructional healthcare scenarios with several types of user profiles and distinct learning requirements. Moreover, Clavy provides the capability of exporting these learning objects through educational standard specifications, which improves their reusability features. The analysis insights highlight the importance of having a tool able to transfer knowledge from the available digital medical collections to learning objects that can be easily accessed by medical students and healthcare practitioners through the most popular e-learning platforms.

Authors:Ru Wang, Kexin Zhang, Yuqing Wang, Keri Brown, Yuhang Zhao
Title: "It was Mentally Painful to Try and Stop": Design Opportunities for Just-in-Time Interventions for People with Obsessive-Compulsive Disorder in the Real World
Abstract:
Obsessive-compulsive disorder (OCD) is a mental health condition that significantly impacts people's quality of life. While evidence-based therapies such as exposure and response prevention (ERP) can be effective, managing OCD symptoms in everyday life -- an essential part of treatment and independent living -- remains challenging due to fear confrontation and lack of appropriate support. To better understand the challenges and needs in OCD self-management, we conducted interviews with 10 participants with diverse OCD conditions and seven therapists specializing in OCD treatment. Through these interviews, we explored the characteristics of participants' triggers and how they shaped their compulsions, and uncovered key coping strategies across different stages of OCD episodes. Our findings highlight critical gaps between OCD self-management needs and currently available support. Building on these insights, we propose design opportunities for just-in-time self-management technologies for OCD, including personalized symptom tracking, just-in-time interventions, and support for OCD-specific privacy and social needs -- through technology and beyond.

Authors:Hanxiu 'Hazel' Zhu, Avanthika Senthil Kumar, Sihang Zhao, Ru Wang, Xin Tong, Yuhang Zhao
Title: Characterizing Collective Efforts in Content Sharing and Quality Control for ADHD-relevant Content on Video-sharing Platforms
Abstract:
Video-sharing platforms (VSPs) have become increasingly important for individuals with ADHD to recognize symptoms, acquire knowledge, and receive support. While videos offer rich information and high engagement, they also present unique challenges, such as information quality and accessibility issues to users with ADHD. However, little work has thoroughly examined the video content quality and accessibility issues, the impact, and the control strategies in the ADHD community. We fill this gap by systematically collecting 373 ADHD-relevant videos with comments from YouTube and TikTok and analyzing the data with a mixed method. Our study identified the characteristics of ADHD-relevant videos on VSPs (e.g., creator types, video presentation forms, quality issues) and revealed the collective efforts of creators and viewers in video quality control, such as authority building, collective quality checking, and accessibility improvement. We further derive actionable design implications for VSPs to offer more reliable and ADHD-friendly contents.

Authors:Jingshu Li, Yitian Yang, Q. Vera Liao, Junti Zhang, Yi-Chieh Lee
Title: As Confidence Aligns: Exploring the Effect of AI Confidence on Human Self-confidence in Human-AI Decision Making
Abstract:
Complementary collaboration between humans and AI is essential for human-AI decision making. One feasible approach to achieving it involves accounting for the calibrated confidence levels of both AI and users. However, this process would likely be made more difficult by the fact that AI confidence may influence users' self-confidence and its calibration. To explore these dynamics, we conducted a randomized behavioral experiment. Our results indicate that in human-AI decision-making, users' self-confidence aligns with AI confidence and such alignment can persist even after AI ceases to be involved. This alignment then affects users' self-confidence calibration. We also found the presence of real-time correctness feedback of decisions reduced the degree of alignment. These findings suggest that users' self-confidence is not independent of AI confidence, which practitioners aiming to achieve better human-AI collaboration need to be aware of. We call for research focusing on the alignment of human cognition and behavior with AI.

Authors:Safayat Bin Hakim, Muhammad Adil, Alvaro Velasquez, Houbing Herbert Song
Title: ANSR-DT: An Adaptive Neuro-Symbolic Learning and Reasoning Framework for Digital Twins
Abstract:
In this paper, we propose an Adaptive Neuro-Symbolic Learning and Reasoning Framework for digital twin technology called "ANSR-DT." Digital twins in industrial environments often struggle with interpretability, real-time adaptation, and human input integration. Our approach addresses these challenges by combining CNN-LSTM dynamic event detection with reinforcement learning and symbolic reasoning to enable adaptive intelligence with interpretable decision processes. This integration enhances environmental understanding while promoting continuous learning, leading to more effective real-time decision-making in human-machine collaborative applications. We evaluated ANSR-DT on synthetic industrial data, observing significant improvements over traditional approaches, with up to 99.5% accuracy for dynamic pattern recognition. The framework demonstrated superior adaptability with extended reinforcement learning training, improving explained variance from 0.447 to 0.547. Future work aims at scaling to larger datasets to test rule management beyond the current 14 rules. Our open-source implementation promotes reproducibility and establishes a foundation for future research in adaptive, interpretable digital twins for industrial applications.

Authors:Hashini Senaratne, Leimin Tian, Pavan Sikka, Jason Williams, David Howard, Dana Kulić, Cécile Paris
Title: A Framework for Dynamic Situational Awareness in Human Robot Teams: An Interview Study
Abstract:
In human-robot teams, human situational awareness is the operator's conscious knowledge of the team's states, actions, plans, and environment. Appropriate human situational awareness is critical to successful human-robot collaboration. In human-robot teaming, it is often assumed that the best and required level of situational awareness is knowing everything at all times. This view is problematic, because what a human needs to know for optimal team performance varies given dynamic environmental conditions, the task context, and the roles and capabilities of team members. We explore this topic by interviewing 16 participants with active and repeated experience in diverse human-robot teaming applications. Based on analysis of these interviews, we derive a framework explaining the dynamic nature of required situational awareness in human-robot teaming. In addition, we identify a range of factors affecting the dynamic nature of required and actual levels of situational awareness (i.e., dynamic situational awareness), types of situational awareness inefficiencies resulting from gaps between actual and required situational awareness, and their main consequences. We also reveal various strategies, initiated by humans and robots, that assist in maintaining the required situational awareness. Our findings inform the implementation of accurate estimates of dynamic situational awareness and the design of user-adaptive human-robot interfaces. Therefore, this work contributes to the future design of more collaborative and effective human-robot teams.

Authors:Haoxiang Yu, Javier Berrocal, Christine Julien
Title: ML Mule: Mobile-Driven Context-Aware Collaborative Learning
Abstract:
Artificial intelligence has been integrated into nearly every aspect of daily life, powering applications from object detection with computer vision to large language models for writing emails and compact models for use in smart homes. These machine learning models at times cater to the needs of individual users but are often detached from them, as they are typically stored and processed in centralized data centers. This centralized approach raises privacy concerns, incurs high infrastructure costs, and struggles to provide real time, personalized experiences. Federated and fully decentralized learning methods have been proposed to address these issues, but they still depend on centralized servers or face slow convergence due to communication constraints. We propose ML Mule, an approach that utilizes individual mobile devices as 'mules' to train and transport model snapshots as the mules move through physical spaces, sharing these models with the physical 'spaces' the mules inhabit. This method implicitly forms affinity groups among devices associated with users who share particular spaces, enabling collaborative model evolution and protecting users' privacy. Our approach addresses several major shortcomings of traditional, federated, and fully decentralized learning systems. ML Mule represents a new class of machine learning methods that are more robust, distributed, and personalized, bringing the field closer to realizing the original vision of intelligent, adaptive, and genuinely context-aware smart environments. Our results show that ML Mule converges faster and achieves higher model accuracy compared to other existing methods.
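The abstract's core mechanic, mobile devices carrying model snapshots between physical spaces and merging them on arrival, can be sketched with simple parameter averaging. This is a minimal illustration, not the paper's actual aggregation rule; the `merge_snapshots` helper and the `alpha` blending weight are hypothetical choices.

```python
import numpy as np

def merge_snapshots(space_params, mule_params, alpha=0.5):
    """Blend a space's resident model with a visiting mule's snapshot.
    alpha weights the mule's contribution (a hypothetical choice; the
    actual ML Mule aggregation rule may differ)."""
    return {k: (1 - alpha) * space_params[k] + alpha * mule_params[k]
            for k in space_params}

# A mule visits a space, depositing its snapshot and picking up the result.
space_a = {"w": np.array([1.0, 0.0])}
mule = {"w": np.array([0.0, 1.0])}
space_a = merge_snapshots(space_a, mule)          # space absorbs the mule's model
mule = merge_snapshots(mule, space_a, alpha=1.0)  # mule adopts the space's model
```

Repeating this exchange as mules move through shared spaces implicitly averages models within each affinity group without any central server.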

Authors:Ross Henry, Martin Huber, Anestis Mablekos-Alexiou, Carlo Seneci, Mohamed Abdelaziz, Hans Natalius, Lyndon da Cruz, Christos Bergeles
Title: Evaluating Robotic Approach Techniques for the Insertion of a Straight Instrument into a Vitreoretinal Surgery Trocar
Abstract:
Advances in vitreoretinal robotic surgery enable precise techniques for gene therapies. This study evaluates three robotic approaches using a 7-DoF robotic arm for docking a micro-precise tool to a trocar: fully co-manipulated, hybrid co-manipulated/teleoperated, and hybrid with camera assistance. The fully co-manipulated approach was the fastest but had a 42% success rate. Hybrid methods showed higher success rates (91.6% and 100%) and completed tasks within 2 minutes. NASA Task Load Index (TLX) assessments indicated lower physical demand and effort for hybrid approaches.

Authors:Stephane Hatgis-Kessell, W. Bradley Knox, Serena Booth, Scott Niekum, Peter Stone
Title: Influencing Humans to Conform to Preference Models for RLHF
Abstract:
Designing a reinforcement learning from human feedback (RLHF) algorithm to approximate a human's unobservable reward function requires assuming, implicitly or explicitly, a model of human preferences. A preference model that poorly describes how humans generate preferences risks learning a poor approximation of the human's reward function. In this paper, we conduct three human studies to assess whether one can influence the expression of real human preferences to more closely conform to a desired preference model. Importantly, our approach does not seek to alter the human's unobserved reward function. Rather, we change how humans use this reward function to generate preferences, such that they better match whatever preference model is assumed by a particular RLHF algorithm. We introduce three interventions: showing humans the quantities that underlie a preference model, which is normally unobservable information derived from the reward function; training people to follow a specific preference model; and modifying the preference elicitation question. All intervention types show significant effects, providing practical tools to improve preference data quality and the resultant alignment of the learned reward functions. Overall we establish a novel research direction in model alignment: designing interfaces and training interventions to increase human conformance with the modeling assumptions of the algorithm that will learn from their input.

Authors:Giulio Antonio Abbo, Tony Belpaeme, Micol Spitale
Title: Concerns and Values in Human-Robot Interactions: A Focus on Social Robotics
Abstract:
Robots, as AI with physical instantiation, inhabit our social and physical world, where their actions have both social and physical consequences, posing challenges for researchers when designing social robots. This study starts with a scoping review to identify discussions and potential concerns arising from interactions with robotic systems. Two focus groups of technology ethics experts then validated a comprehensive list of key topics and values in human-robot interaction (HRI) literature. These insights were integrated into the HRI Value Compass web tool, to help HRI researchers identify ethical values in robot design. The tool was evaluated in a pilot study. This work benefits the HRI community by highlighting key concerns in human-robot interactions and providing an instrument to help researchers design robots that align with human values, ensuring future robotic systems adhere to these values in social applications.

Authors:Jianchao Lu, Yuzhe Tian, Yang Zhang, Quan Z. Sheng, Xi Zheng
Title: LGL-BCI: A Motor-Imagery-Based Brain-Computer Interface with Geometric Learning
Abstract:
Brain--computer interfaces are a groundbreaking technology whereby brain signals are used to control external devices. Despite some advances in recent years, electroencephalogram (EEG)-based motor-imagery tasks face challenges, such as amplitude and phase variability and complex spatial correlations, with a need for smaller models and faster inference. In this study, we develop a prototype, called the Lightweight Geometric Learning Brain--Computer Interface (LGL-BCI), which uses our customized geometric deep learning architecture for swift model inference without sacrificing accuracy. LGL-BCI contains an EEG channel selection module via a feature decomposition algorithm to reduce the dimensionality of a symmetric positive definite matrix, providing adaptiveness to the continuously changing EEG signal. Meanwhile, a built-in lossless transformation helps boost the inference speed. The performance of our solution was evaluated using two real-world EEG devices and two public EEG datasets. LGL-BCI demonstrated significant improvements, achieving an accuracy of 82.54% compared to 62.22% for the state-of-the-art approach. Furthermore, LGL-BCI uses fewer parameters (64.9K vs. 183.7K), highlighting its computational efficiency. These findings underscore both the superior accuracy and computational efficiency of LGL-BCI, demonstrating the feasibility and robustness of geometric deep learning in motor-imagery brain--computer interface applications.
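The symmetric positive definite (SPD) matrix the abstract refers to is typically the spatial covariance of an EEG window, and channel selection reduces its dimensionality. Below is a minimal numpy sketch under that reading; the eigen-loading score used to rank channels is a simplified stand-in for the paper's feature decomposition algorithm, not its actual criterion.

```python
import numpy as np

rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 256))  # 8 channels x 256 time samples

# Spatial covariance: the SPD matrix Riemannian BCI pipelines operate on.
cov = eeg @ eeg.T / eeg.shape[1]
cov += 1e-6 * np.eye(8)  # regularize to keep it strictly positive definite

# Rank channels by their energy in the top eigenvectors (illustrative
# proxy for the paper's feature-decomposition-based selection).
vals, vecs = np.linalg.eigh(cov)
scores = (vecs[:, -4:] ** 2).sum(axis=1)
selected = np.argsort(scores)[-4:]
reduced_cov = cov[np.ix_(selected, selected)]  # smaller SPD matrix
```

Shrinking the SPD matrix this way cuts the cost of the downstream geometric operations, which scale with the matrix dimension.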

Authors:Gourav Siddhad, Juhi Singh, Partha Pratim Roy
Title: MECASA: Motor Execution Classification using Additive Self-Attention for Hybrid EEG-fNIRS Data
Abstract:
Motor execution, a fundamental aspect of human behavior, has been extensively studied using BCI technologies. EEG and fNIRS have been utilized to provide valuable insights, but their individual limitations have hindered performance. This study investigates the effectiveness of fusing electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) data for classifying rest versus task states in a motor execution paradigm. Using the SMR Hybrid BCI dataset, this work compares unimodal (EEG and fNIRS) classifiers with a multimodal fusion approach. It proposes Motor Execution using Convolutional Additive Self-Attention Mechanisms (MECASA), a novel architecture leveraging convolutional operations and self-attention to capture complex patterns in multimodal data. MECASA, built upon the CAS-ViT architecture, employs a computationally efficient, convolutional-based self-attention module (CASA), a hybrid block design, and a dedicated fusion network to combine features from separate EEG and fNIRS processing streams. Experimental results demonstrate that MECASA consistently outperforms established methods across all modalities (EEG, fNIRS, and fused), with fusion consistently improving accuracy compared to single-modality approaches. fNIRS generally achieved higher accuracy than EEG alone. Ablation studies revealed optimal configurations for MECASA, with embedding dimensions of 64-128 providing the best performance for EEG data and OD128 (upsampled optical density) yielding superior results for fNIRS data. This work highlights the potential of deep learning, specifically MECASA, to enhance EEG-fNIRS fusion for BCI applications.
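The "additive self-attention" MECASA builds on replaces dot-product scores with a learned additive scoring function. The sketch below shows plain Bahdanau-style additive attention over a token sequence as a lightweight stand-in; the actual CASA module is convolutional and differs in detail, and the weight matrices here are random placeholders.

```python
import numpy as np

def additive_attention(X, Wq, Wk, v):
    """Additive attention over a sequence X of shape (n, d): each score is
    v . tanh(Wq x_i + Wk x_j) instead of a dot product x_i . x_j."""
    n = X.shape[0]
    scores = np.array([[v @ np.tanh(Wq @ X[i] + Wk @ X[j]) for j in range(n)]
                       for i in range(n)])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ X  # attention-weighted mixture of tokens

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))  # 5 tokens, 8-dim features
out = additive_attention(X, rng.standard_normal((8, 8)),
                         rng.standard_normal((8, 8)), rng.standard_normal(8))
```

In a fusion setting like MECASA's, the EEG and fNIRS streams would each pass through such blocks before a dedicated fusion network combines their features.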

Authors:Giulio Antonio Abbo, Gloria Desideri, Tony Belpaeme, Micol Spitale
Title: "Can you be my mum?": Manipulating Social Robots in the Large Language Models Era
Abstract:
Recent advancements in robots powered by large language models have enhanced their conversational abilities, enabling interactions closely resembling human dialogue. However, these models introduce safety and security concerns in HRI, as they are vulnerable to manipulation that can bypass built-in safety measures. Imagining a social robot deployed in a home, this work aims to understand how everyday users try to exploit a language model to violate ethical principles, such as by prompting the robot to act like a life partner. We conducted a pilot study involving 21 university students who interacted with a Misty robot, attempting to circumvent its safety mechanisms across three scenarios based on specific HRI ethical principles: attachment, freedom, and empathy. Our results reveal that participants employed five techniques, including insulting and appealing to pity using emotional language. We hope this work can inform future research in designing strong safeguards to ensure ethical and secure human-robot interactions.

Authors:Krisztian Balog, ChengXiang Zhai
Title: User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation
Abstract:
User simulation is an emerging interdisciplinary topic with multiple critical applications in the era of Generative AI. It involves creating an intelligent agent that mimics the actions of a human user interacting with an AI system, enabling researchers to model and analyze user behaviour, generate synthetic data for training, and evaluate interactive AI systems in a controlled and reproducible manner. User simulation has profound implications for diverse fields and plays a vital role in the pursuit of Artificial General Intelligence. This paper provides an overview of user simulation, highlighting its key applications, connections to various disciplines, and outlining future research directions to advance this increasingly important technology.

Authors:Giulio Antonio Abbo, Tony Belpaeme
Title: Vision Language Models as Values Detectors
Abstract:
Large Language Models integrating textual and visual inputs have introduced new possibilities for interpreting complex data. Despite their remarkable ability to generate coherent and contextually relevant text based on visual stimuli, the alignment of these models with human perception in identifying relevant elements in images requires further exploration. This paper investigates the alignment between state-of-the-art LLMs and human annotators in detecting elements of relevance within home environment scenarios. We created a set of twelve images depicting various domestic scenarios and enlisted fourteen annotators to identify the key element in each image. We then compared these human responses with outputs from five different LLMs, including GPT-4o and four LLaVA variants. Our findings reveal a varied degree of alignment, with LLaVA 34B showing the highest performance but still scoring low. However, an analysis of the results highlights the models' potential to detect value-laden elements in images, suggesting that with improved training and refined prompts, LLMs could enhance applications in social robotics, assistive technologies, and human-computer interaction by providing deeper insights and more contextually relevant responses.

Authors:Xujin Li, Wei Wei, Kun Zhao, Jiayu Mao, Yizhuo Lu, Shuang Qiu, Huiguang He
Title: Exploring EEG and Eye Movement Fusion for Multi-Class Target RSVP-BCI
Abstract:
Rapid Serial Visual Presentation (RSVP)-based Brain-Computer Interfaces (BCIs) facilitate high-throughput target image detection by identifying event-related potentials (ERPs) evoked in EEG signals. The RSVP-BCI systems effectively detect single-class targets within a stream of images but have limited applicability in scenarios that require detecting multiple target categories. Multi-class RSVP-BCI systems address this limitation by simultaneously identifying the presence of a target and distinguishing its category. However, existing multi-class RSVP decoding algorithms predominantly rely on single-modality EEG decoding, which restricts their performance improvement due to the high similarity between ERPs evoked by different target categories. In this work, we introduce eye movement (EM) modality into multi-class RSVP decoding and explore EEG and EM fusion to enhance decoding performance. First, we design three independent multi-class target RSVP tasks and build an open-source dataset comprising EEG and EM signals from 43 subjects. Then, we propose the Multi-class Target RSVP EEG and EM fusion Network (MTREE-Net) to enhance multi-class RSVP decoding. Specifically, a dual-complementary module is proposed to strengthen the differentiation of uni-modal features across categories. To improve multi-modal fusion performance, we adopt a dynamic reweighting fusion strategy guided by theoretically derived modality contribution ratios. Furthermore, we reduce the misclassification of non-target samples through knowledge transfer between two hierarchical classifiers. Extensive experiments demonstrate the feasibility of integrating EM signals into multi-class RSVP decoding and highlight the superior performance of MTREE-Net compared to existing RSVP decoding methods. The proposed MTREE-Net and open-source dataset provide a promising framework for developing practical multi-class RSVP-BCI systems.
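The dynamic reweighting fusion the abstract describes can be sketched as scaling each modality's features by a normalized contribution weight before concatenation. This is a hedged illustration: the scalar `contrib_*` inputs stand in for the paper's theoretically derived modality contribution ratios, whose actual computation is not reproduced here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reweighted_fusion(feat_eeg, feat_em, contrib_eeg, contrib_em):
    """Fuse EEG and eye-movement feature vectors, weighting each modality
    by its (hypothetical) contribution score before concatenating."""
    w = softmax(np.array([contrib_eeg, contrib_em]))
    return np.concatenate([w[0] * feat_eeg, w[1] * feat_em])

# EEG judged twice as informative as EM for this trial.
fused = reweighted_fusion(np.ones(4), np.ones(4),
                          contrib_eeg=2.0, contrib_em=1.0)
```

Because the weights are recomputed per input, the fusion adapts when one modality is degraded, e.g. during blinks for the eye-movement stream.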

Authors:Yue Yu, Yifang Wang, Yongjun Zhang, Huamin Qu, Dongyu Liu
Title: InclusiViz: Visual Analytics of Human Mobility Data for Understanding and Mitigating Urban Segregation
Abstract:
Urban segregation refers to the physical and social division of people, often driving inequalities within cities and exacerbating socioeconomic and racial tensions. While most studies focus on residential spaces, they often neglect segregation across "activity spaces" where people work, socialize, and engage in leisure. Human mobility data offers new opportunities to analyze broader segregation patterns, encompassing both residential and activity spaces, but challenges existing methods in capturing the complexity and local nuances of urban segregation. This work introduces InclusiViz, a novel visual analytics system for multi-level analysis of urban segregation, facilitating the development of targeted, data-driven interventions. Specifically, we developed a deep learning model to predict mobility patterns across social groups using environmental features, augmented with explainable AI to reveal how these features influence segregation. The system integrates innovative visualizations that allow users to explore segregation patterns from broad overviews to fine-grained detail and evaluate urban planning interventions with real-time feedback. We conducted a quantitative evaluation to validate the model's accuracy and efficiency. Two case studies and expert interviews with social scientists and urban analysts demonstrated the system's effectiveness, highlighting its potential to guide urban planning toward more inclusive cities.

Authors:Xianrong Yao, Lingde Hu, Yincheng Jin, Yang Gao, Zhanpeng Jin
Title: IMUFace: Real-Time, Low-Power, Continuous 3D Facial Reconstruction Through Earphones
Abstract:
The potential of facial expression reconstruction technology is significant, with applications in various fields such as human-computer interaction, affective computing, and virtual reality. Recent studies have proposed using ear-worn devices for facial expression reconstruction to address the environmental limitations and privacy concerns associated with traditional camera-based methods. However, these approaches still require improvements in terms of aesthetics and power consumption. This paper introduces a system called IMUFace. It uses inertial measurement units (IMUs) embedded in wireless earphones to detect subtle ear movements caused by facial muscle activities, allowing for covert and low-power facial reconstruction. A user study involving 12 participants was conducted, and a deep learning model named IMUTwinTrans was proposed. The results show that IMUFace can accurately predict users' facial landmarks with a precision of 2.21 mm, using only five minutes of training data. The predicted landmarks can be utilized to reconstruct a three-dimensional facial model. IMUFace operates at a sampling rate of 30 Hz with a relatively low power consumption of 58 mW. The findings presented in this study demonstrate the real-world applicability of IMUFace and highlight potential directions for further research to facilitate its practical adoption.

Authors:Tian Zheng, Xurong Xie, Xiaolan Peng, Hui Chen, Feng Tian
Title: Alzheimer's disease detection based on large language model prompt engineering
Abstract:
In light of the growing proportion of older individuals in our society, the timely diagnosis of Alzheimer's disease has become a crucial aspect of healthcare. In this paper, we propose a non-invasive and cost-effective detection method based on speech technology. The method employs a pre-trained language model in conjunction with techniques such as prompt fine-tuning and conditional learning, thereby enhancing the accuracy and efficiency of the detection process. To address the issue of limited computational resources, this study employs the efficient LoRA fine-tuning method to construct the classification model. Following multiple rounds of training and rigorous 10-fold cross-validation, the prompt fine-tuning strategy based on the LLaMA2 model demonstrated an accuracy of 81.31%, representing a 4.46% improvement over the control group employing the BERT model. This study offers a novel technical approach for the early diagnosis of Alzheimer's disease and provides valuable insights into model optimization and resource utilization under similar conditions. It is anticipated that this method will prove beneficial in clinical practice and applied research, facilitating more accurate and efficient screening and diagnosis of Alzheimer's disease.
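The LoRA fine-tuning the paper relies on trains only a low-rank update B A alongside a frozen weight W, so y = Wx + sBAx. A minimal numpy sketch of that update follows; the dimensions and scale are illustrative, not the paper's configuration, and in practice a library such as Hugging Face PEFT would apply this to the LLaMA2 attention weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4  # hidden size and LoRA rank (illustrative values)

W = rng.standard_normal((d, d))        # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01 # trainable down-projection
B = np.zeros((d, r))                   # trainable up-projection, zero-init

def lora_forward(x, scale=1.0):
    """y = W x + scale * B (A x); only A and B receive gradients."""
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d)
y = lora_forward(x)  # with B zero-initialized, this equals the base W @ x
```

The zero initialization of B means fine-tuning starts exactly at the pretrained model, and the trainable parameter count drops from d*d to 2*r*d.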

Authors:Riccardo Drago, Yotam Sechayk, Mustafa Doga Dogan, Andrea Sanna, Takeo Igarashi
Title: ImprovMate: Multimodal AI Assistant for Improv Actor Training
Abstract:
Improvisation training for actors presents unique challenges, particularly in maintaining narrative coherence and managing cognitive load during performances. Previous research on AI in improvisation performance often predates advances in large language models (LLMs) and relies on human intervention. We introduce ImprovMate, which leverages LLMs, such as GPT, to automate the generation of narrative stimuli and cues, allowing actors to focus on creativity without keeping track of plot or character continuity. Based on insights from professional improvisers, ImprovMate incorporates exercises that mimic live training, such as abrupt story resolution and reactive thinking exercises, while maintaining coherence via reference tables. By balancing randomness and structured guidance, ImprovMate provides a groundbreaking tool for improv training. Our pilot study revealed that actors might embrace AI techniques if the latter mirrors traditional practices, and appreciate the fresh twist introduced by our approach with the AI-generated cues.

Authors:David Porfirio, Vincent Hsiao, Morgan Fine-Morris, Leslie Smith, Laura M. Hiatt
Title: Bootstrapping Human-Like Planning via LLMs
Abstract:
Robot end users increasingly require accessible means of specifying tasks for robots to perform. Two common end-user programming paradigms include drag-and-drop interfaces and natural language programming. Although natural language interfaces harness an intuitive form of human communication, drag-and-drop interfaces enable users to meticulously and precisely dictate the key actions of the robot's task. In this paper, we investigate the degree to which both approaches can be combined. Specifically, we construct a large language model (LLM)-based pipeline that accepts natural language as input and produces human-like action sequences as output, specified at a level of granularity that a human would produce. We then compare these generated action sequences to another dataset of hand-specified action sequences. Although our results reveal that larger models tend to outperform smaller ones in the production of human-like action sequences, smaller models nonetheless achieve satisfactory performance.

Authors:Tianrun Qiu, Yuxin Ma
Title: AnyAni: An Interactive System with Generative AI for Animation Effect Creation and Code Understanding in Web Development
Abstract:
Generative AI assistants have been widely used in front-end programming. However, besides code writing, developers often encounter the need to generate animation effects. As novices in creative design without the assistance of professional designers, developers typically face difficulties in describing, designing, and implementing desired animations. To address this issue, we conducted a formative study (N=6) to identify the challenges that code developers face when dealing with animation design issues. Then, we introduce AnyAni, a human-AI collaborative system that supports front-end developers in the ideation, manipulation, and implementation of animation effects. The system combines the assistance of generative AI in creative design by adopting a nonlinear workflow for iterative animation development. In addition, developers can understand and learn the code generated for implementing animations through various interactive methods. A user study (N=9) demonstrated the usability of AnyAni in animation effect creation support for developers.

Authors:Oliver Huang, Carolina Nobre
Title: ViStruct: Simulating Expert-Like Reasoning Through Task Decomposition and Visual Attention Cues
Abstract:
Data visualization tasks often require multi-step reasoning, and the interpretive strategies experts use, such as decomposing complex goals into smaller subtasks and selectively attending to key chart regions, are rarely made explicit. ViStruct is an automated pipeline that simulates these expert behaviours by breaking high-level visual questions into structured analytic steps and highlighting semantically relevant chart areas. Leveraging large language and vision-language models, ViStruct identifies chart components, maps subtasks to spatial regions, and presents visual attention cues to externalize expert-like reasoning flows. While not designed for direct novice instruction, ViStruct provides a replicable model of expert interpretation that can inform the development of future visual literacy tools. We evaluate the system on 45 tasks across 12 chart types and validate its outputs with trained visualization users, confirming its ability to produce interpretable and expert-aligned reasoning sequences.

Authors:Saloni Singh, Koen Hindriks, Dirk Heylen, Kim Baraka
Title: A Systematic Review of Human-AI Co-Creativity
Abstract:
The co-creativity community is making significant progress in developing more sophisticated and tailored systems to support and enhance human creativity. Design considerations from prior work can serve as a valuable and efficient foundation for future systems. To support this effort, we conducted a systematic literature review of 62 papers on co-creative systems. These papers cover a diverse range of applications, including visual arts, design, and writing, where the AI acts not just as a tool but as an active collaborator in the creative process. From this review, we identified several key dimensions relevant to system design: phase of the creative process, creative task, proactive behavior of the system, user control, system embodiment, and AI model type. Our findings suggest that systems offering high user control lead to greater satisfaction, trust, and a stronger sense of ownership over creative outcomes. Furthermore, proactive systems, when adaptive and context-sensitive, can enhance collaboration. We also extracted 24 design considerations, highlighting the value of encouraging users to externalize their thoughts and of increasing the system's social presence and transparency to foster trust. Despite recent advancements, important gaps remain, such as limited support for early creative phases like problem clarification, and challenges related to user adaptation to AI systems.

Authors:Sachin R. Pendse, Ben Rochford, Neha Kumar, Munmun De Choudhury
Title: The Role of Partisan Culture in Mental Health Language Online
Abstract:
The impact of culture on how people express distress in online support communities is increasingly a topic of interest within Computer Supported Cooperative Work (CSCW) and Human-Computer Interaction (HCI). In the United States, distinct cultures have emerged from each of the two dominant political parties, forming a primary lens by which people navigate online and offline worlds. We examine whether partisan culture may play a role in how U.S. Republican and Democrat users of online mental health support communities express distress. We present a large-scale observational study of 2,184,356 posts from 8,916 statistically matched Republican, Democrat, and unaffiliated online support community members. We utilize methods from causal inference to statistically match partisan users along covariates that correspond with demographic attributes and platform use, in order to create comparable cohorts for analysis. We then leverage methods from natural language processing to understand how partisan expressions of distress compare between these sets of closely matched opposing partisans, and between closely matched partisans and typical support community members. Our data spans January 2013 to December 2022, a period of both rising political polarization and mental health concerns. We find that partisan culture does play into expressions of distress, underscoring the importance of considering partisan cultural differences in the design of online support community platforms.
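The matching step described above can be illustrated with a minimal sketch. The paper's actual procedure is not specified in this abstract, so the following greedy 1:1 nearest-neighbour matcher on standardized covariates (function and variable names are hypothetical) is only an assumption about how comparable partisan cohorts might be formed:

```python
import math

def standardize(rows):
    """Column-wise z-score using pooled mean/std; rows are covariate lists."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        stats.append((mean, math.sqrt(var) or 1.0))  # guard against zero std
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def greedy_match(treated, control, caliper=1.0):
    """Greedy 1:1 nearest-neighbour matching in standardized covariate space.

    Pairs farther apart than `caliper` are dropped, and each control unit
    is used at most once. Returns (treated_index, control_index) pairs.
    """
    z = standardize(treated + control)
    t, c = z[: len(treated)], z[len(treated):]
    unused = set(range(len(c)))
    pairs = []
    for i, row in enumerate(t):
        best, best_d = None, caliper
        for j in unused:
            d = math.dist(row, c[j])
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            pairs.append((i, best))
            unused.discard(best)
    return pairs
```

Covariates here would correspond to the demographic and platform-use attributes mentioned in the abstract; real matching pipelines often fit a propensity model first rather than matching on raw covariate distance.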

Authors:Andrea Bussolan, Oliver Avram, Andrea Pignata, Gianvito Urgese, Stefano Baraldo, Anna Valente
Title: Personalized Mental State Evaluation in Human-Robot Interaction using Federated Learning
Abstract:
With the advent of Industry 5.0, manufacturers are increasingly prioritizing worker well-being alongside mass customization. Stress-aware Human-Robot Collaboration (HRC) plays a crucial role in this paradigm, where robots must adapt their behavior to human mental states to improve collaboration fluency and safety. This paper presents a novel framework that integrates Federated Learning (FL) to enable personalized mental state evaluation while preserving user privacy. By leveraging physiological signals, including EEG, ECG, EDA, EMG, and respiration, a multimodal model predicts an operator's stress level, facilitating real-time robot adaptation. The FL-based approach allows distributed on-device training, ensuring data confidentiality while improving model generalization and individual customization. Results demonstrate that the FL approach yields a global model whose stress prediction accuracy is comparable to that of a centralized training approach. Moreover, FL enables greater personalization, thereby optimizing human-robot interaction in industrial settings while preserving data privacy. The proposed framework advances privacy-preserving, adaptive robotics to enhance workforce well-being in smart manufacturing.
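The abstract does not name the specific FL algorithm used. Assuming the canonical federated averaging (FedAvg) aggregation, the server-side step that combines per-client model updates into a global model can be sketched as follows (a minimal illustration, not the paper's implementation):

```python
def fed_avg(client_weights, client_sizes):
    """Federated averaging: combine per-client parameter vectors into a
    global model, weighting each client by its local dataset size.

    client_weights: list of flat parameter vectors, one per client.
    client_sizes:   number of local training samples per client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]
```

In a stress-prediction setting, each client would be one operator's device training on private physiological data; personalization is commonly achieved by fine-tuning the returned global model locally.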

Authors:Tobias Weinberg, Claire O'Connor, Ricardo E. Gonzalez Penuela, Stephanie Valencia, Thijs Roumen
Title: One Does Not Simply 'Mm-hmm': Exploring Backchanneling in the AAC Micro-Culture
Abstract:
Backchanneling (e.g., "uh-huh", "hmm", a simple nod) makes up a large part of everyday communication; it is how we negotiate the turn to speak, signal our engagement, and shape the flow of our conversations. For people with speech and motor impairments, backchanneling is limited to a reduced set of modalities, and their Augmentative and Alternative Communication (AAC) technology requires visual attention, making it harder to observe non-verbal cues of conversation partners. We explore how users of AAC technology approach backchanneling and create their own unique channels and communication culture. We conducted a workshop with 4 AAC users to understand the unique characteristics of backchanneling in AAC. We explored how backchanneling changes when pairs of AAC users communicate vs when an AAC user communicates with a non-AAC user. We contextualize these findings through four in-depth interviews with speech-language pathologists (SLPs). We conclude with a discussion about backchanneling as a micro-cultural practice, rethinking embodiment and mediation in AAC technology, and providing design recommendations for timely multi-modal backchanneling while respecting different communication cultures.

Authors:Soobin Chae, Suhwan Lee, Hanna Hauptmann, Hajo A. Reijers, Xixi Lu
Title: The Role of Explanation Styles and Perceived Accuracy on Decision Making in Predictive Process Monitoring
Abstract:
Predictive Process Monitoring (PPM) often uses deep learning models to predict the future behavior of ongoing processes, such as predicting process outcomes. While these models achieve high accuracy, their lack of interpretability undermines user trust and adoption. Explainable AI (XAI) aims to address this challenge by providing the reasoning behind the predictions. However, current evaluations of XAI in PPM focus primarily on functional metrics (such as fidelity), overlooking user-centered aspects such as their effect on task performance and decision-making. This study investigates the effects of explanation styles (feature importance, rule-based, and counterfactual) and perceived AI accuracy (low or high) on decision-making in PPM. We conducted a decision-making experiment, where users were presented with the AI predictions, perceived accuracy levels, and explanations of different styles. Users' decisions were measured both before and after receiving explanations, allowing the assessment of objective metrics (Task Performance and Agreement) and subjective metrics (Decision Confidence). Our findings show that both perceived accuracy and explanation style have significant effects on users' decision-making.

Authors:Reuben Binns, Jake Stein, Siddhartha Datta, Max Van Kleek, Nigel Shadbolt
Title: Not Even Nice Work If You Can Get It; A Longitudinal Study of Uber's Algorithmic Pay and Pricing
Abstract:
Ride-sharing platforms like Uber market themselves as enabling `flexibility' for their workforce, meaning that drivers are expected to anticipate when and where the algorithm will allocate them jobs, and how well remunerated those jobs will be. In this work we describe our process of participatory action research with drivers and trade union organisers, culminating in a participatory audit of Uber's algorithmic pay and work allocation, before and after the introduction of dynamic pricing. Through longitudinal analysis of 1.5 million trips from 258 drivers in the UK, we find that after the introduction of dynamic pricing, pay has decreased, Uber's cut has increased, job allocation and pay are less predictable, inequality between drivers has increased, and drivers spend more time waiting for jobs. In addition to these findings, we provide methodological and theoretical contributions to algorithm auditing, gig work, and the emerging practice of worker data science.

Authors:Aditya Majumdar, Wenbo Zhang, Kashvi Prawal, Amulya Yadav
Title: The Hardness of Achieving Impact in AI for Social Impact Research: A Ground-Level View of Challenges & Opportunities
Abstract:
In an attempt to tackle the UN SDGs, AI for Social Impact (AI4SI) projects focus on harnessing AI to address societal issues in areas such as healthcare, social justice, etc. Unfortunately, despite growing interest in AI4SI, achieving tangible, on-the-ground impact remains a significant challenge. For example, identifying and engaging motivated collaborators who are willing to co-design and deploy AI-based solutions in real-world settings is often difficult. Even when such partnerships are established, many AI4SI projects "fail" to progress beyond the proof-of-concept stage, and hence, are unable to transition to at-scale production-level solutions. Furthermore, the unique challenges faced by AI4SI researchers are not always fully recognized within the broader AI community, where such work is sometimes viewed as primarily applied and not aligning with the traditional criteria for novelty emphasized in core AI venues. This paper attempts to shine a light on the diverse challenges faced in AI4SI research by diagnosing a multitude of factors that prevent AI4SI partnerships from achieving real-world impact on the ground. Drawing on semi-structured interviews with six leading AI4SI researchers - complemented by the authors' own lived experiences in conducting AI4SI research - this paper seeks to understand the day-to-day difficulties faced in developing and deploying socially impactful AI solutions. Through thematic analysis, we identify structural and organizational, communication, collaboration, and operational challenges as key barriers to deployment. While there are no easy fixes, we synthesize best practices and actionable strategies drawn from these interviews and our own work in this space. In doing so, we hope this paper serves as a practical reference guide for AI4SI researchers and partner organizations seeking to engage more effectively in socially impactful AI collaborations.

Authors:Jina Kim, Leeje Jang, Yao-Yi Chiang, Guanyu Wang, Michelle Pasco
Title: StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery
Abstract:
Traditionally, neighborhood studies have employed interviews, surveys, and manual image annotation guided by detailed protocols to identify environmental characteristics, including physical disorder, decay, street safety, and sociocultural symbols, and to examine their impact on developmental and health outcomes. While these methods yield rich insights, they are time-consuming and require intensive expert intervention. Recent technological advances, including vision-language models (VLMs), have begun to automate parts of this process; however, existing efforts are often ad hoc and lack adaptability across research designs and geographic contexts. In this demo paper, we present StreetLens, a human-centered, researcher-configurable workflow that embeds relevant social science expertise in a VLM for scalable neighborhood environmental assessments. StreetLens mimics the process of trained human coders by grounding the analysis in questions derived from established interview protocols, retrieving relevant street view imagery (SVI), and generating a wide spectrum of semantic annotations from objective features (e.g., the number of cars) to subjective perceptions (e.g., the sense of disorder in an image). By enabling researchers to define the VLM's role through domain-informed prompting, StreetLens places domain knowledge at the core of the analysis process. It also supports the integration of prior survey data to enhance robustness and expand the range of characteristics assessed across diverse settings. We provide a Google Colab notebook to make StreetLens accessible and extensible for researchers working with public or custom SVI datasets. StreetLens represents a shift toward flexible, agentic AI systems that work closely with researchers to accelerate and scale neighborhood studies.

Authors:Ziyue Lin, Yi Shan, Lin Gao, Xinghua Jia, Siming Chen
Title: SimSpark: Interactive Simulation of Social Media Behaviors
Abstract:
Understanding user behaviors on social media has garnered significant scholarly attention, enhancing our comprehension of how virtual platforms impact society and empowering decision-makers. Simulating social media behaviors provides a robust tool for capturing the patterns of social media behaviors, testing hypotheses, and predicting the effects of various interventions, ultimately contributing to a deeper understanding of social media environments. Moreover, it can overcome difficulties associated with utilizing real data for analysis, such as data accessibility issues, ethical concerns, and the complexity of processing large and heterogeneous datasets. However, researchers and stakeholders need more flexible platforms for investigating different user behaviors by simulating diverse scenarios and characters, which existing tools do not yet support. Therefore, this paper introduces SimSpark, an interactive system including simulation algorithms and interactive visual interfaces which is capable of creating small simulated social media platforms with customizable characters and social environments. We address three key challenges: generating believable behaviors, validating simulation results, and supporting interactive control for generation and results analysis. A simulation workflow is introduced to generate believable behaviors of agents by utilizing large language models. A visual interface enables real-time parameter adjustment and process monitoring for customizing generation settings. A set of visualizations and interactions are also designed to display the models' outputs for further analysis. Effectiveness is evaluated through case studies, quantitative simulation model assessments, and expert interviews.

Authors:Tobias Kerbl, David Brecht, Nils Gehrke, Nijinshan Karunainayagam, Niklas Krauss, Florian Pfab, Richard Taupitz, Ines Trautmannsheimer, Xiyan Su, Maria-Magdalena Wolf, Frank Diermeyer
Title: TUM Teleoperation: Open Source Software for Remote Driving and Assistance of Automated Vehicles
Abstract:
Teleoperation is a key enabler for future mobility, supporting Automated Vehicles in rare and complex scenarios beyond the capabilities of their automation. Despite ongoing research, no open source software currently combines Remote Driving, e.g., via steering wheel and pedals, Remote Assistance through high-level interaction with automated driving software modules, and integration with a real-world vehicle for practical testing. To address this gap, we present a modular, open source teleoperation software stack that can interact with an automated driving software, e.g., Autoware, enabling Remote Assistance and Remote Driving. The software features standardized interfaces for seamless integration with various real-world and simulation platforms, while allowing for flexible design of the human-machine interface. The system is designed for modularity and ease of extension, serving as a foundation for collaborative development on individual software components as well as realistic testing and user studies. To demonstrate the applicability of our software, we evaluated the latency and performance of different vehicle platforms in simulation and in the real world. The source code is available on GitHub.

Authors:Azim Ibragimov, Ethan Wilson, Kevin R. B. Butler, Eakta Jain
Title: Toward Practical Privacy in XR: Empirical Analysis of Multimodal Anonymization Mechanisms
Abstract:
As extended reality (XR) systems become increasingly immersive and sensor-rich, they enable the collection of fine-grained behavioral signals such as eye and body telemetry. These signals support personalized and responsive experiences and may also contain unique patterns that can be linked back to individuals. However, naively pairing unimodal privacy mechanisms (e.g., independently applying separate mechanisms for eye and body privatization) is often ineffective at preventing re-identification in practice. In this work, we systematically evaluate real-time privacy mechanisms for XR, both individually and in pairs, across eye and body modalities. To preserve usability, all mechanisms were tuned based on empirically grounded thresholds for real-time interaction. We evaluated four eye and ten body mechanisms across multiple datasets, comprising up to 407 participants. Our results show that while obfuscating eye telemetry alone offers moderate privacy gains, body telemetry perturbation is substantially more effective. When carefully paired, multimodal mechanisms reduce re-identification rate from 80.3% to 26.3% in casual XR applications (e.g., VRChat and Job Simulator) and from 84.8% to 26.1% in competitive XR applications (e.g., Beat Saber and Synth Riders), all without violating real-time usability requirements. These findings underscore the potential of modality-specific and context-aware privacy strategies for protecting behavioral data in XR environments.
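The re-identification rates reported above can be illustrated with a minimal 1-nearest-neighbour attack sketch. The abstract does not describe the paper's actual attack model, so the function below (names and data layout hypothetical) is only an assumption about how such a rate might be measured:

```python
import math

def reidentification_rate(enrolled, queries):
    """1-NN re-identification attack.

    enrolled: dict mapping user_id -> reference feature vector
              (e.g., averaged eye/body telemetry features).
    queries:  iterable of (true_id, feature_vector) from privatized sessions.

    Returns the fraction of queries whose nearest enrolled vector belongs
    to the same user -- lower is better for privacy.
    """
    queries = list(queries)
    hits = 0
    for true_id, vec in queries:
        pred = min(enrolled, key=lambda uid: math.dist(enrolled[uid], vec))
        hits += pred == true_id
    return hits / len(queries)
```

An effective privacy mechanism perturbs the query-time telemetry enough that this rate approaches chance level (1 / number of enrolled users) while keeping the signals usable for interaction.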

Authors:Alex Grzankowski, Geoff Keeling, Henry Shevlin, Winnie Street
Title: Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality
Abstract:
Many people feel compelled to interpret, describe, and respond to Large Language Models (LLMs) as if they possess inner mental lives similar to our own. Responses to this phenomenon have varied. Inflationists hold that at least some folk psychological ascriptions to LLMs are warranted. Deflationists argue that all such attributions of mentality to LLMs are misplaced, often cautioning against the risk that anthropomorphic projection may lead to misplaced trust or potentially even confusion about the moral status of LLMs. We advance this debate by assessing two common deflationary arguments against LLM mentality. What we term the 'robustness strategy' aims to undercut one justification for believing that LLMs are minded entities by showing that putatively cognitive and humanlike behaviours are not robust, failing to generalise appropriately. What we term the 'etiological strategy' undercuts attributions of mentality by challenging naive causal explanations of LLM behaviours, offering alternative causal accounts that weaken the case for mental state attributions. While both strategies offer powerful challenges to full-blown inflationism, we find that neither strategy provides a knock-down case against ascriptions of mentality to LLMs simpliciter. With this in mind, we explore a modest form of inflationism that permits ascriptions of mentality to LLMs under certain conditions. Specifically, we argue that folk practice provides a defeasible basis for attributing mental states and capacities to LLMs provided those mental states and capacities can be understood in metaphysically undemanding terms (e.g. knowledge, beliefs and desires), while greater caution is required when attributing metaphysically demanding mental phenomena such as phenomenal consciousness.

Authors:Nan Chen, Luna K. Qiu, Arran Zeyu Wang, Zilong Wang, Yuqing Yang
Title: Screen Reader Users in the Vibe Coding Era: Adaptation, Empowerment, and New Accessibility Landscape
Abstract:
The rise of generative AI agents has reshaped human-computer interaction and computer-supported cooperative work by shifting users' roles from direct task execution to supervising machine-driven actions, especially in programming (e.g., "vibe coding"). However, there is limited understanding of how screen reader users engage with these systems in practice. To address this gap, we conducted a longitudinal study with 16 screen reader users, exploring their experiences with AI code assistants in daily programming scenarios. Participants first completed a tutorial with GitHub Copilot, then performed a programming task and provided initial feedback. After two weeks of AI-assisted programming, follow-up studies assessed changes in their practices and perceptions. Our findings demonstrate that advanced code assistants not only enhance their programming capabilities but also bridge accessibility gaps. While the assistant proved beneficial, there remains potential to improve how users convey intent and interpret outputs. They also experienced difficulties managing multiple views and maintaining situational awareness. More broadly, they encountered barriers in learning advanced tools and expressed a need to retain control. Based on these insights, we provide design recommendations for more accessible and inclusive AI-assisted tools.

Authors:Frederic Gmeiner, Kaitao Luo, Ye Wang, Kenneth Holstein, Nikolas Martelaro
Title: Exploring the Potential of Metacognitive Support Agents for Human-AI Co-Creation
Abstract:
Despite the potential of generative AI (GenAI) design tools to enhance design processes, professionals often struggle to integrate AI into their workflows. Fundamental cognitive challenges include the need to specify all design criteria as distinct parameters upfront (intent formulation) and designers' reduced cognitive involvement in the design process due to cognitive offloading, which can lead to insufficient problem exploration, underspecification, and limited ability to evaluate outcomes. Motivated by these challenges, we envision novel metacognitive support agents that assist designers in working more reflectively with GenAI. To explore this vision, we conducted exploratory prototyping through a Wizard of Oz elicitation study with 20 mechanical designers probing multiple metacognitive support strategies. We found that agent-supported users created more feasible designs than non-supported users, with differing impacts between support strategies. Based on these findings, we discuss opportunities and tradeoffs of metacognitive support agents and considerations for future AI-based design tools.

Authors:Yutong Zhang, Dora Zhao, Jeffrey T. Hancock, Robert Kraut, Diyi Yang
Title: The Rise of AI Companions: How Human-Chatbot Relationships Influence Well-Being
Abstract:
As large language models (LLMs)-enhanced chatbots grow increasingly expressive and socially responsive, many users are beginning to form companionship-like bonds with them, particularly with simulated AI partners designed to mimic emotionally attuned interlocutors. These emerging AI companions raise critical questions: Can such systems fulfill social needs typically met by human relationships? How do they shape psychological well-being? And what new risks arise as users develop emotional ties to non-human agents? This study investigates how people interact with AI companions, especially simulated partners on CharacterAI, and how this use is associated with users' psychological well-being. We analyzed survey data from 1,131 users and 4,363 chat sessions (413,509 messages) donated by 244 participants, focusing on three dimensions of use: nature of the interaction, interaction intensity, and self-disclosure. By triangulating self-reported primary motivations, open-ended relationship descriptions, and annotated chat transcripts, we identify patterns in how users engage with AI companions and how those patterns are associated with well-being. Findings suggest that people with smaller social networks are more likely to turn to chatbots for companionship, but that companionship-oriented chatbot usage is consistently associated with lower well-being, particularly when people use the chatbots more intensively, engage in higher levels of self-disclosure, and lack strong human social support. Even though some people turn to chatbots to fulfill social needs, these uses of chatbots do not fully substitute for human connection. As a result, the psychological benefits may be limited, and the relationship could pose risks for more socially isolated or emotionally vulnerable users.

Authors:Abhishek Jaiswal, Armeet Singh Luthra, Purav Jangir, Bhavya Garg, Nisheeth Srivastava
Title: Real-Time Feedback and Benchmark Dataset for Isometric Pose Evaluation
Abstract:
Isometric exercises appeal to individuals seeking convenience, privacy, and minimal dependence on equipment. However, such fitness training is often overdependent on unreliable digital media content instead of expert supervision, introducing serious risks, including incorrect posture, injury, and disengagement due to lack of corrective feedback. To address these challenges, we present a real-time feedback system for assessing isometric poses. Our contributions include the release of the largest multiclass isometric exercise video dataset to date, comprising over 3,600 clips across six poses with correct and incorrect variations. To support robust evaluation, we benchmark state-of-the-art models, including graph-based networks, on this dataset and introduce a novel three-part metric that captures classification accuracy, mistake localization, and model confidence. Our results enhance the feasibility of intelligent and personalized exercise training systems for home workouts. This expert-level diagnosis, delivered directly to the users, also expands the potential applications of these systems to rehabilitation, physiotherapy, and various other fitness disciplines that involve physical motion.

Authors:Zikang Leng, Megha Thukral, Yaqi Liu, Hrudhai Rajasekhar, Shruthi K. Hiremath, Jiaman He, Thomas Plötz
Title: AgentSense: Virtual Sensor Data Generation Using LLM Agents in Simulated Home Environments
Abstract:
A major challenge in developing robust and generalizable Human Activity Recognition (HAR) systems for smart homes is the lack of large and diverse labeled datasets. Variations in home layouts, sensor configurations, and individual behaviors further exacerbate this issue. To address this, we leverage the idea of embodied AI agents: virtual agents that perceive and act within simulated environments, guided by internal world models. We introduce AgentSense, a virtual data generation pipeline in which agents live out daily routines in simulated smart homes, with behavior guided by Large Language Models (LLMs). The LLM generates diverse synthetic personas and realistic routines grounded in the environment, which are then decomposed into fine-grained actions. These actions are executed in an extended version of the VirtualHome simulator, which we augment with virtual ambient sensors that record the agents' activities. Our approach produces rich, privacy-preserving sensor data that reflects real-world diversity. We evaluate AgentSense on five real HAR datasets. Models pretrained on the generated data consistently outperform baselines, especially in low-resource settings. Furthermore, combining the generated virtual sensor data with a small amount of real data achieves performance comparable to training on full real-world datasets. These results highlight the potential of using LLM-guided embodied agents for scalable and cost-effective sensor data generation in HAR.

Authors:Yijun Liu, Frederick Choi, Eshwar Chandrasekharan
Title: Needling Through the Threads: A Visualization Tool for Navigating Threaded Online Discussions
Abstract:
Navigating large-scale online discussions is difficult due to the rapid pace and large volume of user-generated content. Prior work in CSCW has shown that moderators often struggle to follow multiple simultaneous discussions, track evolving conversations, and maintain contextual understanding--all of which hinder timely and effective moderation. While platforms like Reddit use threaded structures to organize discourse, deeply nested threads can still obscure discussions and make it difficult to grasp the overall trajectory of conversations. In this paper, we present an interactive system called Needle to support better navigation and comprehension of complex discourse within threaded discussions. Needle uses visual analytics to summarize key conversational metrics--such as activity, toxicity levels, and voting trends--over time, offering both high-level insights and detailed breakdowns of discussion threads. Through a user study with ten Reddit moderators, we find that Needle supports moderation by reducing cognitive load in making sense of large discussions, helping prioritize areas that need attention, and providing decision-making support. Based on our findings, we provide a set of design guidelines to inform future visualization-driven moderation tools and sociotechnical systems. To the best of our knowledge, Needle is one of the first systems to combine interactive visual analytics with human-in-the-loop moderation for threaded online discussions.
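The per-thread metric summaries over time described above might be computed along these lines. The field names (`ts`, `toxicity`, `score`) and the windowing scheme are hypothetical, not Needle's actual schema:

```python
from collections import defaultdict

def bucket_metrics(posts, window_s=3600):
    """Aggregate thread metrics into fixed time windows.

    posts: iterable of dicts with hypothetical keys:
           'ts' (unix seconds), 'toxicity' (0-1 score), 'score' (net votes).
    Returns {window_start: (n_posts, mean_toxicity, net_votes)} suitable
    for plotting activity, toxicity, and voting trends over time.
    """
    buckets = defaultdict(list)
    for p in posts:
        buckets[p["ts"] // window_s * window_s].append(p)
    return {
        w: (
            len(ps),
            sum(p["toxicity"] for p in ps) / len(ps),
            sum(p["score"] for p in ps),
        )
        for w, ps in buckets.items()
    }
```

A visualization front end could then render each tuple as stacked time-series tracks, letting moderators spot windows of high activity or rising toxicity at a glance.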

Authors:Christine Bauer, Li Chen, Nicola Ferro, Norbert Fuhr, Avishek Anand, Timo Breuer, Guglielmo Faggioli, Ophir Frieder, Hideo Joho, Jussi Karlgren, Johannes Kiesel, Bart P. Knijnenburg, Aldo Lipani, Lien Michiels, Andrea Papenmeier, Maria Soledad Pera, Mark Sanderson, Scott Sanner, Benno Stein, Johanne R. Trippas, Karin Verspoor, Martijn C Willemsen
Title: Manifesto from Dagstuhl Perspectives Workshop 24352 -- Conversational Agents: A Framework for Evaluation (CAFE)
Abstract:
During the workshop, we discussed in depth what CONversational Information ACcess (CONIAC) is and what makes it unique, proposed a world model abstracting it, and defined the Conversational Agents Framework for Evaluation (CAFE) for the evaluation of CONIAC systems, consisting of six major components: 1) goals of the system's stakeholders, 2) user tasks to be studied in the evaluation, 3) aspects of the users carrying out the tasks, 4) evaluation criteria to be considered, 5) evaluation methodology to be applied, and 6) measures for the quantitative criteria chosen.

Authors:Bao Zhang, Zihan Li, Zhenglei Liu, Huanchen Wang, Yuxin Ma
Title: Integrating Large Language Models into Text Animation: An Intelligent Editing System with Inline and Chat Interaction
Abstract:
Text animation, a foundational element in video creation, enables efficient and cost-effective communication, thriving in advertisements, journalism, and social media. However, traditional animation workflows present significant usability barriers for non-professionals, with intricate operational procedures severely hindering creative productivity. To address this, we propose a Large Language Model (LLM)-aided text animation editing system that enables real-time intent tracking and flexible editing. The system introduces an agent-based dual-stream pipeline that integrates context-aware inline suggestions and conversational guidance, and employs a semantic-animation mapping to facilitate LLM-driven creative intent translation. Besides, the system supports synchronized text-animation previews and parametric adjustments via unified controls to improve the editing workflow. A user study evaluates the system, highlighting its ability to help non-professional users complete animation workflows while validating the pipeline. The findings encourage further exploration of integrating LLMs into a comprehensive video creation workflow.

Authors:Kellie Yu Hui Sim, Kenny Tsu Wei Choo
Title: "I Said Things I Needed to Hear Myself": Peer Support as an Emotional, Organisational, and Sociotechnical Practice in Singapore
Abstract:
Peer support plays a vital role in expanding access to mental health care by providing empathetic, community-based support outside formal clinical systems. As digital platforms increasingly mediate such support, the design and impact of these technologies remain under-examined, particularly in Asian contexts. This paper presents findings from an interview study with 20 peer supporters in Singapore, who operate across diverse online, offline, and hybrid environments. Through a thematic analysis, we unpack how participants start, conduct, and sustain peer support, highlighting their motivations, emotional labour, and the sociocultural dimensions shaping their practices. Building on this grounded understanding, we surface design directions for culturally responsive digital tools that scaffold rather than supplant relational care. Drawing insights from qualitative accounts, we offer a situated perspective on how AI might responsibly augment peer support. This research contributes to human-centred computing by articulating the lived realities of peer supporters and proposing design implications for trustworthy and context-sensitive AI in mental health.

Authors:Mayar Elfares, Salma Younis, Pascal Reisert, Ralf Küsters, Tobias Renner, Andreas Bulling
Title: Guidelines for Gaze-based Neural Preliminary Diagnosis
Abstract:
Neural disorders are conditions affecting the nervous system that influence how individuals perceive and interact with the world. Traditional neural diagnosis relies on cumbersome, time-consuming, or subjective methods, such as clinical interviews, behavioural observations, or medical imaging. Eye tracking is an attractive alternative because analysing eye movements, such as fixations and saccades, can provide more objective insights into brain function and cognitive processing by capturing non-verbal and unconscious responses. Despite its potential, existing gaze-based studies have presented seemingly contradictory findings. They are dispersed across diverse fields, requiring further research to standardise protocols and expand their application, particularly as a preliminary indicator of neural processes for differential diagnosis. Therefore, this paper outlines the main agreed-upon findings and provides a systematisation of knowledge and key guidelines towards advancing gaze-based neural preliminary diagnosis.

Authors:Leijie Wang, Weizi Wu, Lirong Que, Nirvan Tyagi, Amy X. Zhang
Title: From Inquisitorial to Adversarial: Using Legal Theory to Redesign Online Reporting Systems
Abstract:
User reporting systems are central to addressing interpersonal conflicts and protecting users from harm in online spaces, particularly those with heightened privacy expectations. However, users often express frustration at their lack of insight and input into the reporting process. Drawing on offline legal literature, we trace these frustrations to the inquisitorial nature of today's online reporting systems, where moderators lead evidence gathering and case development. In contrast, adversarial models can grant users greater control and thus are better for procedural justice and privacy protection, despite their increased risks of system abuse. This motivates us to explore the potential of incorporating adversarial practices into online reporting systems. Through literature review, formative interviews, and threat modeling, we find a rich design space for empowering users to collect and present their evidence while mitigating potential abuse in the reporting process. In particular, we propose designs that minimize the amount of information shared for reporting purposes, as well as supporting evidence authentication. Finally, we discuss how our findings can inform new cryptographic tools and new efforts to apply comparative legal frameworks to online moderation.

Authors:Ala Yankouskaya, Areej B. Babiker, Syeda W. F. Rizvi, Sameha Alshakhsi, Magnus Liebherr, Raian Ali
Title: LLM-D12: A Dual-Dimensional Scale of Instrumental and Relational Dependencies on Large Language Models
Abstract:
There is growing interest in understanding how people interact with large language models (LLMs) and whether such models elicit dependency or even addictive behaviour. Validated tools to assess the extent to which individuals may become dependent on LLMs are scarce and primarily build on classic behavioral addiction symptoms, adapted to the context of LLM use. We view this as a conceptual limitation, as the LLM-human relationship is more nuanced and warrants a fresh and distinct perspective. To address this gap, we developed and validated a new 12-item questionnaire to measure LLM dependency, referred to as LLM-D12. The scale was based on the authors' prior theoretical work, with items developed accordingly and responses collected from 526 participants in the UK. Exploratory and confirmatory factor analyses, performed on separate halves of the total sample using a split-sample approach, supported a two-factor structure: Instrumental Dependency (six items) and Relationship Dependency (six items). Instrumental Dependency reflects the extent to which individuals rely on LLMs to support or collaborate in decision-making and cognitive tasks. Relationship Dependency captures the tendency to perceive LLMs as socially meaningful, sentient, or companion-like entities. The two-factor structure demonstrated excellent internal consistency and clear discriminant validity. External validation confirmed both the conceptual foundation and the distinction between the two subscales. The psychometric properties and structure of our LLM-D12 scale were interpreted in light of the emerging view that dependency on LLMs does not necessarily indicate dysfunction but may still reflect reliance levels that could become problematic in certain contexts.
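The "excellent internal consistency" reported for each six-item subscale is conventionally quantified with Cronbach's alpha. A minimal sketch of that computation on synthetic Likert responses (the data, seed, and noise levels are illustrative, not from LLM-D12):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the scale total
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Synthetic 5-point Likert responses: one shared latent factor plus item noise,
# mimicking a unidimensional six-item subscale.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
scores = np.clip(np.round(3 + latent + 0.5 * rng.normal(size=(200, 6))), 1, 5)
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

Items driven by a common factor yield a high alpha; uncorrelated items would drive it toward zero.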

Authors:Vanessa Borst, Anna Riedmann, Tassilo Dege, Konstantin Müller, Astrid Schmieder, Birgit Lugrin, Samuel Kounev
Title: WoundAIssist: A Patient-Centered Mobile App for AI-Assisted Wound Care With Physicians in the Loop
Abstract:
The rising prevalence of chronic wounds, especially in aging populations, presents a significant healthcare challenge due to prolonged hospitalizations, elevated costs, and reduced patient quality of life. Traditional wound care is resource-intensive, requiring frequent in-person visits that strain both patients and healthcare professionals (HCPs). Therefore, we present WoundAIssist, a patient-centered, AI-driven mobile application designed to support telemedical wound care. WoundAIssist enables patients to regularly document wounds at home via photographs and questionnaires, while physicians remain actively engaged in the care process through remote monitoring and video consultations. A distinguishing feature is an integrated lightweight deep learning model for on-device wound segmentation, which, combined with patient-reported data, enables continuous monitoring of wound healing progression. Developed through an iterative, user-centered process involving both patients and domain experts, WoundAIssist prioritizes a user-friendly design, particularly for elderly patients. A concluding usability study with patients and dermatologists reported excellent usability, good app quality, and favorable perceptions of the AI-driven wound recognition. Our main contribution is two-fold: (I) the implementation and (II) evaluation of WoundAIssist, an easy-to-use yet comprehensive telehealth solution designed to bridge the gap between patients and HCPs. Additionally, we synthesize design insights for remote patient monitoring apps, derived from over three years of interdisciplinary research, that may inform the development of similar digital health tools across clinical domains.

Authors:Lama Alqazlan, Zheng Fang, Michael Castelle, Rob Procter
Title: A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data
Abstract:
The availability of big data has significantly influenced the possibilities and methodological choices for conducting large-scale behavioural and social science research. In the context of qualitative data analysis, a major challenge is that conventional methods require intensive manual labour and are often impractical to apply to large datasets. One effective way to address this issue is by integrating emerging computational methods to overcome scalability limitations. However, a critical concern for researchers is the trustworthiness of results when Machine Learning (ML) and Natural Language Processing (NLP) tools are used to analyse such data. We argue that confidence in the credibility and robustness of results depends on adopting a 'human-in-the-loop' methodology that is able to provide researchers with control over the analytical process, while retaining the benefits of using ML and NLP. With this in mind, we propose a novel methodological framework for Computational Grounded Theory (CGT) that supports the analysis of large qualitative datasets, while maintaining the rigour of established Grounded Theory (GT) methodologies. To illustrate the framework's value, we present the results of testing it on a dataset collected from Reddit in a study aimed at understanding tutors' experiences in the gig economy.

Authors:Mayar Elfares, Pascal Reisert, Ralf Küsters, Andreas Bulling
Title: QualitEye: Public and Privacy-preserving Gaze Data Quality Verification
Abstract:
Gaze-based applications are advancing rapidly with the availability of large datasets, but ensuring data quality is a substantial challenge when collecting data at scale. Collection at scale also requires different parties to collaborate, which raises privacy concerns. We propose QualitEye, the first method for verifying image-based gaze data quality. QualitEye employs a new semantic representation of eye images that contains the information required for verification while excluding irrelevant information for better domain adaptation. QualitEye covers a public setting, where parties can freely exchange data, and a privacy-preserving setting, where adapted private set intersection protocols ensure that parties neither reveal their raw data nor derive the gaze features/labels of others. We evaluate QualitEye on the MPIIFaceGaze and GazeCapture datasets and achieve high verification performance (with a small runtime overhead for the privacy-preserving versions). Hence, QualitEye paves the way for new gaze analysis methods at the intersection of machine learning, human-computer interaction, and cryptography.

Authors:Sijia Xiao, Haodi Zou, Alice Qian Zhang, Deepak Kumar, Hong Shen, Jason Hong, Motahhare Eslami
Title: What Comes After Harm? Mapping Reparative Actions in AI through Justice Frameworks
Abstract:
As Artificial Intelligence (AI) systems are integrated into more aspects of society, they offer new capabilities but also cause a range of harms that are drawing increasing scrutiny. A large body of work in the Responsible AI community has focused on identifying and auditing these harms. However, much less is understood about what happens after harm occurs: what constitutes reparation, who initiates it, and how effective these reparations are. In this paper, we develop a taxonomy of AI harm reparation based on a thematic analysis of real-world incidents. The taxonomy organizes reparative actions into four overarching goals: acknowledging harm, attributing responsibility, providing remedies, and enabling systemic change. We apply this framework to a dataset of 1,060 AI-related incidents, analyzing the prevalence of each action and the distribution of stakeholder involvement. Our findings show that reparation efforts are concentrated in early, symbolic stages, with limited actions toward accountability or structural reform. Drawing on theories of justice, we argue that existing responses fall short of delivering meaningful redress. This work contributes a foundation for advancing more accountable and reparative approaches to Responsible AI.

Authors:Xiaotian Su, Thiemo Wambsganss, Roman Rietsche, Seyed Parsa Neshaei, Tanja Käser
Title: Reviewriter: AI-Generated Instructions For Peer Review Writing
Abstract:
Large Language Models (LLMs) offer novel opportunities for educational applications that have the potential to transform traditional learning for students. Despite AI-enhanced applications having the potential to provide personalized learning experiences, more studies are needed on the design of generative AI systems and evidence for using them in real educational settings. In this paper, we design, implement and evaluate Reviewriter, a novel tool to provide students with AI-generated instructions for writing peer reviews in German. Our study identifies three key aspects: a) we provide insights into student needs when writing peer reviews with generative models, which we then use to develop a novel system to provide adaptive instructions, b) we fine-tune three German language models on a selected corpus of 11,925 student-written peer review texts in German and choose German-GPT2 based on quantitative measures and human evaluation, and c) we evaluate our tool with fourteen students, revealing positive technology acceptance based on quantitative measures. Additionally, the qualitative feedback presents the benefits and limitations of generative AI in peer review writing.

Authors:Alex Sotiropoulos, Sulyab Thottungal Valapu, Linus Lei, Jared Coleman, Bhaskar Krishnamachari
Title: Crowd-SFT: Crowdsourcing for LLM Alignment
Abstract:
Large Language Models (LLMs) increasingly rely on Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align model responses with human preferences. While RLHF employs a reinforcement learning approach with a separate reward model, SFT uses human-curated datasets for supervised learning. Both approaches traditionally depend on small, vetted groups of annotators, making them costly, prone to bias, and limited in scalability. We propose an open, crowd-sourced fine-tuning framework that addresses these limitations by enabling broader feedback collection for SFT without extensive annotator training. Our framework promotes incentive fairness via a point-based reward system correlated with Shapley values and guides model convergence through iterative model updates. Our multi-model selection framework demonstrates up to a 55% reduction in target distance over single-model selection, enabling subsequent experiments that validate our point-based reward mechanism's close alignment with Shapley values (a well-established method for attributing individual contributions), thereby supporting fair and scalable participation.
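For intuition, Shapley values attribute each contributor's share of a jointly produced outcome by averaging their marginal contribution over all orderings of contributors. A toy sketch of the exact computation (the value function and contributor names are hypothetical, not from Crowd-SFT):

```python
import math
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering in which the coalition can be assembled."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    n_orderings = math.factorial(len(players))
    return {p: t / n_orderings for p, t in totals.items()}

# Hypothetical value function: quality gain from each annotator subset;
# annotators "a" and "b" are complementary (together worth more than alone).
quality = {
    frozenset(): 0.0,
    frozenset({"a"}): 4.0,
    frozenset({"b"}): 3.0,
    frozenset({"a", "b"}): 9.0,
}
v = lambda s: quality[s]
print(shapley_values(["a", "b"], v))  # → {'a': 5.0, 'b': 4.0}
```

The shares sum to the full coalition's value (5.0 + 4.0 = 9.0), the efficiency property that makes Shapley values a natural yardstick for a point-based reward scheme.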

Authors:Mikel K. Ngueajio, Flor Miriam Plaza-del-Arco, Yi-Ling Chung, Danda B. Rawat, Amanda Cercas Curry
Title: Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate
Abstract:
Automated counter-narratives (CN) offer a promising strategy for mitigating online hate speech, yet concerns about their affective tone, accessibility, and ethical risks remain. We propose a framework for evaluating Large Language Model (LLM)-generated CNs across four dimensions: persona framing, verbosity and readability, affective tone, and ethical robustness. Using GPT-4o-Mini, Cohere's CommandR-7B, and Meta's LLaMA 3.1-70B, we assess three prompting strategies on the MT-Conan and HatEval datasets. Our findings reveal that LLM-generated CNs are often verbose and written at a college reading level, limiting their accessibility. While emotionally guided prompts yield more empathetic and readable responses, concerns remain surrounding safety and effectiveness.

Authors:Chadha Degachi, Samuel Kernan Freire, Evangelos Niforatos, Gerd Kortuem
Title: Understanding Mental Models of Generative Conversational Search and The Effect of Interface Transparency
Abstract:
The experience and adoption of conversational search is tied to the accuracy and completeness of users' mental models -- their internal frameworks for understanding and predicting system behaviour. Thus, understanding these models can reveal areas for design interventions. Transparency is one such intervention which can improve system interpretability and enable mental model alignment. While past research has explored mental models of search engines, those of generative conversational search remain underexplored, even while the popularity of these systems soars. To address this, we conducted a study with 16 participants, who performed 4 search tasks using 4 conversational interfaces of varying transparency levels. Our analysis revealed that most user mental models were too abstract to support users in explaining individual search instances. These results suggest that 1) mental models may pose a barrier to appropriate trust in conversational search, and 2) hybrid web-conversational search is a promising novel direction for future search interface design.

Authors:Oliver Huang, Patrick Lee, Carolina Nobre
Title: From Reality to Recognition: Evaluating Visualization Analogies for Novice Chart Comprehension
Abstract:
Novice learners often have difficulty learning new visualization types because they tend to interpret novel visualizations through the mental models of simpler charts they have previously encountered. Traditional visualization teaching methods, which usually rely on directly translating conceptual aspects of data into concrete data visualizations, often fail to attend to the needs of novice learners navigating this tension. To address this, we conducted an empirical exploration of how analogies can be used to help novices with chart comprehension. We introduced visualization analogies: visualizations that map data structures to real-world contexts to facilitate an intuitive understanding of novel chart types. We evaluated this pedagogical technique using a within-subject study (N=128) where we taught 8 chart types using visualization analogies. Our findings show that visualization analogies improve visual analysis skills and help learners transfer their understanding to actual charts. They effectively introduce visual embellishments, cater to diverse learning preferences, and are preferred by novice learners over traditional chart visualizations. This study offers empirical insights and open-source tools to advance visualization education through analogical reasoning.

Authors:Zining Wang, Yuxuan Zhang, Dongwook Yoon, Nicholas Vincent, Farhan Samir, Vered Shwartz
Title: WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions
Abstract:
With more than 11 times as many pageviews as the next largest edition, English Wikipedia dominates global knowledge access relative to other language editions. Readers are prone to assuming English Wikipedia as a superset of all language editions, leading many to prefer it even when their primary language is not English. Other language editions, however, comprise complementary facts rooted in their respective cultures and media environments, which are marginalized in English Wikipedia. While Wikipedia's user interface enables switching between language editions through its Interlanguage Link (ILL) system, it does not reveal to readers that other language editions contain valuable, complementary information. We present WikiGap, a system that surfaces complementary facts sourced from other Wikipedias within the English Wikipedia interface. Specifically, by combining a recent multilingual information-gap discovery method with a user-centered design, WikiGap enables access to complementary information from French, Russian, and Chinese Wikipedia. In a mixed-methods study (n=21), WikiGap significantly improved fact-finding accuracy, reduced task time, and received a 32-point higher usability score relative to Wikipedia's current ILL-based navigation system. Participants reported increased awareness of the availability of complementary information in non-English editions and reconsidered the completeness of English Wikipedia. WikiGap thus paves the way for improved epistemic equity across language editions.

Authors:Vishnu Ramineni, Balaji Shesharao Ingole, Nikhil Kumar Pulipeta, Balakrishna Pothineni, Aditya Gupta
Title: Advancing Digital Accessibility In Digital Pharmacy, Healthcare, And Wearable Devices: Inclusive Solutions for Enhanced Patient Engagement
Abstract:
Modern healthcare facilities demand digital accessibility to guarantee equal access to telemedicine platforms, online pharmacy services, and wearable or handheld health monitoring devices. Despite the rising call for robust digital healthcare solutions, people with disabilities encounter impediments in managing and adapting to these technologies owing to insufficient accessibility features. The paper highlights the role of comprehensive solutions for enhanced patient engagement and usability, particularly in digital pharmacy, healthcare, and wearable devices, and elucidates the key obstacles faced by users with auditory, visual, cognitive, and motor impairments. Through a review of current accessibility guidelines, practices, and emerging technologies, the paper provides a holistic overview, offering innovative solutions and emphasizing the importance of compliance with the Web Content Accessibility Guidelines (WCAG), the Americans with Disabilities Act (ADA), and other regulatory frameworks to foster easy access to digital healthcare services. Particular attention is given to AI-driven tools, speech-activated interfaces, and tactile feedback in wearable health devices that assist persons with disabilities. The research concludes that accessibility for individuals with disabilities must be prioritized, and that healthcare providers, policymakers, and officials should cultivate a patient-centered, inclusive digital healthcare ecosystem.

Authors:Vishnu Ramineni, Aditya Gupta, Balakrishna Pothineni, Isan Sahoo, Shivareddy Devarapalli, Balaji Shesharao Ingole
Title: Bridging the Gap: Enhancing Digital Accessibility for Medicaid Populations in Telehealth Adoption
Abstract:
The swift evolution of telehealth has revolutionized how medical professionals deliver healthcare services, boosting convenience and accessibility. Yet the Medicaid population encounters several impediments to utilizing these services, especially owing to poor internet connectivity, limited awareness of digital platforms, and a shortage of assistive technologies. This paper explicates the key factors behind digital accessibility for Medicaid populations and proposes robust solutions to address these challenges. Through inclusive design, AI-assisted technologies, and comprehensive policies from the relevant authorities, healthcare professionals can enhance usability and efficacy and thus better serve those in need. This shift not only enhances convenience but also expands access, particularly for underserved groups such as rural populations or those with mobility issues, ensuring inclusivity and flexibility in the healthcare domain. The paper also highlights the importance of collaboration among healthcare professionals, policymakers, and technology developers in uncovering accessibility and usability barriers. Guaranteeing equitable access to telehealth for Medicaid beneficiaries further helps minimize healthcare disparities and improve patient outcomes. The paper concludes with major recommendations to increase digital accessibility in telehealth, thereby creating a patient-oriented and inclusive healthcare system.

Authors:Jane Cleland-Huang, Pedro Antonio Alarcon Granadeno, Arturo Miguel Russell Bernal, Demetrius Hernandez, Michael Murphy, Maureen Petterson, Walter Scheirer
Title: Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms
Abstract:
Small Uncrewed Aerial Systems (sUAS) are increasingly deployed as autonomous swarms in search-and-rescue and other disaster-response scenarios. In these settings, they use computer vision (CV) to detect objects of interest and autonomously adapt their missions. However, traditional CV systems often struggle to recognize unfamiliar objects in open-world environments or to infer their relevance for mission planning. To address this, we incorporate large language models (LLMs) to reason about detected objects and their implications. While LLMs can offer valuable insights, they are also prone to hallucinations and may produce incorrect, misleading, or unsafe recommendations. To ensure safe and sensible decision-making under uncertainty, high-level decisions must be governed by cognitive guardrails. This article presents the design, simulation, and real-world integration of these guardrails for sUAS swarms in search-and-rescue missions.

Authors:Junhua Zhu, Lan Luo
Title: Designing the Future of Entrepreneurship Education: Exploring an AI-Empowered Scaffold System for Business Plan Development
Abstract:
Entrepreneurship education equips students to transform innovative ideas into actionable entrepreneurship plans, yet traditional approaches often struggle to provide the personalized guidance and practical alignment needed for success. Focusing on the business plan as a key learning tool and evaluation method, this study investigates the design needs for an AI-empowered scaffold system to address these challenges. Based on qualitative insights from educators and students, the findings highlight three critical dimensions for system design: mastery of business plan development, alignment with entrepreneurial learning goals, and integration of adaptive system features. These findings underscore the transformative potential of AI in bridging gaps in entrepreneurship education while emphasizing the enduring value of human mentorship and experiential learning.

Authors:Xiaoyu Chang, Fan Zhang, Kexue Fu, Carla Diana, Wendy Ju, Ray LC
Title: A Constructed Response: Designing and Choreographing Robot Arm Movements in Collaborative Dance Improvisation
Abstract:
Dancers often prototype movements themselves or with each other during improvisation and choreography. How are these interactions altered when physically manipulable technologies are introduced into the creative process? To understand how dancers design and improvise movements while working with instruments capable of non-humanoid movements, we engaged dancers in workshops to co-create movements with a robot arm in one-human-to-one-robot and three-human-to-one-robot settings. We found that dancers produced more fluid movements in one-to-one scenarios, experiencing a stronger sense of connection and presence with the robot as a co-dancer. In three-to-one scenarios, the dancers divided their attention between the human dancers and the robot, resulting in increased perceived use of space and more stop-and-go movements, perceiving the robot as part of the stage background. This work highlights how technologies can drive creativity in movement artists adapting to new ways of working with physical instruments, contributing design insights supporting artistic collaborations with non-humanoid agents.

Authors:Abdul Rahman Shaikh, Maoyuan Sun, Xingchen Liu, Hamed Alhoori, Jian Zhao, David Koop
Title: iTrace: Interactive Tracing of Cross-View Data Relationships
Abstract:
Exploring data relations across multiple views has been a common task in many domains such as bioinformatics, cybersecurity, and healthcare. To support this, various techniques (e.g., visual links and brushing and linking) are used to show related visual elements across views via lines and highlights. However, understanding the relations using these techniques, when many related elements are scattered, can be difficult due to spatial distance and complexity. To address this, we present iTrace, an interactive visualization technique to effectively trace cross-view data relationships. iTrace leverages the concept of interactive focus transitions, which allows users to see and directly manipulate their focus as they navigate between views. By directing the user's attention through smooth transitions between related elements, iTrace makes it easier to follow data relationships. We demonstrate the effectiveness of iTrace with a user study, and we conclude with a discussion of how iTrace can be broadly used to enhance data exploration in various types of visualizations.

Authors:Peiling Jiang, Haijun Xia
Title: Orca: Browsing at Scale Through User-Driven and AI-Facilitated Orchestration Across Malleable Webpages
Abstract:
Web-based activities are fundamentally distributed across webpages. However, conventional browsers with stacks of tabs fail to support operating and synthesizing large volumes of information across pages. While recent AI systems enable fully automated web browsing and information synthesis, they often diminish user agency and hinder contextual understanding. Therefore, we explore how AI could instead augment users' interactions with content across webpages and mitigate cognitive and manual efforts. Through literature on information tasks and web browsing challenges, and an iterative design process, we present a rich set of novel interactions with our prototype web browser, Orca. Leveraging AI, Orca supports user-driven exploration, operation, organization, and synthesis of web content at scale. To enable browsing at scale, webpages are treated as malleable materials that humans and AI can collaboratively manipulate and compose into a malleable, dynamic, and browser-level workspace. Our evaluation revealed an increased "appetite" for information foraging, enhanced user control, and more flexibility in sensemaking across a broader information landscape on the web.

Authors:Saleh Afzoon, Zahra Jahanandish, Phuong Thao Huynh, Amin Beheshti, Usman Naseem
Title: Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy
Abstract:
AI copilots represent a new generation of AI-powered systems designed to assist users, particularly knowledge workers and developers, in complex, context-rich tasks. As these systems become more embedded in daily workflows, personalization has emerged as a critical factor for improving usability, effectiveness, and user satisfaction. Central to this personalization is preference optimization: the system's ability to detect, interpret, and align with individual user preferences. While prior work in intelligent assistants and optimization algorithms is extensive, their intersection within AI copilots remains underexplored. This survey addresses that gap by examining how user preferences are operationalized in AI copilots. We investigate how preference signals are sourced, modeled across different interaction stages, and refined through feedback loops. Building on a comprehensive literature review, we define the concept of an AI copilot and introduce a taxonomy of preference optimization techniques across pre-, mid-, and post-interaction phases. Each technique is evaluated in terms of advantages, limitations, and design implications. By consolidating fragmented efforts across AI personalization, human-AI interaction, and language model adaptation, this work offers both a unified conceptual foundation and a practical design perspective for building user-aligned, persona-aware AI copilots that support end-to-end adaptability and deployment.

Authors:Qishuai Zhong, Zongmin Li, Siqi Fan, Aixin Sun
Title: Evaluating LLM Adaptation to Sociodemographic Factors: User Profile vs. Dialogue History
Abstract:
Effective engagement by large language models (LLMs) requires adapting responses to users' sociodemographic characteristics, such as age, occupation, and education level. While many real-world applications leverage dialogue history for contextualization, existing evaluations of LLMs' behavioral adaptation often focus on single-turn prompts. In this paper, we propose a framework to evaluate LLM adaptation when attributes are introduced either (1) explicitly via user profiles in the prompt or (2) implicitly through multi-turn dialogue history. We assess the consistency of model behavior across these modalities. Using a multi-agent pipeline, we construct a synthetic dataset pairing dialogue histories with distinct user profiles and employ questions from the Value Survey Module (VSM 2013) (Hofstede and Hofstede, 2016) to probe value expression. Our findings indicate that most models adjust their expressed values in response to demographic changes, particularly in age and education level, but consistency varies. Models with stronger reasoning capabilities demonstrate greater alignment, indicating the importance of reasoning in robust sociodemographic adaptation.

Authors:Yifan Shan, Bo Liu, Sebastian Bidegain, Thijs Roumen
Title: THE WASTIVE: An Interactive Ebb and Flow of Digital Fabrication Waste
Abstract:
What if digital fabrication waste could observe the world? What would they see? What would they say? "THE WASTIVE" reimagines digital fabrication waste as sentient observers, giving them a poetic voice through interactive art. As viewers approach, the installation awakens, mimicking the rhythmic ebb and flow of ocean waves - a silent dialogue where discarded materials "observe" and respond to human presence. These interactions echo the gentle murmurs of the sea, transforming technological residue into a reflective, sensory experience. Through this artistic contemplation, "THE WASTIVE" invites audiences to reconsider their creative processes and consumption habits. It serves as a poetic call for more mindful, sustainable practices, provoking deeper reflections on our interconnectedness with the environment.

Authors:Jing Nathan Yan, Emma Harvey, Junxiong Wang, Jeffrey M. Rzeszotarski, Allison Koenecke
Title: Fairness-in-the-Workflow: How Machine Learning Practitioners at Big Tech Companies Approach Fairness in Recommender Systems
Abstract:
Recommender systems (RS), which are widely deployed across high-stakes domains, are susceptible to biases that can cause large-scale societal impacts. Researchers have proposed methods to measure and mitigate such biases -- but translating academic theory into practice is inherently challenging. RS practitioners must balance the competing interests of diverse stakeholders, including providers and users, and operate in dynamic environments. Through a semi-structured interview study (N=11), we map the RS practitioner workflow within large technology companies, focusing on how technical teams consider fairness internally and in collaboration with other (legal, data, and fairness) teams. We identify key challenges to incorporating fairness into existing RS workflows: defining fairness in RS contexts, particularly when navigating multi-stakeholder and dynamic fairness considerations. We also identify key organization-wide challenges: making time for fairness work and facilitating cross-team communication. Finally, we offer actionable recommendations for the RS community, including HCI researchers and practitioners.

Authors:Gelareh Hajian, Ali Abedi, Bing Ye, Jennifer Campos, Alex Mihailidis
Title: Dynamics of Affective States During Takeover Requests in Conditionally Automated Driving Among Older Adults with and without Cognitive Impairment
Abstract:
Driving is a key component of independence and quality of life for older adults. However, cognitive decline associated with conditions such as mild cognitive impairment and dementia can compromise driving safety and often lead to premature driving cessation. Conditionally automated vehicles, which require drivers to take over control when automation reaches its operational limits, offer a potential assistive solution. However, their effectiveness depends on the driver's ability to respond to takeover requests (TORs) in a timely and appropriate manner. Understanding emotional responses during TORs can provide insight into drivers' engagement, stress levels, and readiness to resume control, particularly in cognitively vulnerable populations. This study investigated affective responses, measured via facial expression analysis of valence and arousal, during TORs among cognitively healthy older adults and those with cognitive impairment. Facial affect data were analyzed across different road geometries and speeds to evaluate within- and between-group differences in affective states. Within-group comparisons using the Wilcoxon signed-rank test revealed significant changes in valence and arousal during TORs for both groups. Cognitively healthy individuals showed adaptive increases in arousal under higher-demand conditions, while those with cognitive impairment exhibited reduced arousal and more positive valence in several scenarios. Between-group comparisons using the Mann-Whitney U test indicated that cognitively impaired individuals displayed lower arousal and higher valence than controls across different TOR conditions. These findings suggest reduced emotional response and awareness in cognitively impaired drivers, highlighting the need for adaptive vehicle systems that detect affective states and support safe handovers for vulnerable users.
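The between-group comparison in this abstract relies on the rank-based Mann-Whitney U test. As a hedged illustration only (not the authors' analysis code, and with invented arousal values), the U statistic can be computed directly from midranks:

```python
# Illustrative sketch: Mann-Whitney U from ranks. The sample arousal values
# below are invented for demonstration and are NOT data from the study.

def ranks(values):
    """Midranks of values, averaging ranks over ties (ranks are 1-based)."""
    indexed = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(indexed):
        j = i
        while j + 1 < len(indexed) and values[indexed[j + 1]] == values[indexed[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[indexed[k]] = avg
        i = j + 1
    return r

def mann_whitney_u(a, b):
    """U statistic for groups a and b; smaller U means stronger separation."""
    r = ranks(list(a) + list(b))
    r1 = sum(r[: len(a)])                      # rank sum of group a
    u1 = r1 - len(a) * (len(a) + 1) / 2
    u2 = len(a) * len(b) - u1
    return min(u1, u2)

# Hypothetical arousal scores for healthy vs. cognitively impaired drivers:
healthy = [0.62, 0.71, 0.58, 0.69, 0.75]
impaired = [0.41, 0.38, 0.52, 0.44, 0.47]
print(mann_whitney_u(healthy, impaired))  # → 0.0 (complete separation)
```

With complete separation between the two invented samples, U reaches its minimum of 0; real facial-affect data would of course overlap.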

Authors:Seon Gyeom Kim, Jae Young Choi, Ryan Rossi, Eunyee Koh, Tak Yeon Lee
Title: Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts
Abstract:
The field of Multimodal Large Language Models (MLLMs) has made remarkable progress in visual understanding tasks, presenting a vast opportunity to predict the perceptual and emotional impact of charts. However, it also raises concerns, as many applications of LLMs are based on overgeneralized assumptions from a few examples, lacking sufficient validation of their performance and effectiveness. We introduce Chart-to-Experience, a benchmark dataset comprising 36 charts, evaluated by crowdsourced workers for their impact on seven experiential factors. Using the dataset as ground truth, we evaluated capabilities of state-of-the-art MLLMs on two tasks: direct prediction and pairwise comparison of charts. Our findings imply that MLLMs are not as sensitive as human evaluators when assessing individual charts, but are accurate and reliable in pairwise comparisons.

Authors:Konstantinos Barmpas, Na Lee, Yannis Panagakis, Dimitrios A. Adamos, Nikolaos Laskaris, Stefanos Zafeiriou
Title: Advancing Brainwave Modeling with a Codebook-Based Foundation Model
Abstract:
Recent advances in large-scale pre-trained Electroencephalogram (EEG) models have shown great promise, driving progress in Brain-Computer Interfaces (BCIs) and healthcare applications. However, despite their success, many existing pre-trained models have struggled to fully capture the rich information content of neural oscillations, a limitation that fundamentally constrains their performance and generalizability across diverse BCI tasks. This limitation is frequently rooted in suboptimal architectural design choices that constrain representational capacity. In this work, we introduce LaBraM++, an enhanced Large Brainwave Foundation Model (LBM) that incorporates principled improvements grounded in robust signal processing foundations. LaBraM++ demonstrates substantial gains across a variety of tasks, consistently outperforming the architecture it builds on and achieving competitive results compared to other open-source LBMs. Its superior performance and training efficiency highlight its potential as a strong foundation for future advancements in LBMs.

Authors:Zihan Lu, Tingying He, Jiayi Hong, Lijie Yao, Tobias Isenberg
Title: Designing Semantically-Resonant Abstract Patterns for Data Visualization
Abstract:
We present a structured design methodology for creating semantically-resonant abstract patterns, making the pattern design process accessible to the general public. Semantically-resonant patterns are those that intuitively evoke the concept they represent within a specific set (e.g., in a vegetable concept set, small dots for olives and large dots for tomatoes), analogous to the concept of semantically-resonant colors (e.g., using olive green for olives and red for tomatoes). Previous research has shown that semantically-resonant colors can improve chart reading speed, and designers have made attempts to integrate semantic cues into abstract pattern designs. However, a systematic framework for developing such patterns was lacking. To bridge this gap, we conducted a series of workshops with design experts, resulting in a structured methodology for designing semantically-resonant abstract patterns. We evaluated our design methodology through another series of workshops with non-design participants. The results indicate that our proposed design methodology effectively supports the general public in designing semantically-resonant abstract patterns for both abstract and concrete concepts.

Authors:Soohwan Lee, Seoyeong Hwang, Kyungho Lee
Title: Beyond Individual UX: Defining Group Experience(GX) as a New Paradigm for Group-centered AI
Abstract:
Recent advancements in HCI and AI have predominantly centered on individual user experiences, often neglecting the emergent dynamics of group interactions. This provocation introduces Group Experience (GX) to capture the collective perceptual, emotional, and cognitive dimensions that arise when individuals interact in cohesive groups. We challenge the conventional Human-centered AI paradigm and propose Group-centered AI (GCAI) as a framework that actively mediates group dynamics, amplifies diverse voices, and fosters ethical collective decision-making. Drawing on social psychology, organizational behavior, and group dynamics, we outline a group-centered design approach that balances individual autonomy with collective interests while developing novel evaluative metrics. Our analysis emphasizes rethinking traditional methodologies that focus solely on individual outcomes and advocates for innovative strategies to capture group collaboration. We call on researchers to bridge the gap between micro-level experiences and macro-level impacts, ultimately enriching and transforming collaborative human interactions.

Authors:Minxu Liu, Donghai Guan, Chuhang Zheng, Chunwei Tian, Jie Wen, Qi Zhu
Title: ViEEG: Hierarchical Visual Neural Representation for EEG Brain Decoding
Abstract:
Understanding and decoding brain activity into visual representations is a fundamental challenge at the intersection of neuroscience and artificial intelligence. While EEG visual decoding has shown promise due to its non-invasive and low-cost nature, existing methods suffer from Hierarchical Neural Encoding Neglect (HNEN), a critical limitation in which flat neural representations fail to model the brain's hierarchical visual processing. Inspired by the hierarchical organization of the visual cortex, we propose ViEEG, a neuro-inspired framework that addresses HNEN. ViEEG decomposes each visual stimulus into three biologically aligned components - contour, foreground object, and contextual scene - serving as anchors for a three-stream EEG encoder. These EEG features are progressively integrated via cross-attention routing, simulating cortical information flow from low-level to high-level vision. We further adopt hierarchical contrastive learning for EEG-CLIP representation alignment, enabling zero-shot object recognition. Extensive experiments on the THINGS-EEG dataset demonstrate that ViEEG outperforms previous methods by a large margin in both subject-dependent and subject-independent settings. Results on the THINGS-MEG dataset further confirm ViEEG's generalization to different neural modalities. Our framework not only advances the performance frontier but also sets a new paradigm for EEG brain decoding.

Authors:Domenique Zipperling, Luca Deck, Julia Lanzl, Niklas Kühl
Title: It's only fair when I think it's fair: How Gender Bias Alignment Undermines Distributive Fairness in Human-AI Collaboration
Abstract:
Human-AI collaboration is increasingly relevant in consequential areas where AI recommendations support human discretion. However, human-AI teams' effectiveness, capability, and fairness highly depend on human perceptions of AI. Positive fairness perceptions have been shown to foster trust and acceptance of AI recommendations. Yet, work on confirmation bias highlights that humans selectively adhere to AI recommendations that align with their expectations and beliefs -- despite these not being necessarily correct or fair. This raises the question of whether confirmation bias also transfers to the alignment of gender bias between human and AI decisions. In our study, we examine how gender bias alignment influences fairness perceptions and reliance. The results of a 2x2 between-subject study highlight the connection between gender bias alignment, fairness perceptions, and reliance, demonstrating that merely constructing a "formally fair" AI system is insufficient for optimal human-AI collaboration; ultimately, AI recommendations will likely be overridden if biases do not align.

Authors:Eduard Kuric, Peter Demcak, Matus Krajcovic
Title: Card Sorting Simulator: Augmenting Design of Logical Information Architectures with Large Language Models
Abstract:
Card sorting is a common ideation technique that elicits information on users' mental organization of content and functionality by having them sort items into categories. For more robust card sorting research, digital card sorting tools could benefit from providing quick automated feedback. The objective of this research is to advance toward an instrument that applies artificial intelligence (AI) to augment card sorting. For this purpose, we develop the Card Sorting Simulator, a prototype tool that leverages Large Language Models (LLMs) to generate informative categorizations of cards. To illuminate how aligned the simulation is with card sorting by actual participants, and to inform the instrument's design decisions, we conducted a generalizability-focused comparative study. We obtained 28 pre-existing card sorting studies from real practitioners, comprising 1,399 participants, along with diverse contents and origins. With this dataset, we conducted a comprehensive and nuanced analysis of the agreement between actual card sorting results (clusterings of cards) and synthetic clusterings across a multitude of LLMs and prompt designs. Mutual information scores indicate a good degree of agreement with real result clusterings, although similarity matrices also reveal inconsistencies with participants' mental models, which can be attributed to the top-down nature of LLM categorization. Furthermore, the number of cards and the complexity of their labels impact the accuracy of the simulation. These findings bolster the case for AI augmentation in card sorting research as a source of meaningful preliminary feedback and highlight the need for further study toward the development and validation of intelligent user research tools.
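The abstract scores agreement between real and simulated sorts with mutual information. As a hedged, stdlib-only sketch (not the authors' tooling; the cards and category labels are invented), normalized mutual information between two clusterings of the same cards can be computed like this:

```python
# Illustrative sketch: normalized mutual information (NMI) between two
# clusterings of the same six cards. All labels below are invented.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (nats) of a clustering's label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (nats) between two label sequences of equal length."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * math.log((c / n) * n * n / (pa[x] * pb[y]))
               for (x, y), c in joint.items())

def nmi(a, b):
    """NMI with geometric-mean normalization; 1.0 = identical partitions."""
    h = math.sqrt(entropy(a) * entropy(b))
    return mutual_information(a, b) / h if h else 1.0

# One participant sort vs. one hypothetical LLM-simulated sort of six cards:
participant = ["nav", "nav", "nav", "shop", "shop", "shop"]
simulated   = ["A",   "A",   "B",   "B",   "B",   "B"]
print(round(nmi(participant, simulated), 3))  # → 0.479
```

An NMI near 1 would indicate the simulated categories recover the participant's partition up to relabeling; the mid-range value here reflects the one misplaced card and the unequal cluster sizes.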

Authors:Siavash Ghorbany, Ming Hu, Siyuan Yao, Matthew Sisk, Chaoli Wang
Title: EcoSphere: A Decision-Support Tool for Automated Carbon Emission and Cost Optimization in Sustainable Urban Development
Abstract:
The construction industry is a major contributor to global greenhouse gas emissions, with embodied carbon being a key component. This study develops EcoSphere, an innovative software tool designed to evaluate and balance embodied and operational carbon emissions with construction and environmental costs in urban planning. Using high-resolution data from the National Structure Inventory, combined with computer vision and natural language processing applied to Google Street View and satellite imagery, EcoSphere categorizes buildings by structural and material characteristics with a bottom-up approach, creating a baseline emissions dataset. By simulating policy scenarios and mitigation strategies, EcoSphere provides policymakers and non-experts with actionable insights for sustainable urban development, along with a clear view of the environmental and financial consequences of their decisions. Case studies in Chicago and Indianapolis showcase how EcoSphere aids in assessing policy impacts on carbon emissions and costs, supporting data-driven progress toward carbon neutrality.

Authors:Hiba Eltigani, Rukhshan Haroon, Asli Kocak, Abdullah Bin Faisal, Noah Martin, Fahad Dogar
Title: WaLLM -- Insights from an LLM-Powered Chatbot deployment via WhatsApp
Abstract:
Recent advances in generative AI, such as ChatGPT, have transformed access to information in education, knowledge-seeking, and everyday decision-making. However, in many developing regions, access remains a challenge due to the persistent digital divide. To help bridge this gap, we developed WaLLM - a custom AI chatbot over WhatsApp, a widely used communication platform in developing regions. Beyond answering queries, WaLLM offers several features to enhance user engagement: a daily top question, suggested follow-up questions, trending and recent queries, and a leaderboard-based reward system. Our service has been operational for over 6 months, amassing over 14.7K queries from approximately 100 users. In this paper, we present WaLLM's design and a systematic analysis of logs to understand user interactions. Our results show that 55% of user queries seek factual information. "Health and well-being" was the most popular topic (28%), including queries about nutrition and disease, suggesting users view WaLLM as a reliable source. Two-thirds of users' activity occurred within 24 hours of the daily top question. Users who accessed the "Leaderboard" interacted with WaLLM three times as much as those who did not. We conclude by discussing implications for culture-based customization, user interface design, and appropriate calibration of users' trust in AI systems for developing regions.

Authors:Jiawei Zhou, Kritika Venkatachalam, Minje Choi, Koustuv Saha, Munmun De Choudhury
Title: Communication Styles and Reader Preferences of LLM and Human Experts in Explaining Health Information
Abstract:
With the wide adoption of large language models (LLMs) in information assistance, it is essential to examine their alignment with human communication styles and values. We situate this study within the context of fact-checking health information, given the critical challenge of rectifying misconceptions and building trust. Recent studies have explored the potential of LLMs for health communication, but style differences between LLMs and human experts and the associated reader perceptions remain under-explored. In this light, our study evaluates the communication styles of LLMs, focusing on how their explanations differ from those of humans in three core components of health communication: information, sender, and receiver. We compiled a dataset of 1498 health misinformation explanations from authoritative fact-checking organizations and generated LLM responses to inaccurate health information. Drawing from health communication theory, we evaluate communication styles across three key dimensions: information linguistic features, sender persuasive strategies, and receiver value alignments. We further assessed human perceptions through a blinded evaluation with 99 participants. Our findings reveal that LLM-generated articles showed significantly lower scores in persuasive strategies, certainty expressions, and alignment with social values and moral foundations. However, human evaluation demonstrated a strong preference for LLM content, with over 60% of responses favoring LLM articles for clarity, completeness, and persuasiveness. Our results suggest that LLMs' structured approach to presenting information may be more effective at engaging readers despite scoring lower on traditional measures of quality in fact-checking and health communication.

Authors:Huiyun Tang, Anastasia Sergeeva
Title: Shots and Boosters: Exploring the Use of Combined Prebunking Interventions to Raise Critical Thinking and Create Long-Term Protection Against Misinformation
Abstract:
The problem of how to effectively mitigate the flow of misinformation remains a significant challenge. The classical approach to this is public disapproval of claims or "debunking." The approach is still widely used on social media, but it has some severe limitations in terms of applicability and efficiency. An alternative strategy is to enhance individuals' critical thinking through educational interventions. Instead of merely disproving misinformation, these approaches aim to strengthen users' reasoning skills, enabling them to evaluate and reject false information independently. In this position paper, we explore a combination of intervention methods designed to improve critical thinking in the context of online media consumption. We highlight the role of AI in supporting different stages of these interventions and present a design concept that integrates AI-driven strategies to foster critical reasoning and media literacy.

Authors:Arvind Srinivasan, Niklas Elmqvist
Title: HeedVision: Attention Awareness in Collaborative Immersive Analytics Environments
Abstract:
Group awareness--the ability to perceive the activities of collaborators in a shared space--is a vital mechanism to support effective coordination and joint data analysis in collaborative visualization. We introduce collaborative attention-aware visualizations (CAAVs) that track, record, and revisualize the collective attention of multiple users over time. We implement this concept in HeedVision, a standards-compliant WebXR system that runs on modern AR/VR headsets. Through a user study where pairs of analysts performed visual search tasks in HeedVision, we demonstrate how attention revisualization enhances collaborative performance in immersive analytics. Our findings reveal that CAAVs improve spatial coordination, search efficiency, and task load distribution among collaborators, with benefits varying by visualization context. This work extends attention awareness from individual to multi-user settings and provides empirical evidence for its benefits in collaborative immersive analytics.

Authors:Philippe Laban, Hiroaki Hayashi, Yingbo Zhou, Jennifer Neville
Title: LLMs Get Lost In Multi-Turn Conversation
Abstract:
Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, explore, and refine what they need through multi-turn conversational exchange. Although analysis of LLM conversation logs has confirmed that underspecification occurs frequently in user instructions, LLM evaluation has predominantly focused on the single-turn, fully-specified instruction setting. In this work, we perform large-scale simulation experiments to compare LLM performance in single- and multi-turn settings. Our experiments confirm that all the top open- and closed-weight LLMs we test exhibit significantly lower performance in multi-turn conversations than single-turn, with an average drop of 39% across six generation tasks. Analysis of 200,000+ simulated conversations decomposes the performance degradation into two components: a minor loss in aptitude and a significant increase in unreliability. We find that LLMs often make assumptions in early turns and prematurely attempt to generate final solutions, on which they overly rely. In simpler terms, we discover that *when LLMs take a wrong turn in a conversation, they get lost and do not recover*.

Authors:Xiyun Hu, Dizhi Ma, Fengming He, Zhengzhe Zhu, Shao-Kang Hsia, Chenfei Zhu, Ziyi Liu, Karthik Ramani
Title: GesPrompt: Leveraging Co-Speech Gestures to Augment LLM-Based Interaction in Virtual Reality
Abstract:
Large Language Model (LLM)-based copilots have shown great potential in Extended Reality (XR) applications. However, the user faces challenges when describing the 3D environments to the copilots due to the complexity of conveying spatial-temporal information through text or speech alone. To address this, we introduce GesPrompt, a multimodal XR interface that combines co-speech gestures with speech, allowing end-users to communicate more naturally and accurately with LLM-based copilots in XR environments. By incorporating gestures, GesPrompt extracts spatial-temporal reference from co-speech gestures, reducing the need for precise textual prompts and minimizing cognitive load for end-users. Our contributions include (1) a workflow to integrate gesture and speech input in the XR environment, (2) a prototype VR system that implements the workflow, and (3) a user study demonstrating its effectiveness in improving user communication in VR environments.

Authors:Manas Satish Bedmutha, Feng Chen, Andrea Hartzler, Trevor Cohen, Nadir Weibel
Title: Can Language Models Understand Social Behavior in Clinical Conversations?
Abstract:
Effective communication between providers and their patients influences health and care outcomes. The effectiveness of such conversations has been linked not only to the exchange of clinical information, but also to a range of interpersonal behaviors, commonly referred to as social signals, which are often conveyed through non-verbal cues and shape the quality of the patient-provider relationship. Recent advances in large language models (LLMs) have demonstrated an increasing ability to infer emotional and social behaviors even when analyzing only textual information. As automation also increases in clinical settings, such as for transcription of patient-provider conversations, there is growing potential for LLMs to automatically analyze and extract social behaviors from these interactions. To explore the foundational capabilities of LLMs in tracking social signals in clinical dialogue, we designed task-specific prompts and evaluated model performance across multiple architectures and prompting styles using a highly imbalanced, annotated dataset spanning 20 distinct social signals such as provider dominance and patient warmth. We present the first system capable of tracking all 20 of these coded signals, and uncover patterns in LLM behavior. Further analysis of model configurations and clinical context provides insights for enhancing LLM performance on social signal processing tasks in healthcare settings.

Authors:Logan Lane, Feiyu Lu, Shakiba Davari, Rob Teather, Doug A. Bowman
Title: Revisiting Performance Models of Distal Pointing Tasks in Virtual Reality
Abstract:
Performance models of interaction, such as Fitts' Law, are important tools for predicting and explaining human motor performance and for designing high-performance user interfaces. Extensive prior work has proposed such models for the 3D interaction task of distal pointing, in which the user points their hand or a device at a distant target in order to select it. However, there is no consensus on how to compute the index of difficulty for distal pointing tasks. We present a preliminary study suggesting that existing models may not be sufficient to model distal pointing performance with current virtual reality technologies. Based on these results, we hypothesized that both the form of the model and the standard method for collecting empirical data for pointing tasks might need to change in order to achieve a more accurate and valid distal pointing model. In our main study, we used a new methodology to collect distal pointing data and evaluated traditional models, purely ballistic models, and two-part models. Ultimately, we found that the best model used a simple Fitts'-Law-style index of difficulty with angular measures of amplitude and width.
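The winning model uses a Fitts-style index of difficulty over angular measures. As a hedged sketch of that model family (not the authors' exact formulation; the target geometry below is invented), linear distances can be converted to visual angles and plugged into the Shannon formulation:

```python
# Sketch: Fitts-style index of difficulty on angular amplitude and width,
# one plausible form of the model family compared in the study. The target
# geometry (sizes, depth) is invented for illustration.
import math

def angular_size(linear_size, depth):
    """Visual angle (degrees) subtended by an extent of `linear_size`
    centered at distance `depth` from the viewpoint."""
    return math.degrees(2 * math.atan(linear_size / (2 * depth)))

def index_of_difficulty(amplitude_deg, width_deg):
    """Shannon formulation of the Fitts' Law ID, in bits."""
    return math.log2(amplitude_deg / width_deg + 1)

# A 0.3 m wide target at 5 m depth, reached by a 2 m lateral movement:
w = angular_size(0.3, 5.0)   # angular width of the target
a = angular_size(2.0, 5.0)   # angular amplitude of the movement
print(round(index_of_difficulty(a, w), 2))  # → 2.92
```

Using angles rather than linear distances makes the ID independent of target depth when the ratio of movement to target extent is fixed, which is one reason angular formulations are attractive for distal pointing.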

Authors:Qiang Sun, Tingting Bi, Sirui Li, Eun-Jung Holden, Paul Duuring, Kai Niu, Wei Liu
Title: SymbioticRAG: Enhancing Document Intelligence Through Human-LLM Symbiotic Collaboration
Abstract:
We present \textbf{SymbioticRAG}, a novel framework that fundamentally reimagines Retrieval-Augmented Generation~(RAG) systems by establishing a bidirectional learning relationship between humans and machines. Our approach addresses two critical challenges in current RAG systems: the inherently human-centered nature of relevance determination and users' progression from "unconscious incompetence" in query formulation. SymbioticRAG introduces a two-tier solution where Level 1 enables direct human curation of retrieved content through interactive source document exploration, while Level 2 aims to build personalized retrieval models based on captured user interactions. We implement Level 1 through three key components: (1)~a comprehensive document processing pipeline with specialized models for layout detection, OCR, and extraction of tables, formulas, and figures; (2)~an extensible retriever module supporting multiple retrieval strategies; and (3)~an interactive interface that facilitates both user engagement and interaction data logging. We experiment with a Level 2 implementation via a retrieval strategy that incorporates LLM-summarized user intention from user interaction logs. To maintain high-quality data preparation, we develop a human-on-the-loop validation interface that improves pipeline output while advancing research in specialized extraction tasks. Evaluation across three scenarios (literature review, geological exploration, and education) demonstrates significant improvements in retrieval relevance and user satisfaction compared to traditional RAG approaches. To facilitate broader research and further advancement of the SymbioticRAG Level 2 implementation, we will make our system openly accessible to the research community.

Authors:Safikureshi Mondal, Subhasis Dasgupta, Amarnath Gupta
Title: Minimally Supervised Hierarchical Domain Intent Learning for CRS
Abstract:
Modeling domain intent within an evolving domain structure presents a significant challenge for domain-specific conversational recommendation systems (CRS). The conventional approach involves training an intent model using utterance-intent pairs. However, as new intents and patterns emerge, the model must be continuously updated while preserving existing relationships and maintaining efficient retrieval. This process leads to substantial growth in utterance-intent pairs, making manual labeling increasingly costly and impractical. In this paper, we propose an efficient solution for constructing a dynamic hierarchical structure that minimizes the number of user utterances required to achieve adequate domain knowledge coverage. To this end, we introduce a neural network-based attention-driven hierarchical clustering algorithm designed to optimize intent grouping using minimal data. The proposed method builds upon and integrates concepts from two existing flat clustering algorithms, DEC and NAM, both of which utilize neural attention mechanisms. We apply our approach to a curated subset of 44,000 questions from the business food domain. Experimental results demonstrate that constructing the hierarchy using a stratified sampling strategy significantly reduces the number of questions needed to represent the evolving intent structure. Our findings indicate that this approach enables efficient coverage of dynamic domain knowledge without frequent retraining, thereby enhancing scalability and adaptability in domain-specific CRSs.
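The stratified sampling strategy the abstract credits with reducing labeling cost can be illustrated with a minimal stdlib sketch. This is a hedged illustration of the general idea, not the paper's algorithm; the intent strata and utterance counts are invented:

```python
# Minimal sketch of stratified sampling over intent strata: draw utterances
# in proportion to each stratum's size so a small sample still covers the
# full intent structure. All intents and utterances below are invented.
import random

def stratified_sample(utterances_by_intent, total, seed=0):
    """Sample about `total` utterances, allocating to each intent stratum
    proportionally to its size (at least one per non-empty stratum)."""
    rng = random.Random(seed)
    n = sum(len(v) for v in utterances_by_intent.values())
    sample = []
    for intent, utts in utterances_by_intent.items():
        k = max(1, round(total * len(utts) / n))     # proportional quota
        sample.extend((intent, u) for u in rng.sample(utts, min(k, len(utts))))
    return sample

corpus = {
    "order_food":   [f"order-{i}" for i in range(60)],
    "track_order":  [f"track-{i}" for i in range(30)],
    "cancel_order": [f"cancel-{i}" for i in range(10)],
}
picked = stratified_sample(corpus, total=10)
print(len(picked))  # → 10, with every intent represented
```

Compared with uniform random sampling, the per-stratum quota guarantees that rare intents (here `cancel_order`) still contribute at least one utterance to the labeled sample.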

Authors:Ananya Ipsita, Ramesh Kaki, Mayank Patel, Asim Unmesh, Kylie A. Peppler, Karthik Ramani
Title: Interactive authoring of outcome-oriented lesson plans for immersive Virtual Reality training
Abstract:
Immersive Virtual Reality (iVR) applications have shown immense potential for skill training and learning in manufacturing. However, authoring such applications requires technical expertise, which makes it difficult for educators to author instructions targeted at desired learning outcomes. We present FlowTrainer, an LLM-assisted interactive system that allows educators to author lesson plans for their iVR instruction based on desired goals. The authoring workflow is supported by Backward design to align the planned lesson with the desired outcomes. We implemented a welding use case and conducted a user study with welding experts to test the effectiveness of the system in authoring outcome-oriented lesson plans. The study results showed that the system allowed users to author lesson plans based on desired outcomes while reducing the time and technical expertise required for the authoring process. We believe that such efforts can allow widespread adoption of iVR solutions in manufacturing training to meet the workforce demands in the industry.

Authors:Xiaoshan Huang, Jie Gao, Haolun Wu
Title: SSRLBot: Designing and Developing a Large Language Model-based Agent using Socially Shared Regulated Learning
Abstract:
Large language model (LLM)-based agents have emerged as pivotal tools in assisting human experts across various fields by transforming complex tasks into more efficient workflows and providing actionable stakeholder insights. Despite their potential, the application of LLM-based agents for medical education remains underexplored. This study aims to assist in evaluating students' processes and outcomes in medical case diagnosis and discussion while incorporating the theoretical framework of Socially Shared Regulation of Learning (SSRL) to assess student performance. SSRL emphasizes metacognitive, cognitive, motivational, and emotional interactions, highlighting the collaborative management of learning processes to improve decision-making outcomes. Grounded in SSRL theory, this tool paper introduces SSRLBot, an LLM-based agent designed to enable team members to reflect on their diagnostic performance and the key SSRL skills that foster team success. SSRLBot's core functions include summarizing dialogue content, analyzing participants' SSRL skills, and evaluating students' diagnostic results. We evaluated SSRLBot using diagnostic conversation data collected from six groups (12 participants, 1926 conversational turns). Results showed that SSRLBot can deliver detailed, theory-aligned evaluations, link specific behaviors to SSRL dimensions, and offer actionable recommendations for improving teamwork. The findings address a critical gap in medical education, advancing the application of LLM agents to enhance team-based decision-making and collaboration in high-stakes environments.

Authors:Maria-Magdalena Wolf, Henrik Schmidt, Michael Christl, Jana Fank, Frank Diermeyer
Title: A User-Centered Teleoperation GUI for Automated Vehicles: Identifying and Evaluating Information Requirements for Remote Driving and Assistance
Abstract:
Teleoperation has emerged as a promising fallback for situations beyond the capabilities of automated vehicles. Nevertheless, teleoperation still faces challenges, such as reduced situational awareness. Since situational awareness is primarily built through the remote operator's visual perception, the Graphical User Interface (GUI) design is critical. In addition to video feeds, supplemental informational elements are crucial, not only for the predominantly studied Remote Driving but also for the emerging desk-based Remote Assistance concepts. This work develops a GUI for different teleoperation concepts by identifying key informational elements during the teleoperation process through expert interviews (N = 9). Following this, a static and a dynamic GUI prototype are developed and evaluated in a click-dummy study (N = 36). The dynamic GUI adapts the number of displayed elements according to the teleoperation phase. Results show that both GUIs achieve good System Usability Scale (SUS) ratings, with the dynamic GUI significantly outperforming the static version in both usability and task completion time. The User Experience Questionnaire (UEQ) score shows potential for improvement. To enhance the user experience, the GUI should be evaluated in a follow-up study that includes interaction with a real vehicle.

Authors:Zheng Hui, Xiaokai Wei, Yexi Jiang, Kevin Gao, Chen Wang, Frank Ong, Se-eun Yoon, Rachit Pareek, Michelle Gong
Title: MATCHA: Can Multi-Agent Collaboration Build a Trustworthy Conversational Recommender?
Abstract:
In this paper, we propose MATCHA, a multi-agent collaboration framework for conversational recommendation systems that leverages large language models (LLMs) to enhance personalization and user engagement. Users can request recommendations via free-form text and receive curated lists aligned with their interests, preferences, and constraints. Our system introduces specialized agents for intent analysis, candidate generation, ranking, re-ranking, explainability, and safeguards. These agents collaboratively improve recommendation accuracy, diversity, and safety. On eight metrics, our model achieves performance superior or comparable to the current state of the art. Through comparisons with six baseline models, our approach addresses key challenges in conversational recommendation systems for game recommendations, including: (1) handling complex, user-specific requests; (2) enhancing personalization through multi-agent collaboration; (3) empirical evaluation and deployment; and (4) ensuring safe and trustworthy interactions.

Authors:Anindya Bijoy Das, Shibbir Ahmed, Shahnewaz Karim Sakib
Title: Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models
Abstract:
Clinical summarization is crucial in healthcare as it distills complex medical data into digestible information, enhancing patient understanding and care management. Large language models (LLMs) have shown significant potential in automating and improving the accuracy of such summarizations due to their advanced natural language understanding capabilities. These models are particularly applicable to summarizing medical/clinical texts, where precise and concise information transfer is essential. In this paper, we investigate the effectiveness of open-source LLMs in extracting key events from discharge reports, including admission reasons, major in-hospital events, and critical follow-up actions. We also assess the prevalence of various types of hallucinations in the summaries produced by these models. Detecting hallucinations is vital as it directly influences the reliability of the information, potentially affecting patient care and treatment outcomes. We conduct comprehensive simulations to rigorously evaluate the performance of these models, further probing the accuracy and fidelity of the extracted content in clinical summarization. Our results reveal that while LLMs (e.g., Qwen2.5 and DeepSeek-v2) perform quite well in capturing admission reasons and hospitalization events, they are generally less consistent in identifying follow-up recommendations, highlighting broader challenges in leveraging LLMs for comprehensive summarization.

Authors:Shuting Zhao, Linxin Bai, Liangjing Shao, Ye Zhang, Xinrong Chen
Title: SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations
Abstract:
The growing applications of AR/VR increase the demand for real-time full-body pose estimation from Head-Mounted Displays (HMDs). Although HMDs provide joint signals from the head and hands, reconstructing a full-body pose remains challenging due to the unconstrained lower body. Recent advancements often rely on conventional neural networks and generative models, such as Transformers and diffusion models, to improve performance on this task. However, these approaches struggle to strike a balance between precise pose reconstruction and fast inference speed. To overcome these challenges, we design SSD-Poser, a lightweight and efficient model for robust full-body motion estimation from sparse observations. SSD-Poser incorporates a well-designed hybrid encoder, the State Space Attention Encoders, to adapt state space duality to complex motion poses and enable real-time realistic pose reconstruction. Moreover, a Frequency-Aware Decoder is introduced to mitigate jitter caused by variable-frequency motion signals, markedly enhancing motion smoothness. Comprehensive experiments on the AMASS dataset demonstrate that SSD-Poser achieves exceptional accuracy and computational efficiency, showing outstanding inference efficiency compared to state-of-the-art methods.

Authors:Celia Chen, Alex Leitch
Title: Evaluating Machine Expertise: How Graduate Students Develop Frameworks for Assessing GenAI Content
Abstract:
This paper examines how graduate students develop frameworks for evaluating machine-generated expertise in web-based interactions with large language models (LLMs). Through a qualitative study combining surveys, LLM interaction transcripts, and in-depth interviews with 14 graduate students, we identify patterns in how these emerging professionals assess and engage with AI-generated content. Our findings reveal that students construct evaluation frameworks shaped by three main factors: professional identity, verification capabilities, and system navigation experience. Rather than uniformly accepting or rejecting LLM outputs, students protect domains central to their professional identities while delegating others: managers preserve conceptual work, designers safeguard creative processes, and programmers maintain control over core technical expertise. These evaluation frameworks are further influenced by students' ability to verify different types of content and their experience navigating complex systems. This research contributes to web science by highlighting emerging human-genAI interaction patterns and suggesting how platforms might better support users in developing effective frameworks for evaluating machine-generated expertise signals in AI-mediated web environments.

Authors:Justus Flerlage, Ilja Behnke, Odej Kao
Title: Towards Machine-Generated Code for the Resolution of User Intentions
Abstract:
The growing capabilities of Artificial Intelligence (AI), particularly Large Language Models (LLMs), prompt a reassessment of the interaction mechanisms between users and their devices. Currently, users are required to use a set of high-level applications to achieve their desired results. However, the advent of AI may signal a shift in this regard, as its capabilities have opened novel prospects for resolving user-provided intentions through the deployment of model-generated code. This development represents a significant progression in the realm of hybrid workflows, where human and artificial intelligence collaborate to address user intentions, with the former responsible for defining these intentions and the latter for implementing the solutions to address them. In this paper, we investigate the feasibility of generating and executing workflows through code generation that results from prompting an LLM with a concrete user intention and a simplified application programming interface for a GUI-less operating system. We provide an in-depth analysis and comparison of various user intentions, the resulting code, and its execution. The findings demonstrate the general feasibility of our approach and that the employed LLM, GPT-4o-mini, exhibits remarkable proficiency in generating code-oriented workflows in accordance with the provided user intentions.

Authors:Lirui Guo, Michael G. Burke, Wynita M. Griggs
Title: Exploring human-SAV interaction using large language models: The impact of psychological ownership and anthropomorphism on user experience
Abstract:
There has been extensive prior work exploring how psychological factors such as anthropomorphism affect the adoption of shared autonomous vehicles (SAVs). However, limited research has been conducted on how prompt strategies in large language model (LLM)-powered SAV User Interfaces (UIs) affect users' perceptions, experiences, and intentions to adopt such technology. In this work, we investigate how conversational UIs powered by LLMs drive these psychological factors and psychological ownership, the sense of possession a user may come to feel towards an entity or object they may not legally own. We designed four SAV UIs with varying levels of anthropomorphic characteristics and psychological ownership triggers. Quantitative measures of psychological ownership, anthropomorphism, quality of service, disclosure tendency, sentiment of SAV responses, and overall acceptance were collected after participants interacted with each SAV. Qualitative feedback was also gathered regarding the experience of psychological ownership during the interactions. The results indicate that an SAV conversational UI designed to be more anthropomorphic and to induce psychological ownership improved users' perceptions of the SAV's human-like qualities and improved the sentiment of responses compared to a control condition. These findings provide practical guidance for designing LLM-based conversational UIs that enhance user experience and adoption of SAVs.

Authors:Md Saeed Siddik, Hao Li, Cor-Paul Bezemer
Title: A Systematic Literature Review of Software Engineering Research on Jupyter Notebook
Abstract:
Context: Jupyter Notebook has emerged as a versatile tool that transforms how researchers, developers, and data scientists conduct and communicate their work. As the adoption of Jupyter notebooks continues to rise, so does the interest from the software engineering research community in improving the software engineering practices for Jupyter notebooks. Objective: The purpose of this study is to analyze trends, gaps, and methodologies used in software engineering research on Jupyter notebooks. Method: We selected 146 relevant publications from the DBLP Computer Science Bibliography up to the end of 2024, following established systematic literature review guidelines. We explored publication trends, categorized them based on software engineering topics, and reported findings based on those topics. Results: The most popular venues for publishing software engineering research on Jupyter notebooks are related to human-computer interaction instead of traditional software engineering venues. Researchers have addressed a wide range of software engineering topics on notebooks, such as code reuse, readability, and execution environment. Although reusability is one of the research topics for Jupyter notebooks, only 64 of the 146 studies can be reused based on their provided URLs. Additionally, most replication packages are not hosted on permanent repositories for long-term availability and adherence to open science principles. Conclusion: Solutions specific to notebooks for software engineering issues, including testing, refactoring, and documentation, are underexplored. Future research opportunities exist in automatic testing frameworks, refactoring clones between notebooks, and generating group documentation for coherent code cells.

Authors:Jazlyn Hellman, Itai Epstein, Jinghui Cheng, Jin L. C. Guo
Title: "Ohhh, He's the Boss!": Unpacking Power Dynamics Among Developers, Designers, and End-Users in FLOSS Usability
Abstract:
Addressing usability in free, libre, and open-source software (FLOSS) is a challenging issue, particularly due to a long-existing "by developer, for developer" mentality. Engaging designers and end-users to work with developers can help improve its usability, but unequal power dynamics among those stakeholder roles must be mitigated. To explore how the power of different FLOSS stakeholders manifests and can be mediated during collaboration, we conducted eight design workshops with different combinations of key FLOSS stakeholders (i.e., developers, designers, and end-users). Leveraging existing theories on Dimensions of Power, we revealed how participants navigate existing role-based power structures through resource utilization, knowledge gap management, and experience referencing. We also observed that participants exhibited diverse behaviors confirming and challenging the status quo of FLOSS usability. Overall, our results contribute to a comprehensive understanding of the power dynamics among FLOSS stakeholders, providing valuable insights into ways to balance their power to improve FLOSS usability. Our work also serves as an exemplar of using design workshops as a research method to study power dynamics during collaboration that are usually hidden in the field.

Authors:Ruozhu Sheng, Jinghong Li, Shinobu Hasegawa
Title: Multimodal Non-Semantic Feature Fusion for Predicting Segment Access Frequency in Lecture Archives
Abstract:
This study proposes a multimodal neural network-based approach to predict segment access frequency in lecture archives. These archives, widely used as supplementary resources in modern education, often consist of long, unedited recordings that make it difficult to keep students engaged. Captured directly from face-to-face lectures without post-processing, they lack visual appeal. Meanwhile, the increasing volume of recorded material renders manual editing and annotation impractical. Automatically detecting high-engagement segments is thus crucial for improving accessibility and maintaining learning effectiveness. Our research focuses on real classroom lecture archives, characterized by unedited footage, no additional hardware (e.g., eye-tracking), and limited student numbers. We approximate student engagement using segment access frequency as a proxy. Our model integrates multimodal features from teachers' actions (via OpenPose and optical flow), audio spectrograms, and slide page progression. These features are deliberately chosen for their non-semantic nature, making the approach applicable regardless of lecture language. Experiments show that our best model achieves a Pearson correlation of 0.5143 in 7-fold cross-validation and 69.32 percent average accuracy in a downstream three-class classification task. The results, obtained with high computational efficiency and a small dataset, demonstrate the practical feasibility of our system in real-world educational contexts.
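The headline metric above, Pearson correlation between the model's predicted and the observed segment access frequencies, can be computed directly. A minimal sketch; the per-segment access counts below are illustrative values, not data from the paper:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-segment access counts vs. model predictions
observed  = [12, 30, 7, 45, 22]
predicted = [10, 28, 9, 40, 25]
print(f"r = {pearson_r(observed, predicted):.4f}")
```

In a k-fold setup like the paper's 7-fold cross-validation, this correlation would be computed on each held-out fold and averaged.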

Authors:Li He, He Zhao, Stephen Wan, Dadong Wang, Lina Yao, Tongliang Liu
Title: Direct Advantage Regression: Aligning LLMs with Online AI Reward
Abstract:
Online AI Feedback (OAIF) presents a promising alternative to Reinforcement Learning from Human Feedback (RLHF) by utilizing online AI preference in aligning language models (LLMs). However, the straightforward replacement of humans with AI deprives LLMs from learning more fine-grained AI supervision beyond binary signals. In this paper, we propose Direct Advantage Regression (DAR), a simple alignment algorithm using online AI reward to optimize policy improvement through weighted supervised fine-tuning. As an RL-free approach, DAR maintains theoretical consistency with online RLHF pipelines while significantly reducing implementation complexity and improving learning efficiency. Our empirical results underscore that AI reward is a better form of AI supervision consistently achieving higher human-AI agreement as opposed to AI preference. Additionally, evaluations using GPT-4-Turbo and MT-bench show that DAR outperforms both OAIF and online RLHF baselines.
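The abstract describes DAR as weighted supervised fine-tuning driven by a scalar AI reward, but does not give the exact objective. A minimal sketch of one common advantage-weighted form; the function name, the exponential weighting, and the baseline term are assumptions for illustration, not the paper's specification:

```python
import math

def advantage_weighted_nll(token_logprobs, reward, baseline, beta=1.0):
    """Advantage-weighted negative log-likelihood for a single response.

    token_logprobs: log-probabilities the policy assigns to each response token
    reward:         scalar AI reward for the response (from a reward model)
    baseline:       e.g. a running mean reward, so advantage = reward - baseline
    beta:           temperature controlling how sharply high-advantage samples
                    are upweighted
    """
    advantage = reward - baseline
    weight = math.exp(advantage / beta)  # positive; larger for better responses
    nll = -sum(token_logprobs) / len(token_logprobs)
    return weight * nll
```

Because the weight multiplies an ordinary supervised loss, this stays a plain regression-style objective that any SFT trainer can minimize, which is consistent with the abstract's "RL-free" framing.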

Authors:Renaud Bougueng Tchemeube, Jeff Ens, Cale Plut, Philippe Pasquier, Maryam Safi, Yvan Grabit, Jean-Baptiste Rolland
Title: Evaluating Human-AI Interaction via Usability, User Experience and Acceptance Measures for MMM-C: A Creative AI System for Music Composition
Abstract:
With the rise of artificial intelligence (AI), there has been increasing interest in human-AI co-creation in a variety of artistic domains including music as AI-driven systems are frequently able to generate human-competitive artifacts. Now, the implications of such systems for musical practice are being investigated. We report on a thorough evaluation of the user adoption of the Multi-Track Music Machine (MMM) as a co-creative AI tool for music composers. To do this, we integrate MMM into Cubase, a popular Digital Audio Workstation (DAW) by Steinberg, by producing a "1-parameter" plugin interface named MMM-Cubase (MMM-C), which enables human-AI co-composition. We contribute a methodological assemblage as a 3-part mixed method study measuring usability, user experience and technology acceptance of the system across two groups of expert-level composers: hobbyists and professionals. Results show positive usability and acceptance scores. Users report experiences of novelty, surprise and ease of use from using the system, and limitations on controllability and predictability of the interface when generating music. Findings indicate no significant difference between the two user groups.

Authors:Renaud Bougueng Tchemeube, Jeff Ens, Philippe Pasquier
Title: Calliope: An Online Generative Music System for Symbolic Multi-Track Composition
Abstract:
With the rise of artificial intelligence in recent years, there has been a rapid increase in its application to creative domains, including music. Many systems have been built that apply machine learning approaches to the problem of computer-assisted music composition (CAC). Calliope is a web application that assists users in performing a variety of multi-track composition tasks in the symbolic domain. The user can upload Musical Instrument Digital Interface (MIDI) files, visualize and edit MIDI tracks, and generate partial (via bar in-filling) or complete multi-track content using the Multi-Track Music Machine (MMM). Generation of new MIDI excerpts can be done in batch and can be combined with active playback listening for an enhanced assisted-composition workflow. The user can export generated MIDI materials or directly stream MIDI playback from the system to their favorite Digital Audio Workstation (DAW). We present a demonstration of the system, its features and generative parameters, and describe the co-creative workflows that it affords.

Authors:Renaud Bougueng Tchemeube, Jeff Ens, Philippe Pasquier
Title: Apollo: An Interactive Environment for Generating Symbolic Musical Phrases using Corpus-based Style Imitation
Abstract:
With recent developments in machine intelligence and web technologies, new generative music systems are being explored for assisted composition using machine learning techniques on the web. Such systems are built for various tasks such as melodic, harmonic, or rhythm generation, music interpolation, continuation, and style imitation. In this paper, we introduce Apollo, an interactive music application for generating symbolic phrases of conventional Western music using corpus-based style imitation techniques. In addition to enabling the construction and management of symbolic musical corpora, the system makes it possible for music artists and researchers to generate new musical phrases in the style of the proposed corpus. The system is available as a desktop application. The generated symbolic music materials, encoded in the MIDI format, can be exported or streamed for various purposes, including use as seed material for musical projects. We present the system design and implementation details, and conclude with a discussion of future work for the system.

Authors:Nayoung Choi, Peace Cyebukayire, Jinho D. Choi
Title: Tinker Tales: Interactive Storytelling Framework for Early Childhood Narrative Development and AI Literacy
Abstract:
This paper presents Tinker Tales, an interactive storytelling framework in the format of a board game, designed to support both narrative development and AI literacy in early childhood. The framework integrates tangible and speech-based interactions with AI through NFC chip-attached pawns and tokens, along with a speaker and microphone. Children select and define key story elements, such as characters, places, items, and emotions, using the pawns and tokens, providing further details to the AI and receiving appropriate assistance, similar to how adults prompt AI for specific tasks (e.g., writing). For evaluation, several game sessions were simulated with a child AI agent, and the quality and safety of the generated stories were assessed from various perspectives. This work highlights the potential of combining physical and digital elements in AI literacy, offering a safe and engaging way for children to learn how to effectively collaborate with AI.

Authors:Marharyta Domnich, Rasmus Moorits Veski, Julius Välja, Kadi Tulver, Raul Vicente
Title: Predicting Satisfaction of Counterfactual Explanations from Human Ratings of Explanatory Qualities
Abstract:
Counterfactual explanations are a widely used approach in Explainable AI, offering actionable insights into decision-making by illustrating how small changes to input data can lead to different outcomes. Despite their importance, evaluating the quality of counterfactual explanations remains an open problem. Traditional quantitative metrics, such as sparsity or proximity, fail to fully account for human preferences in explanations, while user studies are insightful but not scalable. Moreover, relying only on a single overall satisfaction rating does not lead to a nuanced understanding of why certain explanations are effective or not. To address this, we analyze a dataset of counterfactual explanations that were evaluated by 206 human participants, who rated not only overall satisfaction but also seven explanatory criteria: feasibility, coherence, complexity, understandability, completeness, fairness, and trust. Modeling overall satisfaction as a function of these criteria, we find that feasibility (the actionability of suggested changes) and trust (the belief that the changes would lead to the desired outcome) consistently stand out as the strongest predictors of user satisfaction, though completeness also emerges as a meaningful contributor. Crucially, even excluding feasibility and trust, other metrics explain 58% of the variance, highlighting the importance of additional explanatory qualities. Complexity appears independent, suggesting more detailed explanations do not necessarily reduce satisfaction. Strong metric correlations imply a latent structure in how users judge quality, and demographic background significantly shapes ranking patterns. These insights inform the design of counterfactual algorithms that adapt explanatory qualities to user expertise and domain context.
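Modeling overall satisfaction as a function of the seven rated criteria, as the abstract describes, can be sketched with ordinary least squares and an R-squared readout. The ratings below are synthetic data generated for illustration; the coefficients and the simulated dominance of feasibility and trust are assumptions mirroring the abstract's findings, not the paper's actual results:

```python
import numpy as np

# Synthetic ratings (1-5 scale): rows = participants, columns = the seven
# criteria (feasibility, coherence, complexity, understandability,
# completeness, fairness, trust).
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(40, 7)).astype(float)
# Simulate satisfaction driven mostly by feasibility (col 0) and trust (col 6).
y = 0.5 * X[:, 0] + 0.4 * X[:, 6] + rng.normal(0.0, 0.3, 40)

X1 = np.column_stack([np.ones(len(X)), X])      # add intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares fit
pred = X1 @ coef
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R^2 = {r2:.2f}")
```

Refitting after dropping the feasibility and trust columns is how a claim like "other metrics explain 58% of the variance" would be checked.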

Authors:Ritvik Nair, Timothy Merino, Julian Togelius
Title: God's Innovation Project -- Empowering The Player With Generative AI
Abstract:
In this paper, we present God's Innovation Project (GIP), a god game where players collect words to dynamically terraform the landscape using generative AI. A god game is a genre where players take on the role of a deity, indirectly influencing Non-Player Characters (NPCs) to perform various tasks. These games typically grant players supernatural abilities, such as terrain manipulation or weather control. Traditional god games rely on predefined environments and mechanics, typically created by a human designer. In contrast, GIP allows players to shape the game world procedurally through text-based input. Using a lightweight generative AI model, we create a gamified pipeline which transforms the player's text prompts into playable game terrain in real time. To evaluate the impact of this AI-driven mechanic, we conduct a user study analyzing how players interacted with and experienced the system. Our findings provide insights into player engagement, the effectiveness of AI-generated terrain, and the role of generative AI as an interactive game mechanic.

Authors:Ivan Sviridov, Amina Miftakhova, Artemiy Tereshchenko, Galina Zubkova, Pavel Blinov, Andrey Savchenko
Title: 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark
Abstract:
Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct telemedicine consultations combining accurate diagnosis with professional dialogue remains underexplored. In this paper, we present 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through four temperament-based Patient Agents and an Assessor Agent that jointly evaluate diagnostic accuracy and dialogue quality. It includes 3013 cases across 34 diagnoses drawn from real-world telemedicine interactions, combining textual and image-based data. The experimental study compares diagnostic strategies for popular LVLMs, including GPT-4o-mini, LLaVA-3.2-11B-Vision-Instruct, and Qwen2-VL-7B-Instruct. We demonstrate that multimodal dialogue with internal reasoning improves F1 score by 6.5% over non-dialogue settings, highlighting the importance of context-aware, information-seeking questioning. Moreover, injecting predictions from a diagnostic convolutional network into the LVLM's context boosts F1 by up to 20%. Source code is available at https://anonymous.4open.science/r/3mdbench_acl-0511.

Authors:Takaya Arita, Wenxian Zheng, Reiji Suzuki, Fuminori Akiba
Title: Assessing LLMs in Art Contexts: Critique Generation and Theory of Mind Evaluation
Abstract:
This study explored how large language models (LLMs) perform in two areas related to art: writing critiques of artworks and reasoning about mental states (Theory of Mind, or ToM) in art-related situations. For the critique generation part, we built a system that combines Noel Carroll's evaluative framework with a broad selection of art criticism theories. The model was prompted to first write a full-length critique and then shorter, more coherent versions using a step-by-step prompting process. These AI-generated critiques were then compared with those written by human experts in a Turing test-style evaluation. In many cases, human subjects had difficulty telling which was which, and the results suggest that LLMs can produce critiques that are not only plausible in style but also rich in interpretation, as long as they are carefully guided. In the second part, we introduced new, simple ToM tasks based on situations involving interpretation, emotion, and moral tension, which can appear in the context of art. These go beyond standard false-belief tests and allow for more complex, socially embedded forms of reasoning. We tested 41 recent LLMs and found that their performance varied across tasks and models. In particular, tasks that involved affective or ambiguous situations tended to reveal clearer differences. Taken together, these results help clarify how LLMs respond to complex interpretative challenges, revealing both their cognitive limitations and potential. While our findings do not directly contradict the so-called Generative AI Paradox (the idea that LLMs can produce expert-like output without genuine understanding), they suggest that, depending on how LLMs are instructed, such as through carefully designed prompts, these models may begin to show behaviors that resemble understanding more closely than we might assume.

Authors:Gennie Mansi, Naveena Karusala, Mark Riedl
Title: Legally-Informed Explainable AI
Abstract:
Explanations for artificial intelligence (AI) systems are intended to support the people who are impacted by AI systems in high-stakes decision-making environments, such as doctors, patients, teachers, students, housing applicants, and many others. To protect people and support the responsible development of AI, explanations need to be actionable (helping people take pragmatic action in response to an AI system) and contestable (enabling people to push back against an AI system and its determinations). For many high-stakes domains, such as healthcare, education, and finance, the sociotechnical environment includes significant legal implications that impact how people use AI explanations. For example, physicians who use AI decision support systems may need information on how accepting or rejecting an AI determination will protect them from lawsuits or help them advocate for their patients. In this paper, we make the case for Legally-Informed Explainable AI, responding to the need to integrate and design for legal considerations when creating AI explanations. We describe three stakeholder groups with different informational and actionability needs, and provide practical recommendations to tackle the challenges of designing explainable AI systems that incorporate legal considerations.

Authors:Lauren Olson, Ricarda Anna-Lena Fischer, Florian Kunneman, Emitzá Guzmán
Title: Who Speaks for Ethics? How Demographics Shape Ethical Advocacy in Software Development
Abstract:
The integration of ethics into software development faces significant challenges due to market fundamentalism in organizational practices, where profit often takes precedence over ethical considerations. Additionally, the critical influence of practitioners' individual backgrounds on ethical decision-making remains underexplored, highlighting a gap in comprehensive research. Understanding this is especially essential given the demographic imbalance in software roles. This study investigates ethical concerns in software development, focusing on how they are perceived, prioritized, and addressed by demographically different practitioners. By surveying 217 software practitioners across diverse roles, industries, and countries, we identify critical barriers to ethical integration and examine practitioners' capacity to mitigate these issues. Our findings reveal pronounced demographic disparities, with marginalized groups, including women, BIPOC, and disabled individuals, reporting ethical concerns at higher frequencies. Notably, marginalized practitioners demonstrated heightened sensitivity to ethical implementation and greater empowerment to address these concerns. However, practitioners overall often lack the support needed to address ethical challenges effectively. These insights underscore the urgent need for reforms in software education and development processes that center diverse perspectives. Such reforms are essential to advancing ethical integration in software development and ensuring responsible computing practices in an increasingly complex technological landscape.

Authors:Mahir Akgun, Sacip Toker
Title: Struggle First, Prompt Later: How Task Complexity Shapes Learning with GenAI-Assisted Pretesting
Abstract:
This study examines the role of AI-assisted pretesting in enhancing learning outcomes, particularly when integrated with generative AI tools like ChatGPT. Pretesting, a learning strategy in which students attempt to answer questions or solve problems before receiving instruction, has been shown to improve retention by activating prior knowledge. The adaptability and interactivity of AI-assisted pretesting introduce new opportunities for optimizing learning in digital environments. Across three experimental studies, we explored how pretesting strategies, task characteristics, and student motivation influence learning. Findings suggest that AI-assisted pretesting enhances learning outcomes, particularly for tasks requiring higher-order thinking. While adaptive AI-driven pretesting increased engagement, its benefits were most pronounced in complex, exploratory tasks rather than straightforward computational problems. These results highlight the importance of aligning pretesting strategies with task demands, demonstrating that AI can optimize learning when applied to tasks requiring deeper cognitive engagement. This research provides insights into how AI-assisted pretesting can be effectively integrated with generative AI tools to enhance both cognitive and motivational outcomes in learning environments.

Authors:Lázaro Costa, Susana Barbosa, Jácome Cunha
Title: Let's Talk About It: Making Scientific Computational Reproducibility Easy
Abstract:
Computational reproducibility of scientific results, that is, the execution of a computational experiment (e.g., a script) using its original settings (data, code, etc.), should always be possible. However, reproducibility has become a significant challenge, as researchers often face difficulties in accurately replicating experiments due to inconsistencies in documentation, setup configurations, and missing data. This lack of reproducibility may undermine the credibility of scientific results. To address this issue, we propose a conversational, text-based tool that allows researchers to easily reproduce computational experiments (theirs or from others) and package them in a single file that can be re-executed with just a double click on any computer, requiring only the installation of a single widely used software package. Researchers interact with the platform in natural language, which our tool processes to automatically create a computational environment able to execute the provided experiment/code. We conducted two studies to evaluate our proposal. In the first study, we gathered qualitative data by executing 18 experiments from the literature. Although a few experiments could not be executed, in most instances the tool reproduced the results with little or no interaction. We also conducted a user study comparing our tool with an enterprise-level one. During this study, we measured the usability of both tools using the System Usability Scale (SUS) and participants' workload using the NASA Task Load Index (TLX). The results show a statistically significant difference between both tools in favor of our proposal, demonstrating that the usability and workload of our tool are superior to the current state of the art.
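The SUS comparison mentioned above follows a fixed, standard scoring rule: each of the ten 1-5 Likert items is rescaled so that higher always means better, and the sum is multiplied by 2.5 to give a 0-100 score. A minimal sketch of that standard scoring (not code from the paper's tool):

```python
def sus_score(responses):
    """Standard System Usability Scale scoring for ten 1-5 Likert responses.

    Odd-numbered items (positively worded) contribute (score - 1);
    even-numbered items (negatively worded) contribute (5 - score).
    The sum is scaled by 2.5, yielding a 0-100 score.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses in the range 1-5")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))  # index 0 = item 1 (odd)
    return total * 2.5

# A maximally positive questionnaire (5 on odd items, 1 on even items):
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```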

Authors:Irene Hou, Owen Man, Kate Hamilton, Srishty Muthusekaran, Jeffin Johnykutty, Leili Zadeh, Stephen MacNeil
Title: "All Roads Lead to ChatGPT": How Generative AI is Eroding Social Interactions and Student Learning Communities
Abstract:
The widespread adoption of generative AI is already impacting learning and help-seeking. While the benefits of generative AI are well-understood, recent studies have also raised concerns about increased potential for cheating and negative impacts on students' metacognition and critical thinking. However, the potential impacts on social interactions, peer learning, and classroom dynamics are not yet well understood. To investigate these aspects, we conducted 17 semi-structured interviews with undergraduate computing students across seven R1 universities in North America. Our findings suggest that help-seeking requests are now often mediated by generative AI. For example, students often redirected questions from their peers to generative AI instead of providing assistance themselves, undermining peer interaction. Students also reported feeling increasingly isolated and demotivated as the social support systems they rely on begin to break down. These findings are concerning given the important role that social interactions play in students' learning and sense of belonging.

Authors:Koustuv Saha, Yoshee Jain, Munmun De Choudhury
Title: Linguistic Comparison of AI- and Human-Written Responses to Online Mental Health Queries
Abstract:
The ubiquity and widespread use of digital and online technologies have transformed mental health support, with online mental health communities (OMHCs) providing safe spaces for peer support. More recently, generative AI and large language models (LLMs) have introduced new possibilities for scalable, around-the-clock mental health assistance that could potentially augment and supplement the capabilities of OMHCs. Although genAI shows promise in delivering immediate and personalized responses, its effectiveness in replicating the nuanced, experience-based support of human peers remains an open question. In this study, we harnessed 24,114 posts and 138,758 online community (OC) responses from 55 OMHCs on Reddit. We prompted several state-of-the-art LLMs (GPT-4-Turbo, Llama-3, and Mistral-7B) with these posts, and compared their (AI) responses to human-written (OC) responses based on a variety of linguistic measures across psycholinguistics and lexico-semantics. Our findings revealed that AI responses are more verbose, readable, and analytically structured, but lack the linguistic diversity and personal narratives inherent in human-human interactions. Through a qualitative examination, we found validation as well as complementary insights into the nature of AI responses, such as their neutrality of stance and the absence of back-and-forth clarification seeking. We discuss the ethical and practical implications of integrating generative AI into OMHCs, advocating for frameworks that balance AI's scalability and timeliness with the irreplaceable authenticity, social interactiveness, and expertise of human connections that form the ethos of online support communities.
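The verbosity and readability comparisons described above can be illustrated with a small sketch. This is not the paper's actual measurement pipeline; it uses a crude vowel-group syllable counter and the standard Flesch Reading Ease formula purely for illustration:

```python
import re

def count_syllables(word):
    """Crude syllable estimate: number of vowel groups, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syll = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syll / n_words)

def compare_responses(ai_text, human_text):
    """Return (verbosity ratio, readability difference) for a response pair."""
    verbosity = len(ai_text.split()) / max(1, len(human_text.split()))
    readability = flesch_reading_ease(ai_text) - flesch_reading_ease(human_text)
    return verbosity, readability
```

A verbosity ratio above 1 and a positive readability difference would correspond to the "more verbose, readable" finding; the paper's actual psycholinguistic and lexico-semantic measures are richer than this.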

Authors:Leszek Luchowski, Dariusz Pojda
Title: Visualization of a multidimensional point cloud as a 3D swarm of avatars
Abstract:
This paper proposes an innovative technique for representing multidimensional datasets using icons inspired by Chernoff faces. Our approach combines classical projection techniques with the explicit assignment of selected data dimensions to avatar (facial) features, leveraging the innate human ability to interpret facial traits. We introduce a semantic division of data dimensions into intuitive and technical categories, assigning the former to avatar features and projecting the latter into a four-dimensional (or higher) spatial embedding. The technique is implemented as a plugin for the open-source dpVision visualization platform, enabling users to interactively explore data in the form of a swarm of avatars whose spatial positions and visual features jointly encode various aspects of the dataset. Experimental results with synthetic test data and a 12-dimensional dataset of Portuguese Vinho Verde wines demonstrate that the proposed method enhances interpretability and facilitates the analysis of complex data structures.
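The dimension split described above, intuitive dimensions driving avatar features directly and technical dimensions projected into a spatial embedding, can be sketched as follows. PCA is used here as a stand-in projection; the dpVision plugin may use a different technique, and the function and parameter names are illustrative:

```python
import numpy as np

def avatar_layout(data, intuitive_idx, technical_idx, embed_dim=3):
    """Split data dimensions for an avatar-swarm visualization.

    Intuitive dimensions map one-to-one onto avatar (facial) features;
    technical dimensions are mean-centered and projected (here via PCA)
    into an `embed_dim`-dimensional spatial position for each avatar.
    """
    X = np.asarray(data, dtype=float)
    features = X[:, intuitive_idx]            # e.g. mouth curve, eye size
    T = X[:, technical_idx]
    T = T - T.mean(axis=0)
    # PCA via SVD: the top right-singular vectors give the embedding axes
    _, _, vt = np.linalg.svd(T, full_matrices=False)
    positions = T @ vt[:embed_dim].T
    return positions, features
```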

Authors:Takehiro Takayanagi, Kiyoshi Izumi, Javier Sanz-Cruzado, Richard McCreadie, Iadh Ounis
Title: Are Generative AI Agents Effective Personalized Financial Advisors?
Abstract:
Large language model-based agents are becoming increasingly popular as a low-cost mechanism to provide personalized, conversational advice, and have demonstrated impressive capabilities in relatively simple scenarios, such as movie recommendations. But how do these agents perform in complex high-stakes domains, where domain expertise is essential and mistakes carry substantial risk? This paper investigates the effectiveness of LLM-advisors in the finance domain, focusing on three distinct challenges: (1) eliciting user preferences when users themselves may be unsure of their needs, (2) providing personalized guidance for diverse investment preferences, and (3) leveraging advisor personality to build relationships and foster trust. Via a lab-based user study with 64 participants, we show that LLM-advisors often match human advisor performance when eliciting preferences, although they can struggle to resolve conflicting user needs. When providing personalized advice, the LLM was able to positively influence user behavior, but demonstrated clear failure modes. Our results show that accurate preference elicitation is key; otherwise, the LLM-advisor has little impact, or can even direct the investor toward unsuitable assets. More worryingly, users appear insensitive to the quality of the advice given; worse, satisfaction and advice quality can be inversely related. Indeed, users reported a preference for, increased satisfaction with, and greater emotional trust in LLMs adopting an extroverted persona, even though those agents provided worse advice.

Authors:Anastasiia Ivanova, Natalia Fedorova, Sergei Tilga, Ekaterina Artemova
Title: Voices of Freelance Professional Writers on AI: Limitations, Expectations, and Fears
Abstract:
The rapid development of AI-driven tools, particularly large language models (LLMs), is reshaping professional writing. Still, key aspects of their adoption, such as language support, ethics, and the long-term impact on writers' voice and creativity, remain underexplored. In this work, we conducted a questionnaire (N = 301) and an interactive survey (N = 36) targeting professional writers regularly using AI. We examined LLM-assisted writing practices across 25+ languages, ethical concerns, and user expectations. The survey findings offer important insights, highlighting the importance of LLM adoption for non-English speakers; concerns about misinformation; the need for domain and style adaptation; and the usability and key features of LLMs. These insights can guide further development, benefiting both writers and a broader user base.

Authors:Pengkun Liu, Pingbo Tang, Jiepeng Liu, Yu Hou
Title: Quantifying Personality in Human-Drone Interactions for Building Heat Loss Inspection with Virtual Reality Training
Abstract:
Reliable building energy audits are crucial for efficiency through heat loss detection. While drones assist inspections, they overlook the interplay between personality traits, stress management, and operational strategies expert engineers employ. This gap, combined with workforce shortages, necessitates effective knowledge transfer. This study proposes a VR-based training system for human-drone interaction in building heat loss inspection. Participants piloted a virtual drone with a thermographic monitor to identify defects. By analyzing flight patterns, stress adaptation, and inspection performance across diverse trainees, we found: (1) Flight Trajectories - Extraverts, Intuitives, Feelers, and Perceivers explored larger areas but showed higher misclassification rates, while Introverts, Sensors, Thinkers, and Judgers demonstrated methodical approaches. (2) Stress Adaptation - Heart rate variability revealed broader stress fluctuations among Extraverts, Intuitives, Feelers, and Perceivers, whereas Introverts, Sensors, Thinkers, and Judgers maintained steadier responses. Task complexity magnified these differences. (3) Inspection Performance - Extraverts, Intuitives, and Feelers achieved higher recall but over-identified defects. Introverts, Sensors, Thinkers, and Judgers made fewer random errors but risked overlooking subtle heat losses. These insights highlight the interplay among personality traits, stress management, and operational strategies in VR training for drone-assisted audits. The framework shows potential for addressing workforce shortages by facilitating knowledge transfer and optimizing human-drone collaboration.

Authors:Joshua Holstein, Moritz Diener, Philipp Spitzer
Title: From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks
Abstract:
The rise of Generative AI, and Large Language Models (LLMs) in particular, is fundamentally changing cognitive processes in knowledge work, raising critical questions about their impact on human reasoning and problem-solving capabilities. As these AI systems become increasingly integrated into workflows, they offer unprecedented opportunities for augmenting human thinking while simultaneously risking cognitive erosion through passive consumption of generated answers. This tension is particularly pronounced in open-ended tasks, where effective solutions require deep contextualization and integration of domain knowledge. Unlike structured tasks with established metrics, measuring the quality of human-LLM interaction in such open-ended tasks poses significant challenges due to the absence of ground truth and the iterative nature of solution development. To address this, we present a framework that analyzes interaction patterns along two dimensions: cognitive activity mode (exploration vs. exploitation) and cognitive engagement mode (constructive vs. detrimental). This framework provides systematic measurements to evaluate when LLMs are effective tools for thought rather than substitutes for human cognition, advancing theoretical understanding and practical guidance for developing AI systems that protect and augment human cognitive capabilities.

Authors:Matheus Valentim, Vaishali Dhanoa, Gabriela Molina León, Niklas Elmqvist
Title: The Plot Thickens: Quantitative Part-by-Part Exploration of MLLM Visualization Literacy
Abstract:
Multimodal Large Language Models (MLLMs) can interpret data visualizations, but what makes a visualization understandable to these models? Do factors like color, shape, and text influence legibility, and how does this compare to human perception? In this paper, we build on prior work to systematically assess which visualization characteristics impact MLLM interpretability. We expanded the Visualization Literacy Assessment Test (VLAT) test set from 12 to 380 visualizations by varying plot types, colors, and titles. This allowed us to statistically analyze how these features affect model performance. Our findings suggest that while color palettes have no significant impact on accuracy, plot types and the type of title significantly affect MLLM performance. We observe similar trends for model omissions. Based on these insights, we look into which plot types are beneficial for MLLMs in different tasks and propose visualization design principles that enhance MLLM readability. Additionally, we make the extended VLAT test set, VLAT ex, publicly available on https://osf.io/ermwx/ together with our supplemental material for future model testing and evaluation.

Authors:Erin McGowan, Joao Rulff, Sonia Castelo, Guande Wu, Shaoyu Chen, Roque Lopez, Bea Steers, Iran R. Roman, Fabio F. Dias, Jing Qian, Parikshit Solunke, Michael Middleton, Ryan McKendrick, Claudio T. Silva
Title: Design and Implementation of the Transparent, Interpretable, and Multimodal (TIM) AR Personal Assistant
Abstract:
The concept of an AI assistant for task guidance is rapidly shifting from a science fiction staple to an impending reality. Such a system is inherently complex, requiring models for perceptual grounding, attention, and reasoning, an intuitive interface that adapts to the performer's needs, and the orchestration of data streams from many sensors. Moreover, all data acquired by the system must be readily available for post-hoc analysis to enable developers to understand performer behavior and quickly detect failures. We introduce TIM, the first end-to-end AI-enabled task guidance system in augmented reality which is capable of detecting both the user and scene as well as providing adaptable, just-in-time feedback. We discuss the system challenges and propose design solutions. We also demonstrate how TIM adapts to domain applications with varying needs, highlighting how the system components can be customized for each scenario.

Authors:Eduard Kuric, Peter Demcak, Matus Krajcovic, Jan Lang
Title: Systematic Literature Review of Automation and Artificial Intelligence in Usability Issue Detection
Abstract:
Usability issues can hinder the effective use of software. Therefore, various techniques are deployed to diagnose and mitigate them. However, these techniques are costly and time-consuming, particularly in iterative design and development. A substantial body of research indicates that automation and artificial intelligence can enhance the process of obtaining usability insights. In our systematic review of 155 publications, we offer a comprehensive overview of the current state of the art for automated usability issue detection. We analyze trends, paradigms, and the technical context in which they are applied. Finally, we discuss the implications and potential directions for future research.

Authors:Soyeon Kim, Junho Choi, Subeen Lee, Jaesik Choi
Title: Example-Based Concept Analysis Framework for Deep Weather Forecast Models
Abstract:
To improve the trustworthiness of an AI model, finding consistent, understandable representations of its inference process is essential. This understanding is particularly important in high-stakes operations such as weather forecasting, where the identification of underlying meteorological mechanisms is as critical as the accuracy of the predictions. Despite the growing literature that addresses this issue through explainable AI, the applicability of their solutions is often limited due to their AI-centric development. To fill this gap, we follow a user-centric process to develop an example-based concept analysis framework, which identifies cases that follow an inference process similar to that of the target instance in a target model and presents them in a user-comprehensible format. Our framework provides the users with visually and conceptually analogous examples, including the probability of concept assignment to resolve ambiguities in weather mechanisms. To bridge the gap between vector representations identified from models and human-understandable explanations, we compile a human-annotated concept dataset and implement a user interface to assist domain experts involved in the framework development.

Authors:Soyeon Kim, Junho Choi, Yeji Choi, Subeen Lee, Artyom Stitsyuk, Minkyoung Park, Seongyeop Jeong, Youhyun Baek, Jaesik Choi
Title: Explainable AI-Based Interface System for Weather Forecasting Model
Abstract:
Machine learning (ML) is becoming increasingly popular in meteorological decision-making. Although the literature on explainable artificial intelligence (XAI) is growing steadily, user-centered XAI studies have not yet extended to this domain. This study defines three requirements for explanations of black-box models in meteorology through user studies: statistical model performance for different rainfall scenarios to identify model bias, model reasoning, and the confidence of model outputs. Appropriate XAI methods are mapped to each requirement, and the generated explanations are tested quantitatively and qualitatively. An XAI interface system is designed based on user feedback. The results indicate that the explanations increase decision utility and user trust. Users prefer intuitive explanations over those based on XAI algorithms even for potentially easy-to-recognize examples. These findings can provide evidence for future research on user-centered XAI algorithms, as well as a basis to improve the usability of AI systems in practice.

Authors:Nanna Inie, Jeanette Falk, Raghavendra Selvan
Title: The HCI GenAI CO2ST Calculator: A Tool for Calculating the Carbon Footprint of Generative AI Use in Human-Computer Interaction Research
Abstract:
Increased usage of generative AI (GenAI) in Human-Computer Interaction (HCI) research induces a climate impact from carbon emissions due to energy consumption of the hardware used to develop and run GenAI models and systems. The exact energy usage and subsequent carbon emissions are difficult to estimate in HCI research because HCI researchers most often use cloud-based services where the hardware and its energy consumption are hidden from plain view. The HCI GenAI CO2ST Calculator is a tool designed specifically for the HCI research pipeline, to help researchers estimate the energy consumption and carbon footprint of using generative AI in their research, either a priori (allowing for mitigation strategies or experimental redesign) or post hoc (allowing for transparent documentation of carbon footprint in written reports of the research).
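The kind of estimate such a calculator produces can be sketched with standard energy-to-emissions accounting: hardware power times runtime gives energy, scaled by data-center overhead (PUE) and the grid's carbon intensity. The tool's actual methodology and parameters may differ; the function name and example figures below are illustrative:

```python
def genai_carbon_footprint(gpu_hours, gpu_power_w, carbon_intensity_g_per_kwh,
                           pue=1.5):
    """Estimate CO2-equivalent emissions (grams) for GenAI compute.

    energy_kwh = gpu_hours * gpu_power_w / 1000, scaled by the data-center
    Power Usage Effectiveness (PUE); emissions = energy * grid carbon intensity.
    """
    energy_kwh = gpu_hours * gpu_power_w / 1000 * pue
    return energy_kwh * carbon_intensity_g_per_kwh

# e.g. 10 GPU-hours on a 300 W GPU at 400 gCO2e/kWh:
print(genai_carbon_footprint(10, 300, 400))  # 1800.0 (grams CO2e)
```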

Authors:Maria-Magdalena Wolf, Niklas Krauss, Arwed Schmidt, Frank Diermeyer
Title: Control Center Framework for Teleoperation Support of Automated Vehicles on Public Roads
Abstract:
Implementing a teleoperation system with its various actors and interactions is challenging and requires an overview of the necessary functions. This work collects all tasks that arise in a control center for an automated vehicle fleet from literature and assigns them to the two roles Remote Operator and Fleet Manager. Focusing on the driving-related tasks of the remote operator, a process is derived that contains the sequence of tasks, associated vehicle states, and transitions between the states. The resulting state diagram shows all remote operator actions available to effectively resolve automated vehicle disengagements. Thus, the state diagram can be applied to existing legislation or modified based on prohibitions of specific interactions. The developed control center framework and included state diagram should serve as a basis for implementing and testing remote support for automated vehicles to be validated on public roads.

Authors:Tim Rolff, Jurik Karimian, Niklas Hypki, Susanne Schmidt, Markus Lappe, Frank Steinicke
Title: Tokenization of Gaze Data
Abstract:
A considerable part of the performance of today's large language models (LLMs) and multimodal large language models (MLLMs) depends on their tokenization strategies. While tokenizers are extensively researched for textual and visual input, there is no research on tokenization strategies for gaze data due to its nature. However, a corresponding tokenization strategy would allow using the vision capabilities of pre-trained MLLMs for gaze data, for example, through fine-tuning. In this paper, we aim to close this research gap by analyzing five different tokenizers for gaze data on three different datasets for the forecasting and generation of gaze data through LLMs (cf. the teaser figure). We evaluate the tokenizers regarding their reconstruction and compression abilities. Further, we train an LLM for each tokenization strategy, measuring its generative and predictive performance. Overall, we found that a quantile tokenizer outperforms all others in predicting the gaze positions and k-means is best when predicting gaze velocities.
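A quantile tokenizer of the kind found best for gaze positions can be sketched as follows: bin edges sit at data quantiles, so every token id is roughly equally likely on the fitting data. The class name, default vocabulary size, and median-based reconstruction are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

class QuantileTokenizer:
    """Discretize continuous gaze coordinates into equal-mass bins."""

    def __init__(self, vocab_size=256):
        self.vocab_size = vocab_size

    def fit(self, values):
        values = np.asarray(values, dtype=float)
        edges = np.quantile(values, np.linspace(0, 1, self.vocab_size + 1))
        self.edges = edges[1:-1]  # interior edges only
        ids = np.searchsorted(self.edges, values, side="right")
        # representative value per bin (its median), used for reconstruction
        self.centers = np.array([np.median(values[ids == i])
                                 if np.any(ids == i) else 0.0
                                 for i in range(self.vocab_size)])
        return self

    def encode(self, values):
        return np.searchsorted(self.edges, np.asarray(values, dtype=float),
                               side="right")

    def decode(self, tokens):
        return self.centers[np.asarray(tokens)]
```

Reconstruction error is bounded by the bin widths, which adapt to the data density; this is what the abstract's reconstruction/compression evaluation would measure.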

Authors:Mallika Garg, Debashis Ghosh, Pyari Mohan Pradhan
Title: OccRobNet : Occlusion Robust Network for Accurate 3D Interacting Hand-Object Pose Estimation
Abstract:
Occlusion is one of the challenging issues when estimating 3D hand pose. This problem becomes more prominent when the hand interacts with an object or two hands are involved. Past works have given little attention to these occluded regions, yet they contain important information that is vital for 3D hand pose estimation. Thus, in this paper, we propose an occlusion-robust and accurate method for the estimation of 3D hand-object pose from an input RGB image. Our method first localises the hand joints using a CNN-based model and then refines them by extracting contextual information. A self-attention transformer then identifies the specific joints along with the hand identity. This lets the model determine which hand a particular joint belongs to, so the joint can be detected even in occluded regions. These identity-aware joints are then used to estimate the pose via a cross-attention mechanism. By identifying joints in occluded regions, the network becomes robust to occlusion and achieves state-of-the-art results when evaluated on the InterHand2.6M, HO3D and H$_2$O3D datasets.

Authors:Yuanrong Tang, Yu Kang, Yifan Wang, Tianhong Wang, Chen Zhong, Jiangtao Gong
Title: CA+: Cognition Augmented Counselor Agent Framework for Long-term Dynamic Client Engagement
Abstract:
Current AI counseling systems struggle with maintaining effective long-term client engagement. Through formative research with counselors and a systematic literature review, we identified five key design considerations for AI counseling interactions. Based on these insights, we propose CA+, a Cognition Augmented counselor framework enhancing contextual understanding through three components: (1) Therapy Strategies Module: Implements hierarchical Goals-Session-Action planning with bidirectional adaptation based on client feedback; (2) Communication Form Module: Orchestrates parallel guidance and empathy pathways for balanced therapeutic progress and emotional resonance; (3) Information Management: Utilizes client profile and therapeutic knowledge databases for dynamic, context-aware interventions. A three-day longitudinal study with 24 clients demonstrates CA+'s significant improvements in client engagement, perceived empathy, and overall satisfaction compared to a baseline system. In addition, two licensed counselors confirmed its high professionalism. Our research demonstrates the potential for enhancing LLM engagement in psychological counseling dialogues through cognitive theory, which may inspire further innovations in computational interaction in the future.

Authors:Cosima du Pasquier, Jennifer Grannen, Chuer Pan, Serin L. Huber, Aliyah Smith, Monroe Kennedy, Shuran Song, Dorsa Sadigh, Allison M. Okamura
Title: A Study of Perceived Safety for Soft Robotics in Caregiving Tasks
Abstract:
In this project, we focus on human-robot interaction in caregiving scenarios like bathing, where physical contact is inevitable and necessary for proper task execution because force must be applied to the skin. Using finite element analysis, we designed a 3D-printed gripper combining positive and negative pressure for secure yet compliant handling. Preliminary tests showed it exerted a lower, more uniform pressure profile than a standard rigid gripper. In a user study, participants' trust in robots significantly increased after they experienced a brief bathing demonstration performed by a robotic arm equipped with the soft gripper. These results suggest that soft robotics can enhance perceived safety and acceptance in intimate caregiving scenarios.

Authors:Hsiang-Ting Chen, Yuan Zhang, Gustavo Carneiro, Rajvinder Singh
Title: Toward a Human-Centered AI-assisted Colonoscopy System in Australia
Abstract:
While AI-assisted colonoscopy promises improved colorectal cancer screening, its success relies on effective integration into clinical practice, not just algorithmic accuracy. This paper, based on an Australian field study (observations and gastroenterologist interviews), highlights a critical disconnect: current development prioritizes machine learning model performance, overlooking essential aspects of user interface design, workflow integration, and overall user experience. Industry interactions reveal a similar emphasis on data and algorithms. To realize AI's full potential, the HCI community must champion user-centered design, ensuring these systems are usable, support endoscopist expertise, and enhance patient outcomes.

Authors:Liza Darwesh, Jaspreet Singh, Marin Marian, Eduard Alexa, Koen Hindriks, Kim Baraka
Title: Exploring the Effect of Robotic Embodiment and Empathetic Tone of LLMs on Empathy Elicitation
Abstract:
This study investigates the elicitation of empathy toward a third party through interaction with social agents. Participants engaged with either a physical robot or a voice-enabled chatbot, both driven by a large language model (LLM) programmed to exhibit either an empathetic tone or remain neutral. The interaction is focused on a fictional character, Katie Banks, who is in a challenging situation and in need of financial donations. The willingness to help Katie, measured by the number of hours participants were willing to volunteer, along with their perceptions of the agent, were assessed for 60 participants. Results indicate that neither robotic embodiment nor empathetic tone significantly influenced participants' willingness to volunteer. While the LLM effectively simulated human empathy, fostering genuine empathetic responses in participants proved challenging.

Authors:Noam Kahlon, Guy Rom, Anatoly Efros, Filippo Galgani, Omri Berkovitch, Sapir Caduri, William E. Bishop, Oriana Riva, Ido Dagan
Title: Agent-Initiated Interaction in Phone UI Automation
Abstract:
Phone automation agents aim to autonomously perform a given natural-language user request, such as scheduling appointments or booking a hotel. While much research effort has been devoted to screen understanding and action planning, complex tasks often necessitate user interaction for successful completion. Aligning the agent with the user's expectations is crucial for building trust and enabling personalized experiences. This requires the agent to proactively engage the user when necessary, avoiding actions that violate their preferences while refraining from unnecessary questions where a default action is expected. We argue that such subtle agent-initiated interaction with the user deserves focused research attention. To promote such research, this paper introduces a task formulation for detecting the need for user interaction and generating appropriate messages. We thoroughly define the task, including aspects like interaction timing and the scope of the agent's autonomy. Using this definition, we derived annotation guidelines and created AndroidInteraction, a diverse dataset for the task, leveraging an existing UI automation dataset. We tested several text-based and multimodal baseline models for the task, finding that it is very challenging for current LLMs. We suggest that our task formulation, dataset, baseline models and analysis will be valuable for future UI automation research, specifically in addressing this crucial yet often overlooked aspect of agent-initiated interaction. This work provides a needed foundation to allow personalized agents to properly engage the user when needed, within the context of phone UI automation.

Authors:Zijian Ding, Michelle Brachman, Joel Chan, Werner Geyer
Title: "The Diagram is like Guardrails": Structuring GenAI-assisted Hypotheses Exploration with an Interactive Shared Representation
Abstract:
Data analysis encompasses a spectrum of tasks, from high-level conceptual reasoning to lower-level execution. While AI-powered tools increasingly support execution tasks, there remains a need for intelligent assistance in conceptual tasks. This paper investigates the design of an ordered node-link tree interface augmented with AI-generated information hints and visualizations, as a potential shared representation for hypothesis exploration. Through a design probe (n=22), participants generated diagrams averaging 21.82 hypotheses. Our findings showed that the node-link diagram acts as "guardrails" for hypothesis exploration, facilitating structured workflows, providing comprehensive overviews, and enabling efficient backtracking. The AI-generated information hints, particularly visualizations, aided users in transforming abstract ideas into data-backed concepts while reducing cognitive load. We further discuss how node-link diagrams can support both parallel exploration and iterative refinement in hypothesis formulation, potentially enhancing the breadth and depth of human-AI collaborative data analysis.

Authors:Geonsun Lee, Yue Yang, Jennifer Healey, Dinesh Manocha
Title: Since U Been Gone: Augmenting Context-Aware Transcriptions for Re-engaging in Immersive VR Meetings
Abstract:
Maintaining engagement in immersive meetings is challenging, particularly when users must catch up on missed content after disruptions. While transcription interfaces can help, table-fixed panels have the potential to distract users from the group, diminishing social presence, while avatar-fixed captions fail to provide past context. We present EngageSync, a context-aware avatar-fixed transcription interface that adapts based on user engagement, offering live transcriptions and LLM-generated summaries to enhance catching up while preserving social presence. We implemented a live VR meeting setup for a 12-participant formative study and elicited design considerations. In two user studies with small (3 avatars) and mid-sized (7 avatars) groups, EngageSync significantly improved social presence (p < .05) and time spent gazing at others in the group instead of the interface over table-fixed panels. Also, it reduced re-engagement time and increased information recall (p < .05) over avatar-fixed interfaces, with stronger effects in mid-sized groups (p < .01).

Authors:Saugat Pandey, Alvitta Ottley
Title: Benchmarking Visual Language Models on Standardized Visualization Literacy Tests
Abstract:
The increasing integration of Visual Language Models (VLMs) into visualization systems demands a comprehensive understanding of their visual interpretation capabilities and constraints. While existing research has examined individual models, systematic comparisons of VLMs' visualization literacy remain unexplored. We bridge this gap through a rigorous, first-of-its-kind evaluation of four leading VLMs (GPT-4, Claude, Gemini, and Llama) using standardized assessments: the Visualization Literacy Assessment Test (VLAT) and Critical Thinking Assessment for Literacy in Visualizations (CALVI). Our methodology uniquely combines randomized trials with structured prompting techniques to control for order effects and response variability - a critical consideration overlooked in many VLM evaluations. Our analysis reveals that while specific models demonstrate competence in basic chart interpretation (Claude achieving 67.9% accuracy on VLAT), all models exhibit substantial difficulties in identifying misleading visualization elements (maximum 30.0% accuracy on CALVI). We uncover distinct performance patterns: strong capabilities in interpreting conventional charts like line charts (76-96% accuracy) and detecting hierarchical structures (80-100% accuracy), but consistent difficulties with data-dense visualizations involving multiple encodings (bubble charts: 18.6-61.4%) and anomaly detection (25-30% accuracy). Significantly, we observe distinct uncertainty management behavior across models, with Gemini displaying heightened caution (22.5% question omission) compared to others (7-8%). These findings provide crucial insights for the visualization community by establishing reliable VLM evaluation benchmarks, identifying areas where current models fall short, and highlighting the need for targeted improvements in VLM architectures for visualization tasks.
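The order-effect control described in the abstract can be sketched as a small evaluation harness. This is an illustrative reconstruction, not the authors' code; `ask_model` is a hypothetical callable standing in for a VLM API, and the item fields are placeholders.

```python
import random

def evaluate_with_randomized_trials(ask_model, items, n_trials=5, seed=0):
    """Score a chart-question-answering callable on literacy-test items,
    re-shuffling question order every trial to control for order effects."""
    rng = random.Random(seed)          # fixed seed -> reproducible trials
    correct = {item["id"]: 0 for item in items}
    for _ in range(n_trials):
        order = items[:]
        rng.shuffle(order)             # fresh presentation order per trial
        for item in order:
            if ask_model(item["question"], item["chart"]) == item["gold"]:
                correct[item["id"]] += 1
    # per-item accuracy averaged over all randomized trials
    return {i: c / n_trials for i, c in correct.items()}
```

Averaging over several shuffled trials is what separates genuine item difficulty from position artifacts.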

Authors:Dian Chen, Han Jun Yoon, Zelin Wan, Nithin Alluru, Sang Won Lee, Richard He, Terrence J. Moore, Frederica F. Nelson, Sunghyun Yoon, Hyuk Lim, Dan Dongseong Kim, Jin-Hee Cho
Title: Advancing Human-Machine Teaming: Concepts, Challenges, and Applications
Abstract:
Human-Machine Teaming (HMT) is revolutionizing collaboration across domains such as defense, healthcare, and autonomous systems by integrating AI-driven decision-making, trust calibration, and adaptive teaming. This survey presents a comprehensive taxonomy of HMT, analyzing theoretical models, including reinforcement learning, instance-based learning, and interdependence theory, alongside interdisciplinary methodologies. Unlike prior reviews, we examine team cognition, ethical AI, multi-modal interactions, and real-world evaluation frameworks. Key challenges include explainability, role allocation, and scalable benchmarking. We propose future research in cross-domain adaptation, trust-aware AI, and standardized testbeds. By bridging computational and social sciences, this work lays a foundation for resilient, ethical, and scalable HMT systems.

Authors:Yara Kyrychenko, Jon Roozenbeek, Brandon Davidson, Sander van der Linden, Ramit Debnath
Title: Human Preferences for Constructive Interactions in Language Model Alignment
Abstract:
As large language models (LLMs) enter the mainstream, aligning them to foster constructive dialogue rather than exacerbate societal divisions is critical. Using an individualized and multicultural alignment dataset of over 7,500 conversations of individuals from 74 countries engaging with 21 LLMs, we examined how linguistic attributes linked to constructive interactions are reflected in human preference data used for training AI. We found that users consistently preferred well-reasoned and nuanced responses while rejecting those high in personal storytelling. However, users who believed that AI should reflect their values tended to place less preference on reasoning in LLM responses and more on curiosity. Encouragingly, we observed that users could set the tone for how constructive their conversation would be, as LLMs mirrored linguistic attributes, including toxicity, in user queries.

Authors:Majid Behravan, Denis Gracanin
Title: From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality
Abstract:
This paper presents Matrix, an advanced AI-powered framework designed for real-time 3D object generation in Augmented Reality (AR) environments. By integrating a cutting-edge text-to-3D generative AI model, multilingual speech-to-text translation, and large language models (LLMs), the system enables seamless user interactions through spoken commands. The framework processes speech inputs, generates 3D objects, and provides object recommendations based on contextual understanding, enhancing AR experiences. A key feature of this framework is its ability to optimize 3D models by reducing mesh complexity, resulting in significantly smaller file sizes and faster processing on resource-constrained AR devices. Our approach addresses the challenges of high GPU usage, large model output sizes, and real-time system responsiveness, ensuring a smoother user experience. Moreover, the system is equipped with a pre-generated object repository, further reducing GPU load and improving efficiency. We demonstrate the practical applications of this framework in various fields such as education, design, and accessibility, and discuss future enhancements including image-to-3D conversion, environmental object detection, and multimodal support. The open-source nature of the framework promotes ongoing innovation and its utility across diverse industries.
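The mesh-complexity reduction described above can be achieved by several algorithms; a minimal sketch of one standard technique, vertex clustering (not necessarily the method Matrix uses), merges all vertices falling into the same voxel cell and drops the triangles that collapse:

```python
def simplify_by_vertex_clustering(vertices, triangles, cell=0.5):
    """Reduce mesh complexity by snapping vertices to a voxel grid and
    merging all vertices that land in the same cell (vertex clustering).
    vertices: list of (x, y, z); triangles: list of (i, j, k) indices."""
    cluster_of = {}      # grid cell -> new vertex index
    remap = []           # old vertex index -> new vertex index
    new_vertices = []
    for x, y, z in vertices:
        key = (round(x / cell), round(y / cell), round(z / cell))
        if key not in cluster_of:
            cluster_of[key] = len(new_vertices)
            new_vertices.append((key[0] * cell, key[1] * cell, key[2] * cell))
        remap.append(cluster_of[key])
    # drop triangles that degenerate to a line or point after merging
    new_triangles = []
    for i, j, k in triangles:
        a, b, c = remap[i], remap[j], remap[k]
        if a != b and b != c and a != c:
            new_triangles.append((a, b, c))
    return new_vertices, new_triangles
```

A larger `cell` merges more aggressively, trading visual fidelity for smaller files and faster rendering on resource-constrained AR devices.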

Authors:Mahsa Golchoubian, Moojan Ghafurian, Nasser Lashgarian Azad, Kerstin Dautenhahn
Title: What are Social Norms for Low-speed Autonomous Vehicle Navigation in Crowded Environments? An Online Survey
Abstract:
It has been suggested that autonomous vehicles can improve the efficiency and safety of transportation systems. While research in this area often focuses on autonomous vehicles that operate on roads, the deployment of low-speed autonomous vehicles in unstructured, crowded environments has received less attention and requires specific considerations regarding their interaction with pedestrians. To make the operation of these vehicles acceptable, their behaviour needs to be perceived as safe by both pedestrians and the passengers riding the vehicle. In this paper we conducted an online survey with 116 participants to understand people's preferences with respect to an autonomous golf cart's behaviour in different interaction scenarios. We measured people's self-reported perceived safety towards different behaviours of the cart in a variety of scenarios. Results suggested that, despite the unstructured nature of the environment, the cart was expected to follow common traffic rules when interacting with a group of pedestrians.

Authors:JiHyun Kim, JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, YoungBin Kim
Title: Rank-O-ToM: Unlocking Emotional Nuance Ranking to Enhance Affective Theory-of-Mind
Abstract:
Facial Expression Recognition (FER) plays a foundational role in enabling AI systems to interpret emotional nuances, a critical aspect of affective Theory of Mind (ToM). However, existing models often struggle with poor calibration and a limited capacity to capture emotional intensity and complexity. To address this, we propose Ranking the Emotional Nuance for Theory of Mind (Rank-O-ToM), a framework that leverages ordinal ranking to align confidence levels with the emotional spectrum. By incorporating synthetic samples reflecting diverse affective complexities, Rank-O-ToM enhances the nuanced understanding of emotions, advancing AI's ability to reason about affective states.

Authors:Aviv L. Cohav, A. Xinran Gong, J. Taery Kim, Clint Zeagler, Sehoon Ha, Bruce N. Walker
Title: Do Looks Matter? Exploring Functional and Aesthetic Design Preferences for a Robotic Guide Dog
Abstract:
Dog guides offer an effective mobility solution for blind or visually impaired (BVI) individuals, but conventional dog guides have limitations including the need for care, potential distractions, societal prejudice, high costs, and limited availability. To address these challenges, we seek to develop a robot dog guide capable of performing the tasks of a conventional dog guide, enhanced with additional features. In this work, we focus on design research to identify functional and aesthetic design concepts to implement into a quadrupedal robot. The aesthetic design remains relevant even for BVI users due to their sensitivity toward societal perceptions and the need for smooth integration into society. We collected data through interviews and surveys to answer specific design questions pertaining to the appearance, texture, features, and method of controlling and communicating with the robot. Our study identified essential and preferred features for a future robot dog guide, which are supported by relevant statistics aligning with each suggestion. These findings will inform the future development of user-centered designs to effectively meet the needs of BVI individuals.

Authors:Chelse Swoopes, Tyler Holloway, Elena L. Glassman
Title: The Impact of Revealing Large Language Model Stochasticity on Trust, Reliability, and Anthropomorphization
Abstract:
Interfaces for interacting with large language models (LLMs) are often designed to mimic human conversations, typically presenting a single response to user queries. This design choice can obscure the probabilistic and predictive nature of these models, potentially fostering undue trust and over-anthropomorphization of the underlying model. In this paper, we investigate (i) the effect of displaying multiple responses simultaneously as a countermeasure to these issues, and (ii) how a cognitive support mechanism-highlighting structural and semantic similarities across responses-helps users deal with the increased cognitive load of that intervention. We conducted a within-subjects study in which participants inspected responses generated by an LLM under three conditions: one response, ten responses with cognitive support, and ten responses without cognitive support. Participants then answered questions about workload, trust and reliance, and anthropomorphization. We conclude by reporting the results of these studies and discussing future work and design opportunities for future LLM interfaces.
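The cognitive-support mechanism of highlighting similarities across responses can be approximated at the lexical level. The study's actual highlighting is structural and semantic, so this is only a stand-in: a sketch that marks words recurring in a majority of the sampled responses.

```python
from collections import Counter

def shared_word_highlights(responses, threshold=0.5):
    """Wrap words that recur across many LLM responses in [[...]],
    a simple lexical proxy for cross-response similarity highlighting."""
    # document frequency: in how many responses does each word occur?
    df = Counter()
    for text in responses:
        df.update(set(w.lower().strip(".,") for w in text.split()))
    common = {w for w, n in df.items() if n / len(responses) > threshold}
    highlighted = []
    for text in responses:
        words = [
            f"[[{w}]]" if w.lower().strip(".,") in common else w
            for w in text.split()
        ]
        highlighted.append(" ".join(words))
    return highlighted
```

Stable, highlighted regions let a reader skim what the model says consistently, while unhighlighted text surfaces the run-to-run variability that a single-response interface hides.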

Authors:Duc-An Nguyen, Raunak Bhattacharyya, Clara Colombatto, Steve Fleming, Ingmar Posner, Nick Hawes
Title: Joint Decision-Making in Robot Teleoperation: When are Two Heads Better Than One?
Abstract:
Operators working with robots in safety-critical domains have to make decisions under uncertainty, which remains a challenging problem for a single human operator. An open question is whether two human operators can make better decisions jointly, as compared to a single operator alone. While prior work has shown that two heads are better than one, such studies have been mostly limited to static and passive tasks. We investigate joint decision-making in a dynamic task involving humans teleoperating robots. We conduct a human-subject experiment with N=100 participants where each participant performed a navigation task with two mobile robots in simulation. We find that joint decision-making through confidence sharing improves dyad performance beyond the better-performing individual (p<0.0001). Further, we find that the extent of this benefit is regulated both by the skill level of each individual, as well as how well-calibrated their confidence estimates are. Finally, we present findings on characterising the human-human dyad's confidence calibration based on the individuals constituting the dyad. Our findings demonstrate for the first time that two heads are better than one, even on a spatiotemporal task which includes active operator control of robots.
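Confidence sharing can be reduced to a classic joint-decision baseline, maximum-confidence slating: on disagreement, the dyad adopts the more confident member's choice. A minimal sketch, assuming confidences in [0, 1]; the study's exact protocol may differ.

```python
def dyad_decision(decision_a, conf_a, decision_b, conf_b):
    """Maximum-confidence slating: when the two operators disagree,
    the dyad adopts the decision of the more confident member."""
    if decision_a == decision_b:
        return decision_a
    return decision_a if conf_a >= conf_b else decision_b

def dyad_accuracy(trials):
    """trials: list of (decision_a, conf_a, decision_b, conf_b, truth)."""
    hits = sum(
        dyad_decision(da, ca, db, cb) == truth
        for da, ca, db, cb, truth in trials
    )
    return hits / len(trials)
```

With well-calibrated confidence estimates this rule can outperform the better individual, consistent with the effect reported above; with miscalibrated estimates, the more confident but wrong member drags the dyad down.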

Authors:Michele Grimaldi, Jieyeon Woo, Fabien Boucaud, Lucie Galland, Nezih Younsi, Liu Yang, Mireille Fares, Sean Graux, Philippe Gauthier, Catherine Pelachaud
Title: GRETA: Modular Platform to Create Adaptive Socially Interactive Agents
Abstract:
Human interaction is complex to describe, since it combines elements from different modalities, such as speech, gaze, and gestures, influenced by social attitudes and emotions. Furthermore, the interaction can be affected by features reflecting the interlocutor's state. Socially Interactive Agents (SIAs) aim to adapt themselves to the state of the interaction partner. In this paper, we discuss this adaptation by describing the architecture of the GRETA platform, which considers external features while interacting with humans and/or another ECA and processes the dialogue incrementally. We illustrate the new architecture of GRETA, which handles external features, adaptation, and incremental dialogue processing.

Authors:Tommaso Van Der Meer, Andrea Garulli, Antonio Giannitrapani, Renato Quartullo
Title: A Comparative Study of Human Motion Models in Reinforcement Learning Algorithms for Social Robot Navigation
Abstract:
Social robot navigation is an evolving research field that aims to find efficient strategies to safely navigate dynamic environments populated by humans. A critical challenge in this domain is the accurate modeling of human motion, which directly impacts the design and evaluation of navigation algorithms. This paper presents a comparative study of two popular categories of human motion models used in social robot navigation, namely velocity-based models and force-based models. A system-theoretic representation of both model types is presented, which highlights their common feedback structure, although with different state variables. Several navigation policies based on reinforcement learning are trained and tested in various simulated environments involving pedestrian crowds modeled with these approaches. A comparative study is conducted to assess performance across multiple factors, including human motion model, navigation policy, scenario complexity and crowd density. The results highlight advantages and challenges of different approaches to modeling human behavior, as well as their role during training and testing of learning-based navigation policies. The findings offer valuable insights and guidelines for selecting appropriate human motion models when designing socially-aware robot navigation systems.
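The force-based category can be illustrated with a stripped-down step of a Helbing-style social force model: a driving force relaxing the pedestrian toward its desired velocity plus exponentially decaying repulsion from other pedestrians. Actual models add anisotropy and obstacle terms, and the parameter values here are illustrative only.

```python
import math

def social_force_step(pos, vel, goal, others, dt=0.1,
                      v_desired=1.3, tau=0.5, a_rep=2.0, b_rep=0.8):
    """One Euler step of a simplified force-based pedestrian model.
    pos, vel, goal: (x, y); others: list of other pedestrians' (x, y)."""
    # driving force: relax toward the desired speed pointing at the goal
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy) or 1e-9
    fx = (v_desired * dx / dist - vel[0]) / tau
    fy = (v_desired * dy / dist - vel[1]) / tau
    # repulsion decaying exponentially with inter-pedestrian distance
    for ox, oy in others:
        rx, ry = pos[0] - ox, pos[1] - oy
        r = math.hypot(rx, ry) or 1e-9
        mag = a_rep * math.exp(-r / b_rep)
        fx += mag * rx / r
        fy += mag * ry / r
    vel = (vel[0] + fx * dt, vel[1] + fy * dt)
    pos = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
    return pos, vel
```

Velocity-based models, the other category compared in the paper, instead pick a new velocity directly from the neighbors' current velocities (as in ORCA), which is the differing choice of state variable the system-theoretic comparison highlights.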

Authors:Kellie Yu Hui Sim, Kenny Tsu Wei Choo
Title: Envisioning an AI-Enhanced Mental Health Ecosystem
Abstract:
The rapid advancement of Large Language Models (LLMs), reasoning models, and agentic AI approaches coincides with a growing global mental health crisis, where increasing demand has not translated into adequate access to professional support, particularly for underserved populations. This presents a unique opportunity for AI to complement human-led interventions, offering scalable and context-aware support while preserving human connection in this sensitive domain. We explore various AI applications in peer support, self-help interventions, proactive monitoring, and data-driven insights, using a human-centred approach that ensures AI supports rather than replaces human interaction. However, AI deployment in mental health fields presents challenges such as ethical concerns, transparency, privacy risks, and risks of over-reliance. We propose a hybrid ecosystem where AI assists but does not replace human providers, emphasising responsible deployment and evaluation. We also present some of our early work and findings in several of these AI applications. Finally, we outline future research directions for refining AI-enhanced interventions while adhering to ethical and culturally sensitive guidelines.

Authors:Mohammed Alnajjar, Khalid Alnajjar, Mika Hämäläinen
Title: Threefold model for AI Readiness: A Case Study with Finnish Healthcare SMEs
Abstract:
This study examines AI adoption among Finnish healthcare SMEs through semi-structured interviews with six health-tech companies. We identify three AI engagement categories: AI-curious (exploring AI), AI-embracing (integrating AI), and AI-catering (providing AI solutions). Our proposed threefold model highlights key adoption barriers, including regulatory complexities, technical expertise gaps, and financial constraints. While SMEs recognize AI's potential, most remain in early adoption stages. We provide actionable recommendations to accelerate AI integration, focusing on regulatory reforms, talent development, and inter-company collaboration, offering valuable insights for healthcare organizations, policymakers, and researchers.

Authors:Soohwan Lee, Seoyeong Hwang, Dajung Kim, Kyungho Lee
Title: Conversational Agents as Catalysts for Critical Thinking: Challenging Social Influence in Group Decision-making
Abstract:
Group decision-making processes frequently suffer when social influence and power dynamics suppress minority viewpoints, leading to compliance and groupthink. Conversational agents can counteract these harmful dynamics by encouraging critical thinking. This study investigates how LLM-powered devil's advocate systems affect psychological safety, opinion expression, and satisfaction in power-imbalanced group dynamics. We conducted an experiment with 48 participants in 12 four-person groups, each containing three high-power (senior) and one low-power (junior) member. Each group completed decision tasks in both baseline and AI intervention conditions. Results show AI counterarguments fostered a more flexible atmosphere and significantly enhanced both process and outcome satisfaction for all participants, with particularly notable improvements for minority members. Cognitive workload increased slightly, though not significantly. This research contributes empirical evidence on how AI systems can effectively navigate power hierarchies to foster more inclusive decision-making environments, highlighting the importance of balancing intervention frequency, maintaining conversational flow, and preserving group cohesion.

Authors:Suyash Fulay, Deb Roy
Title: The Empty Chair: Using LLMs to Raise Missing Perspectives in Policy Deliberations
Abstract:
Deliberation is essential to well-functioning democracies, yet physical, economic, and social barriers often exclude certain groups, reducing representativeness and contributing to issues like group polarization. In this work, we explore the use of large language model (LLM) personas to introduce missing perspectives in policy deliberations. We develop and evaluate a tool that transcribes conversations in real-time and simulates input from relevant but absent stakeholders. We deploy this tool in a 19-person student citizens' assembly on campus sustainability. Participants and facilitators found that the tool sparked new discussions and surfaced valuable perspectives they had not previously considered. However, they also noted that AI-generated responses were sometimes overly general. They raised concerns about overreliance on AI for perspective-taking. Our findings highlight both the promise and potential risks of using LLMs to raise missing points of view in group deliberation settings.

Authors:Marco Rondina, Antonio Vetrò, Juan Carlos De Martin
Title: Completeness of Datasets Documentation on ML/AI repositories: an Empirical Investigation
Abstract:
ML/AI is the field of computer science and computer engineering that arguably received the most attention and funding over the last decade. Data is the key element of ML/AI, so it is becoming increasingly important to ensure that users are fully aware of the quality of the datasets that they use, and of the process generating them, so that possible negative downstream effects can be tracked, analysed, and, where possible, mitigated. One tool that can be useful in this respect is dataset documentation. The aim of this work is to investigate the state of dataset documentation practices, measuring the completeness of the documentation of several popular datasets in ML/AI repositories. We created a dataset documentation schema -- the Documentation Test Sheet (DTS) -- that identifies the information that should always be attached to a dataset (to ensure proper dataset choice and informed use), according to relevant studies in the literature. We verified 100 popular datasets from four different repositories with the DTS to investigate which information was present. Overall, we observed a lack of relevant documentation, especially about the context of data collection and data processing, highlighting a paucity of transparency.
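A completeness measure of this kind reduces to checking a documentation record against a fixed checklist. A sketch with placeholder field names; the real DTS items are defined in the paper.

```python
# Illustrative checklist only: the actual Documentation Test Sheet (DTS)
# items come from the paper; these field names are placeholders.
DTS_FIELDS = [
    "motivation", "composition", "collection_process",
    "preprocessing", "uses", "distribution", "maintenance",
]

def completeness(doc: dict) -> float:
    """Fraction of checklist items with a non-empty entry in a
    dataset's documentation record."""
    present = sum(1 for f in DTS_FIELDS if doc.get(f))
    return present / len(DTS_FIELDS)
```

Scoring each dataset this way makes gaps comparable across repositories, e.g. the paper's finding that collection-context and processing fields are the ones most often missing.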

Authors:Ripan Kumar Kundu, Matthew Denton, Genova Mongalo, Prasad Calyam, Khaza Anuarul Hoque
Title: Securing Virtual Reality Experiences: Unveiling and Tackling Cybersickness Attacks with Explainable AI
Abstract:
The synergy between virtual reality (VR) and artificial intelligence (AI), specifically deep learning (DL)-based cybersickness detection models, has ushered in unprecedented advancements in immersive experiences by automatically detecting cybersickness severity and adaptively applying various mitigation techniques, offering a smooth and comfortable VR experience. While this DL-enabled cybersickness detection method provides promising solutions for enhancing user experiences, it also introduces new risks since these models are vulnerable to adversarial attacks; a small perturbation of the input data that is visually undetectable to human observers can fool the cybersickness detection model and trigger unexpected mitigation, thus disrupting user immersive experiences (UIX) and even posing safety risks. In this paper, we present a new type of VR attack, i.e., a cybersickness attack, which successfully stops the triggering of cybersickness mitigation by fooling DL-based cybersickness detection models and dramatically hinders the UIX. Next, we propose a novel explainable artificial intelligence (XAI)-guided cybersickness attack detection framework to detect such attacks in VR to ensure UIX and a comfortable VR experience. We evaluate the proposed attack and the detection framework using two state-of-the-art open-source VR cybersickness datasets: the Simulation 2021 and Gameplay datasets. Finally, to verify the effectiveness of our proposed method, we implement the attack and the XAI-based detection using a testbed with a custom-built VR roller coaster simulation with an HTC Vive Pro Eye headset and perform a user study. Our study shows that such an attack can dramatically hinder the UIX. However, our proposed XAI-guided cybersickness attack detection can successfully detect cybersickness attacks and trigger the proper mitigation, effectively reducing VR cybersickness.
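Adversarial perturbations of the kind described are commonly constructed with gradient-sign methods. A toy sketch against a hand-written logistic detector (not the paper's DL models) pushes the input toward the "no cybersickness" side while keeping each feature change within a small budget `eps`:

```python
import math

def logistic_score(x, w, b):
    """P(cybersickness) from a toy linear detector."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_attack(x, w, b, eps=0.1):
    """Fast-gradient-sign-style perturbation suppressing the detection.
    For a linear model, decreasing z = w.x + b means stepping each
    feature against the sign of its weight, bounded by eps."""
    return [xi - eps * (1 if wi > 0 else -1 if wi < 0 else 0)
            for xi, wi in zip(x, w)]
```

Because every feature moves by at most `eps`, the perturbed physiological or motion signal stays visually indistinguishable from the original, which is exactly what makes the attack hard to notice without a dedicated detection framework.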

Authors:André Markus, Astrid Carolus, Carolin Wienrich
Title: Objective Measurement of AI Literacy: Development and Validation of the AI Competency Objective Scale (AICOS)
Abstract:
As Artificial Intelligence (AI) becomes more pervasive in various aspects of life, AI literacy is becoming a fundamental competency that enables individuals to move safely and competently in an AI-pervaded world. There is a growing need to measure this competency, e.g., to develop targeted educational interventions. Although several measurement tools already exist, many have limitations regarding subjective data collection methods, target group differentiation, validity, and integration of current developments such as Generative AI Literacy. This study develops and validates the AI Competency Objective Scale (AICOS) for measuring AI literacy objectively. The presented scale addresses weaknesses and offers a robust measurement approach that considers established competency and measurement models, captures central sub-competencies of AI literacy, and integrates the dimension of Generative AI Literacy. The AICOS provides a sound and comprehensive measure of AI literacy, and initial analyses show potential for a modular structure. Furthermore, a first edition of a short version of the AICOS is developed. Due to its methodological foundation, extensive validation, and integration of recent developments, the test represents a valuable resource for scientific research and practice in educational institutions and professional contexts. The AICOS significantly contributes to the development of standardized measurement instruments and enables the targeted assessment and development of AI skills in different target groups.

Authors:Hai Dang, Chelse Swoopes, Daniel Buschek, Elena L. Glassman
Title: CorpusStudio: Surfacing Emergent Patterns in a Corpus of Prior Work while Writing
Abstract:
Many communities, including the scientific community, develop implicit writing norms. Understanding them is crucial for effective communication with that community. Writers gradually develop an implicit understanding of norms by reading papers and receiving feedback on their writing. However, it is difficult to both externalize this knowledge and apply it to one's own writing. We propose two new writing support concepts that reify document and sentence-level patterns in a given text corpus: (1) an ordered distribution over section titles and (2) given the user's draft and cursor location, many retrieved contextually relevant sentences. Recurring words in the latter are algorithmically highlighted to help users see any emergent norms. Study results (N=16) show that participants revised the structure and content using these concepts, gaining confidence in aligning with or breaking norms after reviewing many examples. These results demonstrate the value of reifying distributions over other authors' writing choices during the writing process.
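The first concept, an ordered distribution over section titles, can be approximated by tallying how often each title occurs across the corpus and where in a document it typically appears. A sketch under that assumption; the paper's exact algorithm may differ.

```python
from collections import defaultdict

def section_title_distribution(corpus):
    """corpus: list of documents, each a list of section titles in order.
    Returns (title, frequency, mean_relative_position) tuples sorted by
    where the title typically appears, reifying the corpus-wide norm."""
    count = defaultdict(int)
    pos_sum = defaultdict(float)
    for titles in corpus:
        for i, t in enumerate(titles):
            count[t] += 1
            pos_sum[t] += i / max(len(titles) - 1, 1)  # 0 = first, 1 = last
    stats = [
        (t, count[t] / len(corpus), pos_sum[t] / count[t])
        for t in count
    ]
    return sorted(stats, key=lambda s: s[2])  # order by typical position
```

Showing writers this distribution makes the community's implicit structural norm explicit: common titles with tight positions are strong norms, while rare titles signal places where authors diverge.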

Authors:Arno Verduyn, Maxim Vochten, Joris De Schutter
Title: Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias via Frame-Invariant Similarity Measures
Abstract:
The ability of robots to recognize human gestures facilitates a natural and accessible human-robot collaboration. However, most work in gesture recognition remains rooted in reference frame-dependent representations. This poses a challenge when reference frames vary due to different work cell layouts, imprecise frame calibrations, or other environmental changes. This paper investigates the use of invariant trajectory descriptors for robust hand palm motion gesture recognition under reference frame changes. First, a novel dataset of recorded Hand Palm Motion (HPM) gestures is introduced. The motion gestures in this dataset were specifically designed to be distinguishable without dependence on specific reference frames or directional cues. Afterwards, multiple invariant trajectory descriptor approaches were benchmarked to assess how their performance generalizes to this novel HPM dataset. After this offline benchmarking, the best scoring approach was validated for online recognition by developing a real-time Proof of Concept (PoC). In this PoC, hand palm motion gestures were used to control the real-time movement of a manipulator arm. The PoC demonstrated high recognition reliability in real-time operation, achieving an $F_1$-score of 92.3%. This work demonstrates the effectiveness of the invariant descriptor approach as a standalone solution. Moreover, we believe that the invariant descriptor approach can also be utilized within other state-of-the-art pattern recognition and learning systems to improve their robustness against reference frame variations.
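The invariant-descriptor idea can be illustrated with the simplest such descriptor: per-sample speed and turning angle, both unchanged when the reference frame is rotated or translated. The paper benchmarks much richer descriptors; this toy 2D version only demonstrates the invariance property itself.

```python
import math

def invariant_descriptor(traj):
    """traj: list of (x, y) samples. Returns (speeds, turn_angles),
    both invariant to rotations and translations of the frame."""
    speeds, headings = [], []
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        speeds.append(math.hypot(x1 - x0, y1 - y0))
        headings.append(math.atan2(y1 - y0, x1 - x0))
    turns = [
        math.atan2(math.sin(h1 - h0), math.cos(h1 - h0))  # wrap to (-pi, pi]
        for h0, h1 in zip(headings, headings[1:])
    ]
    return speeds, turns

def rotate_translate(traj, angle, tx, ty):
    """Express the same trajectory in a rotated, translated frame."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in traj]
```

Because the descriptor is identical in any frame, a classifier trained on it needs no frame calibration, which is the robustness to work-cell changes the paper targets.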

Authors:Andrew Anderson, David Piorkowski, Margaret Burnett, Justin Weisz
Title: An LLM's Attempts to Adapt to Diverse Software Engineers' Problem-Solving Styles: More Inclusive & Equitable?
Abstract:
Software engineers use code-fluent large language models (LLMs) to help explain unfamiliar code, yet LLM explanations are not adapted to engineers' diverse problem-solving needs. We prompted an LLM to adapt to five problem-solving style types from an inclusive design method, the Gender Inclusiveness Magnifier (GenderMag). We ran a user study with software engineers to examine the impact of explanation adaptations on software engineers' perceptions, both for explanations which matched and mismatched engineers' problem-solving styles. We found that explanations were more frequently beneficial when they matched problem-solving style, but not every matching adaptation was equally beneficial; in some instances, diverse engineers found as much (or more) benefit from mismatched adaptations. Through an equity and inclusivity lens, our work highlights the benefits of having an LLM adapt its explanations to match engineers' diverse problem-solving style values, the potential harms when matched adaptations were not perceived well by engineers, and a comparison of how matching and mismatching LLM adaptations impacted diverse engineers.

Authors:Oen McKinley, Saugat Pandey, Alvitta Ottley
Title: Trustworthy by Design: The Viewer's Perspective on Trust in Data Visualization
Abstract:
Despite the importance of viewers' trust in data visualization, there is a lack of research on the viewers' own perspective on their trust. In addition, much of the research on trust remains relatively theoretical and inaccessible for designers. This work aims to address this gap by conducting a qualitative study to explore how viewers perceive different data visualizations and how their perceptions impact their trust. Three dominant themes emerged from the data. First, users appeared to be consistent, listing similar rationale for their trust across different stimuli. Second, there were diverse opinions about what factors were most important to trust perception and about why the factors matter. Third, despite this disagreement, there were important trends to the factors that users reported as impactful. Finally, we leverage these themes to give specific and actionable guidelines for visualization designers to make more trustworthy visualizations.

Authors:Longfei Chen, Shengxin Li, Ziang Li, Quan Li
Title: DancingBoard: Streamlining the Creation of Motion Comics to Enhance Narratives
Abstract:
Motion comics, a digital animation format that enhances comic book narratives, have wide applications in storytelling, education, and advertising. However, their creation poses significant challenges for amateur creators, primarily due to the need for specialized skills and complex workflows. To address these issues, we conducted an exploratory survey (N=58) to understand the challenges associated with creating motion comics, and an expert interview (N=4) to identify a typical workflow for creation. We further analyzed 95 online motion comics to gain insights into the design space of character and object actions. Based on our findings, we proposed DancingBoard, an integrated authoring tool designed to simplify the creation process. This tool features a user-friendly interface and a guided workflow, providing comprehensive support throughout each step of the creation process. A user study involving 23 creators showed that, compared to professional tools, DancingBoard is easily comprehensible and provides improved guidance and support, requiring less effort from users. Additionally, a separate study with 18 audience members confirmed the tool's effectiveness in conveying the story to its viewers.

Authors:Anna Madison, Kaleb McDowell, Vinicius G. Goecks, Jeff Hansberger, Ceili M. Olney, Claire Ahern, Amar Marathe, Nicholas Waytowich, Christian Kenney, Christopher Kelshaw
Title: "New" Challenges for Future C2: Commanding Soldier-Machine Partnerships
Abstract:
Future warfare will occur in more complex, fast-paced, ill-structured, and demanding conditions that will stress current Command and Control (C2) systems. Without modernization, these C2 systems may fail to maintain overmatch against adversaries. We previously proposed robust partnerships between humans and artificial intelligence systems, and directly focusing on C2, we introduced how intelligent technologies could provide future overmatch through streamlining the C2 operations process, maintaining unity of effort across formations, and developing collective knowledge systems that adapt to battlefield dynamics across missions. Future C2 systems must seamlessly integrate human and machine intelligence to achieve decision advantage over adversaries while overcoming "new" challenges due to the technological advances driving fundamental changes in effective teaming, unity of effort, and meaningful human control. Here, we describe "new" C2 challenges and discuss pathways to transcend them, such as AI-enabled systems with effective human machine interfaces.

Authors:J. D. Zamfirescu-Pereira, Eunice Jun, Michael Terry, Qian Yang, Björn Hartmann
Title: Beyond Code Generation: LLM-supported Exploration of the Program Design Space
Abstract:
In this work, we explore explicit Large Language Model (LLM)-powered support for the iterative design of computer programs. Program design, like other design activity, is characterized by navigating a space of alternative problem formulations and associated solutions in an iterative fashion. LLMs are potentially powerful tools in helping this exploration; however, by default, code-generation LLMs deliver code that represents a particular point solution. This obscures the larger space of possible alternatives, many of which might be preferable to the LLM's default interpretation and its generated code. We contribute an IDE that supports program design through generating and showing new ways to frame problems alongside alternative solutions, tracking design decisions, and identifying implicit decisions made by either the programmer or the LLM. In a user study, we find that with our IDE, users combine and parallelize design phases to explore a broader design space -- but also struggle to keep up with LLM-originated changes to code and other information overload. These findings suggest a core challenge for future IDEs that support program design through higher-level instructions given to LLM-based agents: carefully managing attention and deciding what information agents should surface to program designers and when.

Authors:Agnia Sergeyuk, Ilya Zakharov, Ekaterina Koshchenko, Maliheh Izadi
Title: Human-AI Experience in Integrated Development Environments: A Systematic Literature Review
Abstract:
The integration of Artificial Intelligence (AI) into Integrated Development Environments (IDEs) is reshaping software development, fundamentally altering how developers interact with their tools. This shift marks the emergence of Human-AI Experience in Integrated Development Environment (in-IDE HAX), a field that explores the evolving dynamics of Human-Computer Interaction in AI-assisted coding environments. Despite rapid adoption, research on in-IDE HAX remains fragmented, which highlights the need for a unified overview of current practices, challenges, and opportunities. To provide a structured overview of existing research, we conduct a systematic literature review of 90 studies, summarizing current findings and outlining areas for further investigation. We organize key insights from reviewed studies into three aspects: Impact, Design, and Quality of AI-based systems inside IDEs. Impact findings show that AI-assisted coding enhances developer productivity but also introduces challenges, such as verification overhead and over-reliance. Design studies show that effective interfaces surface context, provide explanations and transparency of suggestions, and support user control. Quality studies document risks in correctness, maintainability, and security. For future research, priorities include productivity studies, design of assistance, and audit of AI-generated code. The agenda calls for larger and longer evaluations, stronger audit and verification assets, broader coverage across the software life cycle, and adaptive assistance under user control.

Authors:Yuansong Xu, Yuheng Shao, Jiahe Dong, Shaohan Shi, Chang Jiang, Quan Li
Title: Advancing Problem-Based Learning with Clinical Reasoning for Improved Differential Diagnosis in Medical Education
Abstract:
Medical education increasingly emphasizes students' ability to apply knowledge in real-world clinical settings, focusing on evidence-based clinical reasoning and differential diagnoses. Problem-based learning (PBL) addresses traditional teaching limitations by embedding learning into meaningful contexts and promoting active participation. However, current PBL practices are often confined to medical instructional settings, limiting students' ability to self-direct and refine their approaches based on targeted improvements. Additionally, the unstructured nature of information organization during analysis poses challenges for record-keeping and subsequent review. Existing research enhances PBL realism and immersion but overlooks the construction of logic chains and evidence-based reasoning. To address these gaps, we designed e-MedLearn, a learner-centered PBL system that supports more efficient application and practice of evidence-based clinical reasoning. Through controlled study (N=19) and testing interviews (N=13), we gathered data to assess the system's impact. The findings demonstrate that e-MedLearn improves PBL experiences and provides valuable insights for advancing clinical reasoning-based learning.

Authors:Joan Giner-Miguelez, Sergio Morales, Sergio Cobos, Javier Luis Canovas Izquierdo, Robert Clariso, Jordi Cabot
Title: The Software Diversity Card: A Framework for Reporting Diversity in Software Projects
Abstract:
The interest and concerns about diversity in software development have soared in recent years. Reporting diversity-related aspects of software projects can increase user trust and help regulators evaluate potential adoption. Furthermore, recent directives around AI are beginning to require diversity information in the development of AI products, indicating the growing interest of public regulators in it. Despite this importance, current documentation assets in software development processes frequently overlook diversity in favor of technical features, partly due to a lack of tools for describing and annotating diversity. This work introduces the Software Diversity Card, a comprehensive framework for reporting diversity-related aspects of software projects. The card is designed to profile the different types of teams involved in developing and governing software projects (including the final user groups involved in testing), and the software adaptations for specific social groups. To encourage its adoption, we provide a diversity modeling language, a toolkit for generating the cards using such language, and a collection of real-world examples from active software projects. Our proposal can enhance diversity practices in software development (e.g., via files like CONTRIBUTING.md in open-source projects), support public administrations in software assessment, and help businesses promote diversity as a key asset.

Authors:Jooyoung Lee, Xiaochen Zhu, Georgi Karadzhov, Tom Stafford, Andreas Vlachos, Dongwon Lee
Title: Collaborative Evaluation of Deepfake Text with Deliberation-Enhancing Dialogue Systems
Abstract:
The proliferation of generative models has presented significant challenges in distinguishing authentic human-authored content from deepfake content. Collaborative human efforts, augmented by AI tools, present a promising solution. In this study, we explore the potential of DeepFakeDeLiBot, a deliberation-enhancing chatbot, to support groups in detecting deepfake text. Our findings reveal that group-based problem-solving significantly improves the accuracy of identifying machine-generated paragraphs compared to individual efforts. While engagement with DeepFakeDeLiBot does not yield substantial performance gains overall, it enhances group dynamics by fostering greater participant engagement, consensus building, and the frequency and diversity of reasoning-based utterances. Additionally, participants with higher perceived effectiveness of group collaboration exhibited performance benefits from DeepFakeDeLiBot. These findings underscore the potential of deliberative chatbots in fostering interactive and productive group dynamics while ensuring accuracy in collaborative deepfake text detection. Dataset and source code used in this study will be made publicly available upon acceptance of the manuscript.

Authors:Ti-Chung Cheng, Yutong Zhang, Yi-Hung Chou, Vinay Koshy, Tiffany Wenting Li, Karrie Karahalios, Hari Sundaram
Title: Organize, Then Vote: Exploring Cognitive Load in Quadratic Survey Interfaces
Abstract:
Quadratic Surveys (QSs) elicit more accurate preferences than traditional methods like Likert-scale surveys. However, the cognitive load associated with QSs has hindered their adoption in digital surveys for collective decision-making. We introduce a two-phase "organize-then-vote" QS to reduce cognitive load. As interface design significantly impacts survey results and accuracy, our design scaffolds survey takers' decision-making while managing the cognitive load imposed by QS. In a 2x2 between-subject in-lab study on public resource allotment, we compared our interface with a traditional text interface across a QS with 6 (short) and 24 (long) options. Two-phase interface participants spent more time per option and exhibited shorter voting edit distances. We qualitatively observed shifts in cognitive effort from mechanical operations to constructing more comprehensive preferences. We conclude that this interface promoted deeper engagement, potentially reducing satisficing behaviors caused by cognitive overload in longer QSs. This research clarifies how human-centered design improves preference elicitation tools for collective decision-making.
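The cognitive load the study addresses stems from the quadratic cost rule that QSs inherit from quadratic voting: expressing v votes on a single option consumes v^2 credits, so strong preferences are expensive and respondents must prioritize across options. A minimal sketch of that rule, with illustrative names and a hypothetical credit budget (not from the paper):

```python
# Sketch of the quadratic-cost rule underlying Quadratic Surveys (QS):
# casting v votes on one option costs v**2 credits. Function names and
# the example budget are illustrative assumptions.

def quadratic_cost(votes: int) -> int:
    """Credits consumed by casting `votes` votes on a single option."""
    return votes * votes

def is_valid_ballot(allocation: dict[str, int], budget: int) -> bool:
    """Check that the total quadratic cost stays within the credit budget."""
    return sum(quadratic_cost(v) for v in allocation.values()) <= budget

ballot = {"parks": 3, "transit": 2, "libraries": -1}  # negative = oppose
print(is_valid_ballot(ballot, budget=16))  # cost 9 + 4 + 1 = 14 <= 16 -> True
```

Because each extra vote on the same option costs more than the last, a respondent facing 24 options must actively organize and trade off preferences, which is exactly the load the two-phase interface scaffolds.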

Authors:Yining Cao, Peiling Jiang, Haijun Xia
Title: Generative and Malleable User Interfaces with Generative and Evolving Task-Driven Data Model
Abstract:
Unlike static and rigid user interfaces, generative and malleable user interfaces offer the potential to respond to diverse users' goals and tasks. However, current approaches primarily rely on generating code, making it difficult for end-users to iteratively tailor the generated interface to their evolving needs. We propose employing task-driven data models, which represent the essential information entities, relationships, and data within information tasks, as the foundation for UI generation. We leverage AI to interpret users' prompts and generate the data models that describe users' intended tasks, and by mapping the data models to UI specifications, we can create generative user interfaces. End-users can easily modify and extend the interfaces via natural language and direct manipulation, with these interactions translated into changes in the underlying model. The technical evaluation of our approach and user evaluation of the developed system demonstrate the feasibility and effectiveness of the proposed generative and malleable UIs.

Authors:Ahmed Njifenjou, Virgile Sucal, Bassam Jabaian, Fabrice Lefèvre
Title: Open-Source Large Language Models as Multilingual Crowdworkers: Synthesizing Open-Domain Dialogues in Several Languages With No Examples in Targets and No Machine Translation
Abstract:
The prevailing paradigm in the domain of Open-Domain Dialogue agents predominantly focuses on the English language, encompassing both models and datasets. Furthermore, the financial and temporal investments required for crowdsourcing such datasets for finetuning are substantial, particularly when multiple languages are involved. Fortunately, advancements in Large Language Models (LLMs) have unveiled a plethora of possibilities across diverse tasks. Specifically, instruction-tuning has enabled LLMs to execute tasks based on natural language instructions, occasionally surpassing the performance of human crowdworkers. Additionally, these models possess the capability to function in various languages within a single thread. Consequently, to generate new samples in different languages, we propose leveraging these capabilities to replicate the data collection process. We introduce a pipeline for generating Open-Domain Dialogue data in multiple Target Languages using LLMs, with demonstrations provided in a unique Source Language. By eschewing explicit Machine Translation in this approach, we enhance the adherence to language-specific nuances. We apply this methodology to the PersonaChat dataset. To enhance the openness of generated dialogues and mimic real-life scenarios, we added the notion of speech events, corresponding to the type of conversation the speakers are engaged in, and that of common ground, representing the premises of a conversation.

Authors:Eddie L. Ungless, Zachary Horne, Björn Ross
Title: "Till I can get my satisfaction": Open Questions in the Public Desire to Punish AI
Abstract:
There are countless examples of how AI can cause harm, and increasing evidence that the public are willing to ascribe blame to the AI itself, regardless of how "illogical" this might seem. This raises the question of whether and how the public might expect AI to be punished for this harm. However, public expectations of the punishment of AI have been vastly underexplored. Understanding these expectations is vital, as the public may feel the lingering effect of harm unless their desire for punishment is satisfied. We synthesise research from psychology, human-computer and -robot interaction, philosophy and AI ethics, and law to highlight how our understanding of this issue is still lacking. We call for an interdisciplinary programme of research to establish how we can best satisfy victims of AI harm, for fear of creating a "satisfaction gap" where legal punishment of AI (or not) fails to meet public expectations.

Authors:Yihan Hou, Xingchen Zeng, Yusong Wang, Manling Yang, Xiaojiao Chen, Wei Zeng
Title: GenColor: Generative Color-Concept Association in Visual Design
Abstract:
Existing approaches for color-concept association typically rely on query-based image referencing and color extraction from image references. However, these approaches are effective only for common concepts, and are vulnerable to unstable image referencing and varying image conditions. Our formative study with designers underscores the need for primary-accent color compositions and context-dependent colors (e.g., 'clear' vs. 'polluted' sky) in design. In response, we introduce a generative approach for mining semantically resonant colors leveraging images generated by text-to-image models. Our insight is that contemporary text-to-image models can reproduce visual patterns from large-scale real-world data. The framework comprises three stages: concept instancing produces generative samples using diffusion models, text-guided image segmentation identifies concept-relevant regions within the image, and color association extracts primary colors accompanied by accent colors. Quantitative comparisons with expert designs validate our approach's effectiveness, and we demonstrate its applicability through cases in various design scenarios and a gallery.
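The final color-association stage can be illustrated with plain k-means over the pixels of a concept-relevant region: the largest cluster yields the primary color and the next cluster an accent. This is a stand-in sketch under assumed names and a deterministic farthest-point initialization, not the paper's actual extraction pipeline:

```python
# Illustrative sketch of primary/accent color extraction from a
# concept-relevant region via k-means clustering of its pixels.
# Function names and the init scheme are assumptions for the sketch.
import numpy as np

def _init_centers(pixels: np.ndarray, k: int) -> np.ndarray:
    """Deterministic farthest-point initialization."""
    centers = [pixels[0]]
    for _ in range(k - 1):
        d = ((pixels[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(1)
        centers.append(pixels[int(np.argmax(d))])
    return np.array(centers, dtype=float)

def dominant_colors(pixels: np.ndarray, k: int = 2, iters: int = 20) -> np.ndarray:
    """Return RGB cluster centers sorted by cluster size, largest first."""
    centers = _init_centers(pixels, k)
    for _ in range(iters):
        labels = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(0)
    order = np.argsort(-np.bincount(labels, minlength=k))
    return centers[order]

# Synthetic "clear sky" region: mostly blue pixels with a few white clouds.
rng = np.random.default_rng(1)
blue = np.array([0.3, 0.5, 0.9]) + 0.02 * rng.normal(size=(900, 3))
white = np.array([0.95, 0.95, 0.95]) + 0.02 * rng.normal(size=(100, 3))
region = np.vstack([blue, white])

primary, accent = dominant_colors(region)
print(primary.round(2), accent.round(2))  # primary-blue, accent-white split
```

On this toy region the primary color is the blue of the sky and the accent the white of the clouds, mirroring the primary-accent compositions the formative study calls for.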

Authors:Shuning Zhang, Shixuan Li
Title: The Real Her? Exploring Whether Young Adults Accept Human-AI Love
Abstract:
This paper explores the acceptance of human-AI love among young adults, particularly focusing on Chinese women in romantic or intimate relationships with AI companions. Through qualitative research, including 14 semi-structured interviews, the study investigates how these individuals establish and maintain relationships with AI, their perceptions and attitudes towards these entities, and the perspectives of other stakeholders. Key findings reveal that users engage with AI companions for emotional comfort, stress relief, and to avoid social pressures. We identify various roles users assign to AI companions, such as friends, mentors, or romantic partners, and highlight the importance of customization and emotional support in these interactions. While AI companions offer advantages like emotional stability and constant availability, they also face limitations in emotional depth and understanding. The research underscores the need for ethical considerations and regulatory frameworks to address privacy concerns and prevent over-immersion in AI relationships. Future work should explore the long-term psychological impacts and evolving dynamics of human-AI relationships as technology advances.

Authors:Zack While, Ali Sarvghad
Title: Toward Filling a Critical Knowledge Gap: Charting the Interactions of Age with Task and Visualization
Abstract:
We present the results of a study comparing the performance of younger adults (YA) and people in late adulthood (PLA) across ten low-level analysis tasks and five basic visualizations, employing Bayesian regression to aggregate and model participant performance. We analyzed performance at the task level and across combinations of tasks and visualizations, reporting measures of performance at aggregate and individual levels. These analyses showed that PLA on average required more time to complete tasks while demonstrating comparable accuracy. Furthermore, at the individual level, PLA exhibited greater heterogeneity in task performance as well as differences in best-performing visualization types for some tasks. We contribute empirical knowledge on how age interacts with analysis task and visualization type and use these results to offer actionable insights and design recommendations for aging-inclusive visualization design. We invite the visualization research community to further investigate aging-aware data visualization. Supplementary materials can be found at https://osf.io/a7xtz/.

Authors:Hannah Selder, Florian Fischer, Per Ola Kristensson, Arthur Fleig
Title: What Makes a Model Breathe? Understanding Reinforcement Learning Reward Function Design in Biomechanical User Simulation
Abstract:
Biomechanical models allow for diverse simulations of user movements in interaction. Their performance depends critically on the careful design of reward functions, yet the interplay between reward components and emergent behaviours remains poorly understood. We investigate what makes a model "breathe" by systematically analysing the impact of rewarding effort minimisation, task completion, and target proximity on movement trajectories. Using a choice reaction task as a test-bed, we find that a combination of completion bonus and proximity incentives is essential for task success. Effort terms are optional, but can help avoid irregularities if scaled appropriately. Our work offers practical insights for HCI designers to create realistic simulations without needing deep reinforcement learning expertise, advancing the use of simulations as a powerful tool for interaction design and evaluation in HCI.
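The reward structure the study dissects (completion bonus, target proximity, optional effort cost) can be sketched as a weighted sum of those terms; the weights, names, and signs below are illustrative assumptions, not the paper's actual values:

```python
# Illustrative composition of the reward terms analysed in the paper:
# a one-off completion bonus, a proximity incentive, and an optional
# effort penalty. All weights are assumptions for this sketch.

def step_reward(distance_to_target: float,
                task_completed: bool,
                effort: float,
                w_bonus: float = 10.0,
                w_proximity: float = 1.0,
                w_effort: float = 0.1) -> float:
    """Reward for one simulation step of a biomechanical user model."""
    reward = -w_proximity * distance_to_target   # closer to target is better
    if task_completed:
        reward += w_bonus                        # one-off completion bonus
    reward -= w_effort * effort                  # penalize muscle effort
    return reward

# Far from the target while exerting effort: net negative reward.
print(step_reward(distance_to_target=0.5, task_completed=False, effort=2.0))
# Completing the task on target: the bonus dominates.
print(step_reward(distance_to_target=0.0, task_completed=True, effort=2.0))
```

The study's finding maps directly onto these knobs: `w_bonus` and `w_proximity` must both be non-zero for task success, while `w_effort` is optional but smooths out irregular movements when scaled appropriately.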

Authors:Eun Cheol Choi, Ashwin Balasubramanian, Jinhu Qi, Emilio Ferrara
Title: Limited Effectiveness of LLM-based Data Augmentation for COVID-19 Misinformation Stance Detection
Abstract:
Misinformation surrounding emerging outbreaks poses a serious societal threat, making robust countermeasures essential. One promising approach is stance detection (SD), which identifies whether social media posts support or oppose misleading claims. In this work, we finetune classifiers on COVID-19 misinformation SD datasets consisting of claims and corresponding tweets. Specifically, we test controllable misinformation generation (CMG) using large language models (LLMs) as a method for data augmentation. While CMG demonstrates the potential for expanding training datasets, our experiments reveal that performance gains over traditional augmentation methods are often minimal and inconsistent, primarily due to built-in safeguards within LLMs. We release our code and datasets to facilitate further research on misinformation detection and generation.

Authors:Junsol Kim, James Evans, Aaron Schein
Title: Linear Representations of Political Perspective Emerge in Large Language Models
Abstract:
Large language models (LLMs) have demonstrated the ability to generate text that realistically reflects a range of different subjective human perspectives. This paper studies how LLMs are seemingly able to reflect more liberal versus more conservative viewpoints among other political perspectives in American politics. We show that LLMs possess linear representations of political perspectives within activation space, wherein more similar perspectives are represented closer together. To do so, we probe the attention heads across the layers of three open transformer-based LLMs (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b). We first prompt models to generate text from the perspectives of different U.S. lawmakers. We then identify sets of attention heads whose activations linearly predict those lawmakers' DW-NOMINATE scores, a widely-used and validated measure of political ideology. We find that highly predictive heads are primarily located in the middle layers, often speculated to encode high-level concepts and tasks. Using probes only trained to predict lawmakers' ideology, we then show that the same probes can predict measures of news outlets' slant from the activations of models prompted to simulate text from those news outlets. These linear probes allow us to visualize, interpret, and monitor ideological stances implicitly adopted by an LLM as it generates open-ended responses. Finally, we demonstrate that by applying linear interventions to these attention heads, we can steer the model outputs toward a more liberal or conservative stance. Overall, our research suggests that LLMs possess a high-level linear representation of American political ideology and that by leveraging recent advances in mechanistic interpretability, we can identify, monitor, and steer the subjective perspective underlying generated text.
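The core technique, a linear probe from attention-head activations to a scalar ideology score, can be sketched with ridge regression on synthetic data; the shapes, the regularizer, and the steering step are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of a linear ideology probe in activation space: fit a
# linear map from (synthetic) attention-head activations to a scalar
# DW-NOMINATE-like score, then reuse the learned direction for steering.
# All data, dimensions, and names here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 64                                        # activation dimensionality
w_true = rng.normal(size=d)                   # hidden "ideology direction"
X = rng.normal(size=(200, d))                 # activations for 200 prompts
y = X @ w_true + 0.1 * rng.normal(size=200)   # noisy ideology scores

# Ridge-regression probe: w = (X^T X + lam I)^{-1} X^T y
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The probe generalizes to held-out activations...
X_test = rng.normal(size=(50, d))
corr = np.corrcoef(X_test @ w, X_test @ w_true)[0, 1]
print(f"probe/ground-truth correlation: {corr:.3f}")

# ...and its unit direction supports a linear intervention: shifting an
# activation along +w moves its predicted score by a controlled amount.
h = rng.normal(size=d)
h_steered = h + 2.0 * w / np.linalg.norm(w)
assert h_steered @ w > h @ w  # predicted score moved in the +w direction
```

The same probe-then-steer pattern underlies the paper's news-slant transfer result: a probe trained only on lawmakers also orders outlets by slant because both live along the same linear direction.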

Authors:Andrew Konya, Luke Thorburn, Wasim Almasri, Oded Adomi Leshem, Ariel D. Procaccia, Lisa Schirch, Michiel A. Bakker
Title: Using Collective Dialogues and AI to Find Common Ground Between Israeli and Palestinian Peacebuilders
Abstract:
A growing body of work has shown that AI-assisted methods -- leveraging large language models, social choice methods, and collective dialogues -- can help navigate polarization and surface common ground in controlled lab settings. But what can these approaches contribute in real-world contexts? We present a case study applying these techniques to find common ground between Israeli and Palestinian peacebuilders in the period following October 7th, 2023. From April to July 2024 an iterative deliberative process combining LLMs, bridging-based ranking, and collective dialogues was conducted in partnership with the Alliance for Middle East Peace. Around 138 civil society peacebuilders participated including Israeli Jews, Palestinian citizens of Israel, and Palestinians from the West Bank and Gaza. The process resulted in a set of collective statements, including demands to world leaders, with at least 84% agreement from participants on each side. In this paper, we document the process, results, challenges, and important open questions.

Authors:Yu Zhang, Kexue Fu, Zhicong Lu
Title: RevTogether: Supporting Science Story Revision with Multiple AI Agents
Abstract:
As a popular form of science communication, science stories attract readers because they combine engaging narratives with comprehensible scientific knowledge. However, crafting such stories requires substantial skill and effort, as writers must navigate complex scientific concepts and transform them into coherent and accessible narratives tailored to audiences with varying levels of scientific literacy. To address this challenge, we propose RevTogether, a multi-agent system (MAS) designed to support revision of science stories with human-like AI agents (using GPT-4o). RevTogether allows AI agents to simulate affect in addition to providing comments and writing suggestions, while offering varying degrees of user agency. Our preliminary user study with non-expert writers (N=3) highlighted the need for transparency in AI agents' decision-making processes to support learning and suggested that emotional interactions could enhance human-AI collaboration in science storytelling.

Authors:Yi Li, Florian Fischer, Tim Dwyer, Barrett Ens, Robert Crowther, Per Ola Kristensson, Benjamin Tag
Title: AlphaPIG: The Nicest Way to Prolong Interactive Gestures in Extended Reality
Abstract:
Mid-air gestures serve as a common interaction modality across Extended Reality (XR) applications, enhancing engagement and ownership through intuitive body movements. However, prolonged arm movements induce shoulder fatigue, known as "Gorilla Arm Syndrome", degrading user experience and reducing interaction duration. Although existing ergonomic techniques derived from Fitts' law (such as reducing target distance, increasing target width, and modifying control-display gain) provide some fatigue mitigation, their implementation in XR applications remains challenging due to the complex balance between user engagement and physical exertion. We present AlphaPIG, a meta-technique designed to Prolong Interactive Gestures by leveraging real-time fatigue predictions. AlphaPIG assists designers in extending and improving XR interactions by enabling automated fatigue-based interventions. Through adjustment of intervention timing and intensity decay rate, designers can explore and control the trade-off between fatigue reduction and potential effects such as decreased body ownership. We validated AlphaPIG's effectiveness through a study (N=22) implementing the widely-used Go-Go technique. Results demonstrated that AlphaPIG significantly reduces shoulder fatigue compared to non-adaptive Go-Go, while maintaining comparable perceived body ownership and agency. Based on these findings, we discuss positive and negative perceptions of the intervention. By integrating real-time fatigue prediction with adaptive intervention mechanisms, AlphaPIG constitutes a critical first step towards creating fatigue-aware applications in XR.
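The two designer-facing knobs the abstract names, intervention timing and intensity decay rate, can be sketched as a threshold on the real-time fatigue prediction plus an exponentially decaying intervention strength. The threshold value, decay law, and names are illustrative assumptions, not AlphaPIG's implementation:

```python
# Hedged sketch of a fatigue-triggered intervention with decaying
# intensity, in the spirit of AlphaPIG's timing/decay-rate parameters.
# Threshold, decay law, and names are assumptions for this sketch.
import math

def intervention_intensity(predicted_fatigue: float,
                           seconds_since_trigger: float,
                           threshold: float = 0.6,
                           decay_rate: float = 0.5) -> float:
    """Strength applied to an assistance technique such as Go-Go.

    Returns 0 while predicted fatigue stays below the threshold; once
    triggered, the intervention starts at full strength and decays
    exponentially at `decay_rate` per second.
    """
    if predicted_fatigue < threshold:
        return 0.0
    return math.exp(-decay_rate * seconds_since_trigger)

print(intervention_intensity(0.4, 0.0))  # below threshold -> 0.0
print(intervention_intensity(0.8, 0.0))  # just triggered -> 1.0
print(intervention_intensity(0.8, 2.0))  # decayed to exp(-1.0)
```

Raising `threshold` delays the intervention (preserving body ownership longer), while raising `decay_rate` fades it out faster, which is the fatigue-reduction vs. ownership trade-off the study explores.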

Authors:Vijay Keswani, Vincent Conitzer, Walter Sinnott-Armstrong, Breanna K. Nguyen, Hoda Heidari, Jana Schaich Borg
Title: Can AI Model the Complexities of Human Moral Decision-Making? A Qualitative Study of Kidney Allocation Decisions
Abstract:
A growing body of work in Ethical AI attempts to capture human moral judgments through simple computational models. The key question we address in this work is whether such simple AI models capture the critical nuances of moral decision-making by focusing on the use case of kidney allocation. We conducted twenty interviews where participants explained their rationale for their judgments about who should receive a kidney. We observe participants: (a) value patients' morally-relevant attributes to different degrees; (b) use diverse decision-making processes, citing heuristics to reduce decision complexity; (c) can change their opinions; (d) sometimes lack confidence in their decisions (e.g., due to incomplete information); and (e) express enthusiasm and concern regarding AI assisting humans in kidney allocation decisions. Based on these findings, we discuss challenges of computationally modeling moral judgments as a stand-in for human input, highlight drawbacks of current approaches, and suggest future directions to address these issues.

Authors:Sheng Long, Angelos Chatzimparmpas, Emma Alexander, Matthew Kay, Jessica Hullman
Title: Seeing Eye to AI? Applying Deep-Feature-Based Similarity Metrics to Information Visualization
Abstract:
Judging the similarity of visualizations is crucial to various applications, such as visualization-based search and visualization recommendation systems. Recent studies show deep-feature-based similarity metrics correlate well with perceptual judgments of image similarity and serve as effective loss functions for tasks like image super-resolution and style transfer. We explore the application of such metrics to judgments of visualization similarity. We extend a similarity metric using five ML architectures and three pre-trained weight sets. We replicate results from previous crowd-sourced studies on scatterplot and visual channel similarity perception. Notably, our metric using pre-trained ImageNet weights outperformed gradient-descent tuned MS-SSIM, a multi-scale similarity metric based on luminance, contrast, and structure. Our work contributes to understanding how deep-feature-based metrics can enhance similarity assessments in visualization, potentially improving visual analysis tools and techniques. Supplementary materials are available at https://osf.io/dj2ms.
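The underlying recipe, embedding each visualization with a fixed feature extractor and comparing embeddings by cosine similarity, can be sketched compactly. The frozen random projection below is a stand-in for the pre-trained networks (e.g., ImageNet weights) the paper evaluates, and all names are assumptions:

```python
# Hedged sketch of a deep-feature-based similarity judgment: embed two
# visualization images with a fixed extractor and compare embeddings by
# cosine similarity. A frozen random projection stands in for the
# pre-trained feature networks used in the paper.
import numpy as np

rng = np.random.default_rng(42)
H = W = 32                                               # toy image size
proj = rng.normal(size=(H * W, 512)) / np.sqrt(H * W)    # frozen "features"

def embed(img: np.ndarray) -> np.ndarray:
    """L2-normalized feature embedding of an image."""
    feat = img.reshape(-1) @ proj
    return feat / np.linalg.norm(feat)

def similarity(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Cosine similarity of the two images' feature embeddings."""
    return float(embed(img_a) @ embed(img_b))

scatter = rng.random((H, W))                          # a "scatterplot"
near_dup = scatter + 0.05 * rng.normal(size=(H, W))   # slight perturbation
unrelated = rng.random((H, W))                        # a different image

print(similarity(scatter, near_dup), similarity(scatter, unrelated))
```

A perturbed copy scores higher than an unrelated image, which is the behavior a visualization-search or recommendation system would rely on; the paper's contribution is showing which real feature extractors make such scores track human similarity judgments.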

Authors:Vania Castro, Ana Karina de Oliveira Nascimento, Raigul Zheldibayeva, Duane Searsmith, Akash Saini, Bill Cope, Mary Kalantzis
Title: Generative AI in K-12 Education: The CyberScholar Initiative
Abstract:
This paper focuses on the piloting of CyberScholar, a Generative AI assistant tool that aims to provide formative feedback on writing in K-12 contexts. Specifically, this study explores how students worked with CyberScholar in diverse subject areas, including English Language Arts, Social Studies, and Modern World History classes in Grades 7, 8, 10, and 11 in three schools in the Midwest and one in the Northwest of the United States. In particular, we examine CyberScholar's potential to support K-12 students' writing across subject areas requiring written assignments. Data were collected through implementation observations, surveys, and interviews with 121 participating students and 4 teachers. Thematic qualitative analysis revealed that CyberScholar was perceived as valuable for supporting student writing through detailed feedback, enhanced interactivity, and alignment with rubric criteria. Students appreciated the tool's guidance in refining their writing. For students, the assistant reframed feedback as a dynamic, dialogic process rather than a static evaluation, a shift that aligns with the cyber-social learning idea, self-regulation, and metacognition. On the teaching side, the findings indicate a shift in teachers' roles, from serving primarily as evaluators to guiding AI feedback processes that foster better student writing and critical thinking.

Authors:Frederic Gmeiner, Nicolai Marquardt, Michael Bentley, Hugo Romat, Michel Pahud, David Brown, Asta Roseway, Nikolas Martelaro, Kenneth Holstein, Ken Hinckley, Nathalie Riche
Title: Intent Tagging: Exploring Micro-Prompting Interactions for Supporting Granular Human-GenAI Co-Creation Workflows
Abstract:
Despite Generative AI (GenAI) systems' potential for enhancing content creation, users often struggle to effectively integrate GenAI into their creative workflows. Core challenges include misalignment of AI-generated content with user intentions (intent elicitation and alignment), user uncertainty around how to best communicate their intents to the AI system (prompt formulation), and insufficient flexibility of AI systems to support diverse creative workflows (workflow flexibility). Motivated by these challenges, we created IntentTagger: a system for slide creation based on the notion of Intent Tags - small, atomic conceptual units that encapsulate user intent - for exploring granular and non-linear micro-prompting interactions for Human-GenAI co-creation workflows. Our user study with 12 participants provides insights into the value of flexibly expressing intent across varying levels of ambiguity, meta-intent elicitation, and the benefits and challenges of intent tag-driven workflows. We conclude by discussing the broader implications of our findings and design considerations for GenAI-supported content creation workflows.

Authors:Shramay Palta, Nirupama Chandrasekaran, Rachel Rudinger, Scott Counts
Title: Speaking the Right Language: The Impact of Expertise Alignment in User-AI Interactions
Abstract:
Using a sample of 25,000 Bing Copilot conversations, we study how the agent responds to users of varying levels of domain expertise and the resulting impact on user experience along multiple dimensions. Our findings show that across a variety of topical domains, the agent largely responds at proficient or expert levels of expertise (77% of conversations) which correlates with positive user experience regardless of the user's level of expertise. Misalignment, such that the agent responds at a level of expertise below that of the user, has a negative impact on overall user experience, with the impact more profound for more complex tasks. We also show that users engage more, as measured by the number of words in the conversation, when the agent responds at a level of expertise commensurate with that of the user. Our findings underscore the importance of alignment between user and AI when designing human-centered AI systems, to ensure satisfactory and productive interactions.

Authors:Kevin Pu, Daniel Lazaro, Ian Arawjo, Haijun Xia, Ziang Xiao, Tovi Grossman, Yan Chen
Title: Assistance or Disruption? Exploring and Evaluating the Design and Trade-offs of Proactive AI Programming Support
Abstract:
AI programming tools enable powerful code generation, and recent prototypes attempt to reduce user effort with proactive AI agents, but their impact on programming workflows remains unexplored. We introduce and evaluate Codellaborator, a design probe LLM agent that initiates programming assistance based on editor activities and task context. We explored three interface variants to assess trade-offs between increasingly salient AI support: prompt-only, proactive agent, and proactive agent with presence and context (Codellaborator). In a within-subject study (N=18), we find that proactive agents increase efficiency compared to the prompt-only paradigm, but also incur workflow disruptions. However, presence indicators and interaction-context support alleviated disruptions and improved users' awareness of AI processes. We underscore trade-offs of Codellaborator on user control, ownership, and code understanding, emphasizing the need to adapt proactivity to programming processes. Our research contributes to the design exploration and evaluation of proactive AI systems, presenting design implications for AI-integrated programming workflows.

Authors:Jessica He, Stephanie Houde, Justin D. Weisz
Title: Which Contributions Deserve Credit? Perceptions of Attribution in Human-AI Co-Creation
Abstract:
AI systems powered by large language models can act as capable assistants for writing and editing. In these tasks, the AI system acts as a co-creative partner, making novel contributions to an artifact-under-creation alongside its human partner(s). One question that arises in these scenarios is the extent to which AI should be credited for its contributions. We examined knowledge workers' views of attribution through a survey study (N=155) and found that they assigned different levels of credit across different contribution types, amounts, and initiative. Compared to a human partner, we observed a consistent pattern in which AI was assigned less credit for equivalent contributions. Participants felt that disclosing AI involvement was important and used a variety of criteria to make attribution judgments, including the quality of contributions, personal values, and technology considerations. Our results motivate and inform new approaches for crediting AI contributions to co-created work.

Authors:Christine Lee, David Porfirio, Xinyu Jessica Wang, Kevin Zhao, Bilge Mutlu
Title: VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
Abstract:
Automated planning is traditionally the domain of experts, utilized in fields like manufacturing and healthcare with the aid of expert planning tools. Recent advancements in LLMs have made planning more accessible to everyday users due to their potential to assist users with complex planning tasks. However, LLMs face several application challenges within end-user planning, including consistency, accuracy, and user trust issues. This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. In addition to the LLM planner, VeriPlan includes three core features -- a rule translator, flexibility sliders, and a model checker -- that engage users in the verification process. Through a user study (n=12), we evaluate VeriPlan, demonstrating improvements in the perceived quality, usability, and user satisfaction of LLMs. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.

Authors:Yifan He, To Eun Kim, Fernando Diaz, Jaime Arguello, Bhaskar Mitra
Title: Tip of the Tongue Query Elicitation for Simulated Evaluation
Abstract:
Tip-of-the-tongue (TOT) search occurs when a user struggles to recall a specific identifier, such as a document title. While common, existing search systems often fail to effectively support TOT scenarios. Research on TOT retrieval is further constrained by the challenge of collecting queries, as current approaches rely heavily on community question-answering (CQA) websites, leading to labor-intensive evaluation and domain bias. To overcome these limitations, we introduce two methods for eliciting TOT queries - leveraging large language models (LLMs) and human participants - to facilitate simulated evaluations of TOT retrieval systems. Our LLM-based TOT user simulator generates synthetic TOT queries at scale, achieving high correlations with how CQA-based TOT queries rank TOT retrieval systems when tested in the Movie domain. Additionally, these synthetic queries exhibit high linguistic similarity to CQA-derived queries. For human-elicited queries, we developed an interface that uses visual stimuli to place participants in a TOT state, enabling the collection of natural queries. In the Movie domain, system rank correlation and linguistic similarity analyses confirm that human-elicited queries are both effective and closely resemble CQA-based queries. These approaches reduce reliance on CQA-based data collection while expanding coverage to underrepresented domains, such as Landmark and Person. LLM-elicited queries for the Movie, Landmark, and Person domains have been released as test queries in the TREC 2024 TOT track, with human-elicited queries scheduled for inclusion in the TREC 2025 TOT track. Additionally, we provide source code for synthetic query generation and the human query collection interface, along with curated visual stimuli used for eliciting TOT queries.

Authors:Zhe Liu, Taekyu Kang, Haoyu Wang, Seyed Hossein Alavi, Vered Shwartz
Title: Bridging Information Gaps with Comprehensive Answers: Improving the Diversity and Informativeness of Follow-Up Questions
Abstract:
Generating diverse follow-up questions that uncover missing information remains challenging for conversational agents, particularly when they run on small, locally hosted models. To address this, we develop an information-gap-driven knowledge distillation pipeline in which a teacher LLM generates a comprehensive answer, contrasts it with the initial answer to identify information gaps, and formulates gap-bridging follow-up questions. Using this pipeline, we augment the existing FollowupQG dataset tenfold. We then fine-tune smaller student models on the augmented dataset to distill the teacher's knowledge. Experiments with selected teacher-student model pairs show that fine-tuned students achieve significantly higher informativeness and diversity than variations trained on the original dataset. These findings indicate that our pipeline, which mirrors the human cognitive process of information seeking, provides an efficient distillation channel from state-of-the-art LLMs to smaller models, enabling resource-constrained conversational systems to generate more diverse and informative follow-up questions.
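The three-step pipeline in this abstract (comprehensive answer, gap identification, gap-bridging questions) can be sketched as a short script. This is an illustrative reconstruction, not the authors' code: `generate` is a placeholder for any chat-completion call, and the prompts and canned replies are invented so the sketch runs offline.

```python
def generate(prompt: str) -> str:
    """Placeholder LLM call; swap in a real chat-completion API.
    Returns canned text so the sketch runs without network access."""
    if "comprehensive" in prompt:
        return "A full answer covering cause, mechanism, and examples."
    if prompt.startswith("List points"):
        return "- mechanism\n- examples"
    return "Could you explain the mechanism in more detail?"

def gap_bridging_questions(question: str, initial_answer: str) -> list[str]:
    # 1. Teacher produces a comprehensive answer.
    full = generate(f"Give a comprehensive answer to: {question}")
    # 2. Contrast it with the initial answer to identify information gaps.
    gaps = generate(
        f"List points present in:\n{full}\nbut missing from:\n{initial_answer}"
    )
    # 3. Turn each identified gap into a follow-up question.
    return [
        generate(f"Write a follow-up question about the missing point: {g}")
        for g in gaps.splitlines() if g.strip()
    ]

followups = gap_bridging_questions("Why is the sky blue?", "Light scatters.")
```

The resulting (question, follow-up) pairs would then serve as fine-tuning data for the smaller student model.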

Authors:Dawar Khan, Xinyu Liu, Omar Mena, Donggang Jia, Alexandre Kouyoumdjian, Ivan Viola
Title: AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results
Abstract:
The deployment of large language models (LLMs) on extended reality (XR) devices has great potential to advance the field of human-AI interaction. In the case of direct, on-device model inference, selecting the appropriate model and device for specific tasks remains challenging. In this paper, we present AIvaluateXR, a comprehensive evaluation framework for benchmarking LLMs running on XR devices. To demonstrate the framework, we deploy 17 selected LLMs across four XR platforms: Magic Leap 2, Meta Quest 3, Vivo X100s Pro, and Apple Vision Pro, and conduct an extensive evaluation. Our experimental setup measures four key metrics: performance consistency, processing speed, memory usage, and battery consumption. For each of the 68 model-device pairs, we assess performance under varying string lengths, batch sizes, and thread counts, analyzing the trade-offs for real-time XR applications. We propose a unified evaluation method based on the 3D Pareto Optimality theory to select the optimal device-model pairs from quality and speed objectives. Additionally, we compare the efficiency of on-device LLMs with client-server and cloud-based setups, and evaluate their accuracy on two interactive tasks. We believe our findings offer valuable insight to guide future optimization efforts for LLM deployment on XR devices. Our evaluation method can be used as standard groundwork for further research and development in this emerging field. The source code and supplementary materials are available at: www.nanovis.org/AIvaluateXR.html
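The Pareto-based selection step described above amounts to discarding every model-device pair that another pair beats on all objectives at once. A minimal sketch, with made-up scores and hypothetical pair names (the paper's actual metrics and values are not reproduced here):

```python
def pareto_front(candidates: dict[str, tuple[float, ...]]) -> set[str]:
    """Keep candidates that no other candidate dominates on every axis."""
    def dominates(a, b):
        # a dominates b if a is >= b everywhere and strictly > somewhere
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return {
        name for name, score in candidates.items()
        if not any(dominates(other, score)
                   for o_name, other in candidates.items() if o_name != name)
    }

# (quality, tokens/s, battery score) -- illustrative numbers, all maximized
pairs = {
    "model-A @ Quest 3":    (0.62, 18.0, 0.7),
    "model-B @ Vision Pro": (0.70, 12.0, 0.5),
    "model-C @ Magic Leap": (0.60, 10.0, 0.4),  # dominated by model-A
}
front = pareto_front(pairs)
```

Objectives to be minimized (e.g. memory usage) can be negated before calling `pareto_front` so that larger is uniformly better.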

Authors:Henry Hengyuan Zhao, Wenqi Pei, Yifei Tao, Haiyang Mei, Mike Zheng Shou
Title: InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Abstract:
Existing benchmarks do not test Large Multimodal Models (LMMs) on their interactive intelligence with human users, which is vital for developing general-purpose AI assistants. We design InterFeedback, an interactive framework, which can be applied to any LMM and dataset to assess this ability autonomously. On top of this, we introduce InterFeedback-Bench which evaluates interactive intelligence using two representative datasets, MMMU-Pro and MathVerse, to test 10 different open-source LMMs. Additionally, we present InterFeedback-Human, a newly collected dataset of 120 cases designed for manually testing interactive performance in leading models such as OpenAI-o1 and Claude-3.5-Sonnet. Our evaluation results indicate that even the state-of-the-art LMM, OpenAI-o1, struggles to refine its responses based on human feedback, achieving an average score of less than 50%. Our findings point to the need for methods that can enhance LMMs' capabilities to interpret and benefit from feedback.

Authors:Takehiro Takayanagi, Ryuji Hashimoto, Chung-Chi Chen, Kiyoshi Izumi
Title: The Impact and Feasibility of Self-Confidence Shaping for AI-Assisted Decision-Making
Abstract:
In AI-assisted decision-making, it is crucial but challenging for humans to appropriately rely on AI, especially in high-stakes domains such as finance and healthcare. This paper addresses this problem from a human-centered perspective by presenting an intervention for self-confidence shaping, designed to calibrate self-confidence at a targeted level. We first demonstrate the impact of self-confidence shaping by quantifying the upper-bound improvement in human-AI team performance. Our behavioral experiments with 121 participants show that self-confidence shaping can improve human-AI team performance by nearly 50% by mitigating both over- and under-reliance on AI. We then introduce a self-confidence prediction task to identify when our intervention is needed. Our results show that simple machine-learning models achieve 67% accuracy in predicting self-confidence. We further illustrate the feasibility of such interventions. The observed relationship between sentiment and self-confidence suggests that modifying sentiment could be a viable strategy for shaping self-confidence. Finally, we outline future research directions to support the deployment of self-confidence shaping in a real-world scenario for effective human-AI collaboration.

Authors:Kathrin Seßler, Arne Bewersdorff, Claudia Nerdel, Enkelejda Kasneci
Title: Towards Adaptive Feedback with AI: Comparing the Feedback Quality of LLMs and Teachers on Experimentation Protocols
Abstract:
Effective feedback is essential for fostering students' success in scientific inquiry. With advancements in artificial intelligence, large language models (LLMs) offer new possibilities for delivering instant and adaptive feedback. However, this feedback often lacks the pedagogical validation provided by real-world practitioners. To address this limitation, our study evaluates and compares the feedback quality of LLM agents with that of human teachers and science education experts on student-written experimentation protocols. Four blinded raters, all professionals in scientific inquiry and science education, evaluated the feedback texts generated by 1) the LLM agent, 2) the teachers and 3) the science education experts using a five-point Likert scale based on six criteria of effective feedback: Feed Up, Feed Back, Feed Forward, Constructive Tone, Linguistic Clarity, and Technical Terminology. Our results indicate that LLM-generated feedback shows no significant difference in overall quality from that of teachers and experts. However, the LLM agent's performance lags in the Feed Back dimension, which involves identifying and explaining errors within the student's work context. Qualitative analysis highlighted the LLM agent's limitations in contextual understanding and in the clear communication of specific errors. Our findings suggest that combining LLM-generated feedback with human expertise can enhance educational practices by leveraging the efficiency of LLMs and the nuanced understanding of educators.

Authors:He Zhang, Xinyi Fu
Title: Zero-shot Emotion Annotation in Facial Images Using Large Multimodal Models: Benchmarking and Prospects for Multi-Class, Multi-Frame Approaches
Abstract:
This study investigates the feasibility and performance of using large multimodal models (LMMs) to automatically annotate human emotions in everyday scenarios. We conducted experiments on the DailyLife subset of the publicly available FERV39k dataset, employing the GPT-4o-mini model for rapid, zero-shot labeling of key frames extracted from video segments. Under a seven-class emotion taxonomy ("Angry," "Disgust," "Fear," "Happy," "Neutral," "Sad," "Surprise"), the LMM achieved an average precision of approximately 50%. In contrast, when limited to ternary emotion classification (negative/neutral/positive), the average precision increased to approximately 64%. Additionally, we explored a strategy that integrates multiple frames within 1-2 second video clips to enhance labeling performance and reduce costs. The results indicate that this approach can slightly improve annotation accuracy. Overall, our preliminary findings highlight the potential application of zero-shot LMMs in human facial emotion annotation tasks, offering new avenues for reducing labeling costs and broadening the applicability of LMMs in complex multimodal environments.
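The multi-frame integration strategy can be approximated by majority voting over per-frame labels from the zero-shot annotator. This is a sketch of one plausible aggregation rule, not the authors' implementation:

```python
from collections import Counter

def aggregate_clip_label(frame_labels: list[str]) -> str:
    """Pick the most common per-frame emotion label for a 1-2 s clip."""
    return Counter(frame_labels).most_common(1)[0][0]

# e.g. three key frames of one clip, each labelled independently by the LMM
label = aggregate_clip_label(["Happy", "Neutral", "Happy"])
```

Voting smooths out single-frame labeling noise, which is one way such an approach could slightly improve annotation accuracy while reducing per-frame API cost.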

Authors:Negar Kamali, Karyn Nakamura, Aakriti Kumar, Angelos Chatzimparmpas, Jessica Hullman, Matthew Groh
Title: Characterizing Photorealism and Artifacts in Diffusion Model-Generated Images
Abstract:
Diffusion model-generated images can appear indistinguishable from authentic photographs, but these images often contain artifacts and implausibilities that reveal their AI-generated provenance. Given the challenge to public trust in media posed by photorealistic AI-generated images, we conducted a large-scale experiment measuring human detection accuracy on 450 diffusion-model generated images and 149 real images. Based on collecting 749,828 observations and 34,675 comments from 50,444 participants, we find that scene complexity of an image, artifact types within an image, display time of an image, and human curation of AI-generated images all play significant roles in how accurately people distinguish real from AI-generated images. Additionally, we propose a taxonomy characterizing artifacts often appearing in images generated by diffusion models. Our empirical observations and taxonomy offer nuanced insights into the capabilities and limitations of diffusion models to generate photorealistic images in 2024.

Authors:Ashley Sheil, Jacob Camilleri, Michelle O'Keeffe, Melanie Gruben, Moya Cronin, Hazel Murray
Title: "I'm 73, you can't expect me to have multiple passwords": Password Management Concerns and Solutions of Irish Older Adults
Abstract:
Based on Irish older adults' perceptions, practices, and challenges regarding password management, the goal of this study was to compile suitable advice that can benefit this demographic. To achieve this, we first conducted semi-structured interviews (n=37) and then collated advice based on best practice and what we learned from these interviews. We facilitated two independent focus groups (n=31) to evaluate and adjust this advice and tested the finalized advice through an observational study (n=15). The participants were aged between 59 and 86 and came from various counties in Ireland, both rural and urban. The findings revealed that managing multiple passwords was a significant source of frustration, leading some participants to adopt novel and informal strategies for storing them. A notable hesitation to adopt digital password managers and passphrases was also observed. Participants appreciated guidance on improving their password practices, with many affirming that securely writing down passwords was a practical strategy. Irish older adults demonstrated strong intuition regarding cybersecurity, notably expressing concerns over knowledge-based security checks used by banks and government institutions. This study aims to contribute to the aggregation of practical password advice suited to older adults, making password security more manageable and less burdensome for this demographic.

Authors:Smit Desai, Jessie Chin, Dakuo Wang, Benjamin Cowan, Michael Twidale
Title: Toward Metaphor-Fluid Conversation Design for Voice User Interfaces
Abstract:
Metaphors play a critical role in shaping user experiences with Voice User Interfaces (VUIs), yet existing designs often rely on static, human-centric metaphors that fail to adapt to diverse contexts and user needs. This paper introduces Metaphor-Fluid Design, a novel approach that dynamically adjusts metaphorical representations based on conversational use-contexts. We compare this approach to a Default VUI, which characterizes the present implementation of commercial VUIs commonly designed around the persona of an assistant, offering a uniform interaction style across contexts. In Study 1 (N=130), metaphors were mapped to four key use-contexts (commands, information seeking, sociality, and error recovery) along the dimensions of formality and hierarchy, revealing distinct preferences for task-specific metaphorical designs. Study 2 (N=91) evaluates a Metaphor-Fluid VUI against a Default VUI, showing that the Metaphor-Fluid VUI enhances perceived intention to adopt, enjoyment, and likability by aligning better with user expectations for different contexts. However, individual differences in metaphor preferences highlight the need for personalization. These findings challenge the one-size-fits-all paradigm of VUI design and demonstrate the potential of Metaphor-Fluid Design to create more adaptive and engaging human-AI interactions.

Authors:Zeyu He, Saniya Naphade, Ting-Hao 'Kenneth' Huang
Title: Prompting in the Dark: Assessing Human Performance in Prompt Engineering for Data Labeling When Gold Labels Are Absent
Abstract:
Millions of users prompt large language models (LLMs) for various tasks, but how good are people at prompt engineering? Do users actually get closer to their desired outcome over multiple iterations of their prompts? These questions are crucial when no gold-standard labels are available to measure progress. This paper investigates a scenario in LLM-powered data labeling, "prompting in the dark," where users iteratively prompt LLMs to label data without using manually-labeled benchmarks. We developed PromptingSheet, a Google Sheets add-on that enables users to compose, revise, and iteratively label data through spreadsheets. Through a study with 20 participants, we found that prompting in the dark was highly unreliable -- only 9 participants improved labeling accuracy after four or more iterations. Automated prompt optimization tools like DSPy also struggled when few gold labels were available. Our findings highlight the importance of gold labels and the needs, as well as the risks, of automated support in human prompt engineering, providing insights for future tool design.

Authors:Melanie Gruben, Ashley Sheil, Sanchari Das, Michelle O Keeffe, Jacob Camilleri, Moya Cronin, Hazel Murray
Title: "It's Like Not Being Able to Read and Write": Narrowing the Digital Divide for Older Adults and Leveraging the Role of Digital Educators in Ireland
Abstract:
As digital services increasingly replace traditional analogue systems, ensuring that older adults are not left behind is critical to fostering inclusive access. This study explores how digital educators support older adults in developing essential digital skills, drawing insights from interviews with 34 educators in Ireland. These educators, both professional and volunteer, offer instruction through a range of formats, including workshops, remote calls, and in-person sessions. Our findings highlight the importance of personalized, step-by-step guidance tailored to older adults' learning needs, as well as fostering confidence through hands-on engagement with technology. Key challenges identified include limited transportation options, poor internet connectivity, outdated devices, and a lack of familial support for learning. To address these barriers, we propose enhanced public funding, expanded access to resources, and sustainable strategies such as providing relevant and practical course materials. Additionally, innovative tools like simulated online platforms for practicing digital transactions can help reduce anxiety and enhance digital literacy among older adults. This study underscores the vital role that digital educators play in bridging the digital divide, creating a more inclusive, human-centered approach to digital learning for older adults.

Authors:Alessandro Gambetti, Qiwei Han, Hong Shen, Claudia Soares
Title: A Survey on Human-Centered Evaluation of Explainable AI Methods in Clinical Decision Support Systems
Abstract:
Explainable AI (XAI) has become a crucial component of Clinical Decision Support Systems (CDSS) to enhance transparency, trust, and clinical adoption. However, while many XAI methods have been proposed, their effectiveness in real-world medical settings remains underexplored. This paper provides a survey of human-centered evaluations of Explainable AI methods in Clinical Decision Support Systems. By categorizing existing works based on XAI methodologies, evaluation frameworks, and clinical adoption challenges, we offer a structured understanding of the landscape. Our findings reveal key challenges in the integration of XAI into healthcare workflows and propose a structured framework to align the evaluation methods of XAI with the clinical needs of stakeholders.

Authors:Bereket A. Yilma, Chan Mi Kim, Geke Ludden, Thomas van Rompay, Luis A. Leiva
Title: The AI-Therapist Duo: Exploring the Potential of Human-AI Collaboration in Personalized Art Therapy for PICS Intervention
Abstract:
Post-intensive care syndrome (PICS) is a multifaceted condition that arises from prolonged stays in an intensive care unit (ICU). While preventing PICS among ICU patients is becoming increasingly important, interventions remain limited. Building on evidence supporting the effectiveness of art exposure in addressing the psychological aspects of PICS, we propose a novel art therapy solution through a collaborative Human-AI approach that enhances personalized therapeutic interventions using state-of-the-art Visual Art Recommendation Systems. We developed two Human-in-the-Loop (HITL) personalization methods and assessed their impact through a large-scale user study (N=150). Our findings demonstrate that this Human-AI collaboration not only enhances the personalization and effectiveness of art therapy but also supports therapists by streamlining their workload. While our study centres on PICS intervention, the results suggest that human-AI collaborative art therapy could potentially benefit other areas where emotional support is critical, such as cases of anxiety and depression.

Authors:Qian Wan, Jiannan Li, Huanchen Wang, Zhicong Lu
Title: Polymind: Parallel Visual Diagramming with Large Language Models to Support Prewriting Through Microtasks
Abstract:
Prewriting is the process of generating and organising ideas before a first draft. It consists of a combination of informal, iterative, and semi-structured strategies such as visual diagramming, which poses a challenge for collaborating with large language models (LLMs) in a turn-taking conversational manner. We present Polymind, a visual diagramming tool that leverages multiple LLM-powered agents to support prewriting. The system features a parallel collaboration workflow in place of turn-taking conversational interactions. It defines multiple "microtasks" to simulate group collaboration scenarios such as collaborative writing and group brainstorming. Instead of repetitively prompting a chatbot for various purposes, Polymind enables users to orchestrate multiple microtasks simultaneously. Users can configure and delegate customised microtasks, and manage their microtasks by specifying task requirements and toggling visibility and initiative. Our evaluation revealed that, compared to ChatGPT, users had more customizability over collaboration with Polymind, and were thus able to quickly expand personalised writing ideas during prewriting.

Authors:Lujain Ibrahim, Canfer Akbulut, Rasmi Elasmar, Charvi Rastogi, Minsuk Kahng, Meredith Ringel Morris, Kevin R. McKee, Verena Rieser, Murray Shanahan, Laura Weidinger
Title: Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models
Abstract:
The tendency of users to anthropomorphise large language models (LLMs) is of growing interest to AI developers, researchers, and policy-makers. Here, we present a novel method for empirically evaluating anthropomorphic LLM behaviours in realistic and varied settings. Going beyond single-turn static benchmarks, we contribute three methodological advances in state-of-the-art (SOTA) LLM evaluation. First, we develop a multi-turn evaluation of 14 anthropomorphic behaviours. Second, we present a scalable, automated approach by employing simulations of user interactions. Third, we conduct an interactive, large-scale human subject study (N=1101) to validate that the model behaviours we measure predict real users' anthropomorphic perceptions. We find that all SOTA LLMs evaluated exhibit similar behaviours, characterised by relationship-building (e.g., empathy and validation) and first-person pronoun use, and that the majority of behaviours only first occur after multiple turns. Our work lays an empirical foundation for investigating how design choices influence anthropomorphic model behaviours and for progressing the ethical debate on the desirability of these behaviours. It also showcases the necessity of multi-turn evaluations for complex social phenomena in human-AI interaction.

Authors:Jina Yoon, Amy X. Zhang, Joseph Seering
Title: "It's Great Because It's Ran By Us": Empowering Teen Volunteer Discord Moderators to Design Healthy and Engaging Youth-Led Online Communities
Abstract:
Online communities can offer many benefits for youth including peer learning, cultural expression, and skill development. However, most HCI research on youth-focused online communities has centered communities developed by adults for youth rather than by the youth themselves. In this work, we interviewed 11 teenagers (ages 13-17) who moderate online Discord communities created by youth, for youth. Participants were identified by Discord platform staff as leaders of well-moderated servers through an intensive exam and application-based process. We also interviewed 2 young adults who volunteered as mentors of some of our teen participants. We present our findings about the benefits, motivations, and risks of teen-led online communities, as well as the role of external stakeholders of these youth spaces. We contextualize our work within the broader teen online safety landscape to provide recommendations to better support, encourage, and protect teen moderators and their online communities. This empirical work contributes one of the first studies to date with teen Discord moderators and aims to empower safe youth-led online communities.

Authors:Marco Rondina, Fabiana Vinci, Antonio Vetrò, Juan Carlos De Martin
Title: Facial Analysis Systems and Down Syndrome
Abstract:
The ethical, social and legal issues surrounding facial analysis technologies have been widely debated in recent years. Key critics have argued that these technologies can perpetuate bias and discrimination, particularly against marginalized groups. We contribute to this field of research by reporting on the limitations of facial analysis systems with the faces of people with Down syndrome: this particularly vulnerable group has received very little attention in the literature so far. This study involved the creation of a specific dataset of face images: an experimental group with faces of people with Down syndrome, and a control group with faces of people who are not affected by the syndrome. Two commercial tools were tested on the dataset across three tasks: gender recognition, age prediction and face labelling. The results show an overall lower accuracy of prediction in the experimental group, and other specific patterns of performance differences: i) high error rates in gender recognition in the category of males with Down syndrome; ii) adults with Down syndrome were more often incorrectly labelled as children; iii) social stereotypes are propagated in both the control and experimental groups, with labels related to aesthetics more often associated with women, and labels related to education level and skills more often associated with men. These results, although limited in scope, shed new light on the biases that alter face classification when applied to faces of people with Down syndrome. They confirm the structural limitation of the technology, which is inherently dependent on the datasets used to train the models.

Authors:Soohwan Lee, Mingyu Kim, Seoyeong Hwang, Dajung Kim, Kyungho Lee
Title: Amplifying Minority Voices: AI-Mediated Devil's Advocate System for Inclusive Group Decision-Making
Abstract:
Group decision-making often benefits from diverse perspectives, yet power imbalances and social influence can stifle minority opinions and compromise outcomes. This prequel introduces an AI-mediated communication system that leverages a large language model to serve as a devil's advocate, representing underrepresented viewpoints without exposing minority members' identities. Rooted in persuasive communication strategies and anonymity, the system aims to improve psychological safety and foster more inclusive decision-making. Our multi-agent architecture, which consists of a summary agent, conversation agent, AI duplicate checker, and paraphrase agent, encourages the group's critical thinking while reducing repetitive outputs. We acknowledge that reliance on text-based communication and fixed intervention timings may limit adaptability, indicating pathways for refinement. By focusing on the anonymous representation of minority viewpoints in power-imbalanced settings, this approach highlights how AI-driven methods can evolve to support more divergent and inclusive group decision-making.

Authors:Hui Ye, Chufeng Xiao, Jiaye Leng, Pengfei Xu, Hongbo Fu
Title: MoGraphGPT: Creating Interactive Scenes Using Modular LLM and Graphical Control
Abstract:
Creating interactive scenes often involves complex programming tasks. Although large language models (LLMs) like ChatGPT can generate code from natural language, their output is often error-prone, particularly when scripting interactions among multiple elements. The linear conversational structure limits the editing of individual elements, and the lack of graphical and precise control complicates visual integration. To address these issues, we integrate an element-level modularization technique that processes textual descriptions for individual elements through separate LLM modules, with a central module managing interactions among elements. This modular approach allows for refining each element independently. We design a graphical user interface, MoGraphGPT, which combines modular LLMs with enhanced graphical control to generate code for 2D interactive scenes. It enables direct integration of graphical information and offers quick, precise control through automatically generated sliders. Our comparative evaluation against an AI coding tool, Cursor Composer, as the baseline system and a usability study show MoGraphGPT significantly improves easiness, controllability, and refinement in creating complex 2D interactive scenes with multiple visual elements in a coding-free manner.

Authors:Huiyun Tang, Songqi Sun, Kexin Nie, Ang Li, Anastasia Sergeeva, Ray LC
Title: Breaking the News: Taking the Roles of Influencer vs. Journalist in a LLM-Based Game for Raising Misinformation Awareness
Abstract:
Effectively mitigating online misinformation requires an understanding of its mechanisms and practical skills for identifying and countering it. Serious games may serve as tools for combating misinformation, teaching players to recognize common misinformation tactics, and improving their skills of discernment. However, current interventions are designed as single-player, choice-based games, which present players with limited predefined choices. Such restrictions reduce replayability and may lead to an overly simplistic understanding of misinformation and how to debunk it. This study seeks to empower people to understand opinion-influencing and misinformation-debunking processes. We created a Player vs. Player (PvP) game in which participants attempt to generate or debunk misinformation to sway public opinion, represented by an LLM. Using a within-subjects mixed-methods study design (N=47), we found that this game significantly raised participants' media literacy and improved their ability to identify misinformation. Qualitative analyses revealed how participants' use of debunking and content creation strategies deepened their understanding of misinformation. This work shows the potential of LLM-based mechanics in PvP games for illuminating contrasting viewpoints on social issues.

Authors:Viktorija Paneva, Maximilian David, Jörg Müller
Title: From Brick to Click: Comparing LEGO Building in Virtual Reality and the Physical World
Abstract:
We present a comparative study of building with LEGO in three environments: the physical world, a Virtual Reality (VR) counterpart, and a VR setting enhanced with "superpowers". The study aims to understand how traditional creative hands-on activities translate to virtual environments, with potential benefits for educational, training, entertainment, and therapeutic uses. 22 participants engaged in both structured assembly and creative free-building tasks across these environments. We investigated differences in user performance, engagement, and creativity, with a focus on how the additional VR functionalities influenced the building experience. The findings reveal that while the physical environment offers a familiar tactile experience, VR, particularly with added superpowers, was clearly favoured by participants in the creative free-building scenario. Our recommendations for VR design include balancing automation with user control to enhance task efficiency while maintaining engagement, and implementing intuitive systems that manage complexity to prevent user overwhelm and support creative freedom.

Authors:Aryan Garg, Yue Jiang, Antti Oulasvirta
Title: Controllable GUI Exploration
Abstract:
During the early stages of interface design, designers need to produce multiple sketches to explore a design space. Design tools often fail to support this critical stage, because they insist on specifying more details than necessary. Although recent advances in generative AI have raised hopes of solving this issue, in practice they fail because expressing loose ideas in a prompt is impractical. In this paper, we propose a diffusion-based approach to the low-effort generation of interface sketches. It breaks new ground by allowing flexible control of the generation process via three types of inputs: A) prompts, B) wireframes, and C) visual flows. The designer can provide any combination of these as input at any level of detail, and will get a diverse gallery of low-fidelity solutions in response. The unique benefit is that large design spaces can be explored rapidly with very little effort in input-specification. We present qualitative results for various combinations of input specifications. Additionally, we demonstrate that our model aligns more accurately with these specifications than other models.

Authors:Siyu Zha, Yuanrong Tang, Jiangtao Gong, Yingqing Xu
Title: COLP: Scaffolding Children's Online Long-term Collaborative Learning
Abstract:
Online collaborative learning and working are important for everyone, including children. However, children still face many difficulties communicating and working together online, which keeps them from engaging in long-term project-based teamwork. We aim to investigate online long-term collaborative learning opportunities to address this gap. We designed COLP, an online, 16-week, project-based learning program, as an educational intervention grounded in multiple learning theories for primary school students. We conducted this program with 67 primary school students aged 8-13, across more than five provinces of China. We found that this program could engage more than one-third of the children in teamwork after long-term study. Furthermore, we interviewed children and their parents to understand the communication channels, benefits, and challenges of this program. Interestingly, we discovered that parents play multiple roles in their children's collaborative learning, particularly modeling and guiding the children's collaborative skills. Given the lack of programs designed for children's long-term online collaboration, this study may inspire intervention design in computer-supported collaborative learning communities.

Authors:Muhammad Hassan, Mahnoor Jameel, Tian Wang, Masooda Bashir
Title: Unveiling Privacy and Security Gaps in Female Health Apps
Abstract:
Female Health Applications (FHA), a growing segment of FemTech, aim to provide affordable and accessible healthcare solutions for women globally. These applications gather and monitor health and reproductive data from millions of users. With ongoing debates on women's reproductive rights and privacy, it's crucial to assess how these apps protect users' privacy. In this paper, we undertake a security and data protection assessment of 45 popular FHAs. Our investigation uncovers harmful permissions, extensive collection of sensitive personal and medical data, and the presence of numerous third-party tracking libraries. Furthermore, our examination of their privacy policies reveals deviations from fundamental data privacy principles. These findings highlight a significant lack of privacy and security measures for FemTech apps, especially as women's reproductive rights face growing political challenges. The results and recommendations provide valuable insights for users, app developers, and policymakers, paving the way for better privacy and security in Female Health Applications.

Authors:Xinyue Chen, Nathan Yap, Xinyi Lu, Aylin Gunal, Xu Wang
Title: MeetMap: Real-Time Collaborative Dialogue Mapping with LLMs in Online Meetings
Abstract:
Video meeting platforms display conversations linearly through transcripts or summaries. However, ideas during a meeting do not emerge linearly. We leverage LLMs to create dialogue maps in real time to help people visually structure and connect ideas. Balancing the need to reduce the cognitive load on users during the conversation while giving them sufficient control when using AI, we explore two system variants that encompass different levels of AI assistance. In Human-Map, AI generates summaries of conversations as nodes, and users create dialogue maps with the nodes. In AI-Map, AI produces dialogue maps where users can make edits. We ran a within-subject experiment with ten pairs of users, comparing the two MeetMap variants and a baseline. Users preferred MeetMap over traditional methods for taking notes, which aligned better with their mental models of conversations. Users liked the ease of use for AI-Map due to the low effort demands and appreciated the hands-on opportunity in Human-Map for sense-making.

Authors:Muhammad Hassan, Abdullah Ghani, Muhammad Fareed Zaffar, Masooda Bashir
Title: Decoding User Concerns in AI Health Chatbots: An Exploration of Security and Privacy in App Reviews
Abstract:
AI-powered health chatbot applications are increasingly utilized for personalized healthcare services, yet they pose significant challenges related to user data security and privacy. This study evaluates the effectiveness of automated methods, specifically BART and Gemini GenAI, in identifying security- and privacy-related (SPR) concerns within these applications' user reviews, benchmarking their performance against manual qualitative analysis. Our results indicate that while Gemini's performance in SPR classification is comparable to manual labeling, both automated methods have limitations, including the misclassification of unrelated issues. Qualitative analysis revealed critical user concerns, such as data collection practices, data misuse, and insufficient transparency and consent mechanisms. This research enhances the understanding of the relationship between user trust, privacy, and emerging mobile AI health chatbot technologies, offering actionable insights for improving security and privacy practices in AI-driven health chatbots. Although exploratory, our findings highlight the necessity for rigorous audits and transparent communication strategies, providing valuable guidance for app developers and vendors in addressing user security and privacy concerns.

Authors:Hüseyin Aydın, Kevin Godin-Dubois, Libio Goncalvez Braz, Floris den Hengst, Kim Baraka, Mustafa Mert Çelikok, Andreas Sauter, Shihan Wang, Frans A. Oliehoek
Title: SHARPIE: A Modular Framework for Reinforcement Learning and Human-AI Interaction Experiments
Abstract:
Reinforcement learning (RL) offers a general approach for modeling and training AI agents, including human-AI interaction scenarios. In this paper, we propose SHARPIE (Shared Human-AI Reinforcement Learning Platform for Interactive Experiments) to address the need for a generic framework to support experiments with RL agents and humans. Its modular design consists of a versatile wrapper for RL environments and algorithm libraries, a participant-facing web interface, logging utilities, and deployment on popular cloud and participant-recruitment platforms. It empowers researchers to study a wide variety of research questions related to the interaction between humans and RL agents, including those related to interactive reward specification and learning, learning from human feedback, action delegation, preference elicitation, user modeling, and human-AI teaming. The platform is based on a generic interface for human-RL interactions that aims to standardize the field of study on RL in human contexts.
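As a rough illustration of the wrapper idea (not SHARPIE's actual API), a Gym-style environment wrapper that logs each human-agent interaction step might look like this; the class names and the toy environment are assumptions:

```python
# Illustrative logging wrapper around a Gym-style environment,
# recording agent actions and optional human feedback per step.

class LoggingWrapper:
    def __init__(self, env):
        self.env = env
        self.log = []  # one record per event, for later analysis

    def reset(self):
        obs = self.env.reset()
        self.log.append(("reset", obs))
        return obs

    def step(self, action, human_feedback=None):
        obs, reward, done = self.env.step(action)
        # Record both the agent's action and any human input this step.
        self.log.append(("step", action, human_feedback, reward))
        return obs, reward, done

class ToyEnv:
    """Minimal stand-in environment: reward 1 when action == 1."""
    def reset(self):
        return 0
    def step(self, action):
        return 0, float(action == 1), True
```

A real deployment would forward the same log records to the platform's web interface and storage backend.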

Authors:Xinyue Gui, Ding Xia, Wang Gao, Mustafa Doga Dogan, Maria Larsson, Takeo Igarashi
Title: Draw2Cut: Direct On-Material Annotations for CNC Milling
Abstract:
Creating custom artifacts with computer numerical control (CNC) milling machines typically requires mastery of complex computer-aided design (CAD) software. To eliminate this user barrier, we introduce Draw2Cut, a novel system that allows users to design and fabricate artifacts by sketching directly on physical materials. Draw2Cut employs a custom-drawing language to convert user-drawn lines, symbols, and colors into toolpaths, thereby enabling users to express their creative intent intuitively. The key features include real-time alignment between material and virtual toolpaths, a preview interface for validation, and an open-source platform for customization. Through technical evaluations and user studies, we demonstrate that Draw2Cut lowers the entry barrier for personal fabrication, enabling novices to create customized artifacts with precision and ease. Our findings highlight the potential of the system to enhance creativity, engagement, and accessibility in CNC-based woodworking.

Authors:Shuchang Xu, Xiaofu Jin, Huamin Qu, Yukang Yan
Title: DanmuA11y: Making Time-Synced On-Screen Video Comments (Danmu) Accessible to Blind and Low Vision Users via Multi-Viewer Audio Discussions
Abstract:
By overlaying time-synced user comments on videos, Danmu creates a co-watching experience for online viewers. However, its visual-centric design poses significant challenges for blind and low vision (BLV) viewers. Our formative study identified three primary challenges that hinder BLV viewers' engagement with Danmu: the lack of visual context, the speech interference between comments and videos, and the disorganization of comments. To address these challenges, we present DanmuA11y, a system that makes Danmu accessible by transforming it into multi-viewer audio discussions. DanmuA11y incorporates three core features: (1) Augmenting Danmu with visual context, (2) Seamlessly integrating Danmu into videos, and (3) Presenting Danmu via multi-viewer discussions. Evaluation with twelve BLV viewers demonstrated that DanmuA11y significantly improved Danmu comprehension, provided smooth viewing experiences, and fostered social connections among viewers. We further highlight implications for enhancing commentary accessibility in video-based social media and live-streaming platforms.

Authors:Ramya Iyer, Mustafa Doga Dogan, Maria Larsson, Takeo Igarashi
Title: XR-penter: Material-Aware and In Situ Design of Scrap Wood Assemblies
Abstract:
Woodworkers have to navigate multiple considerations when planning a project, including available resources, skill-level, and intended effort. Do it yourself (DIY) woodworkers face these challenges most acutely because of tight material constraints and a desire for custom designs tailored to specific spaces. To address these needs, we present XR-penter, an extended reality (XR) application that supports in situ, material-aware woodworking for casual makers. Our system enables users to design virtual scrap wood assemblies directly in their workspace, encouraging sustainable practices through the use of discarded materials. Users register physical material as virtual twins, manipulate these twins into an assembly in XR, and preview cuts needed for fabrication. We conducted a case study and feedback sessions to demonstrate how XR-penter supports improvisational workflows in practice, the type of woodworker who would benefit most from our system, and insights on integrating similar spatial and material considerations into future work.

Authors:Shuchang Xu, Chang Chen, Zichen Liu, Xiaofu Jin, Linping Yuan, Yukang Yan, Huamin Qu
Title: Memory Reviver: Supporting Photo-Collection Reminiscence for People with Visual Impairment via a Proactive Chatbot
Abstract:
Reminiscing with photo collections offers significant psychological benefits but poses challenges for people with visual impairment (PVI). Their current reliance on sighted help restricts the flexibility of this activity. In response, we explored using a chatbot in a preliminary study. We identified two primary challenges that hinder effective reminiscence with a chatbot: the scattering of information and a lack of proactive guidance. To address these limitations, we present Memory Reviver, a proactive chatbot that helps PVI reminisce with a photo collection through natural language communication. Memory Reviver incorporates two novel features: (1) a Memory Tree, which uses a hierarchical structure to organize the information in a photo collection; and (2) a Proactive Strategy, which actively delivers information to users at proper conversation rounds. Evaluation with twelve PVI demonstrated that Memory Reviver effectively facilitated engaging reminiscence, enhanced understanding of photo collections, and delivered natural conversational experiences. Based on our findings, we distill implications for supporting photo reminiscence and designing chatbots for PVI.

Authors:Sara Abdali, Can Goksen, Michael Solodko, Saeed Amizadeh, Julie E. Maybee, Kazuhito Koishida
Title: Self-reflecting Large Language Models: A Hegelian Dialectical Approach
Abstract:
Investigating NLP through a philosophical lens has recently caught researchers' attention, as it bridges computational methods with classical schools of philosophy. This paper introduces a philosophical framework inspired by the Hegelian Dialectic to enable LLMs' self-reflection, utilizing a self-dialectical approach to emulate internal critiques and synthesize new scientific ideas (spanning domains such as mathematics, physics, and more). Additionally, we explore the effect of generation temperature in LLMs by introducing a dynamic annealing approach, which encourages creativity in the early stages and gradually focuses on refinement and nuance, as well as a constant-temperature strategy. Furthermore, we implement a Multi-Agent Majority Voting (MAMV) strategy to assess the validity and novelty of the generated ideas, which proves useful in the absence of domain experts. We also evaluate the effectiveness of our method in generating novel scientific ideas and improving LLMs' reasoning capabilities. Our experiments demonstrate promising results in ideation, along with significant improvements in mathematical and symbolic reasoning.
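The dynamic annealing idea (high temperature early for creativity, decaying toward refinement) can be sketched with an exponential schedule, and MAMV with a plain majority vote. The decay law, endpoint temperatures, and vote rule are assumptions; the abstract does not specify them:

```python
# Sketch of a dynamic annealing temperature schedule and a
# majority-vote check, under assumed parameter values.

def annealed_temperature(step, total_steps, t_start=1.2, t_end=0.2):
    """Exponentially interpolate the sampling temperature
    from t_start (creative) down to t_end (refinement)."""
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac

def majority_vote(votes):
    """MAMV-style validity check: accept an idea if most
    agent verdicts (booleans) are positive."""
    return sum(votes) > len(votes) / 2
```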

Authors:Leon Leibmann, Galen Weld, Amy X. Zhang, Tim Althoff
Title: Reddit Rules and Rulers: Quantifying the Link Between Rules and Perceptions of Governance across Thousands of Communities
Abstract:
Rules are a critical component of the functioning of nearly every online community, yet it is challenging for community moderators to make data-driven decisions about what rules to set for their communities. The connection between a community's rules and how its membership feels about its governance is not well understood. In this work, we conduct the largest-to-date analysis of rules on Reddit, collecting a set of 67,545 unique rules across 5,225 communities which collectively account for more than 67% of all content on Reddit. More than just a point-in-time study, our work measures how communities change their rules over a 5+ year period. We develop a method to classify these rules using a taxonomy of 17 key attributes extended from previous work. We assess what types of rules are most prevalent, how rules are phrased, and how they vary across communities of different types. Using a dataset of communities' discussions about their governance, we are the first to identify the rules most strongly associated with positive community perceptions of governance: rules addressing who participates, how content is formatted and tagged, and rules about commercial activities. We conduct a longitudinal study to quantify the impact of adding new rules to communities, finding that after a rule is added, community perceptions of governance immediately improve, yet this effect diminishes after six months. Our results have important implications for platforms, moderators, and researchers. We make our classification model and rules datasets public to support future research on this topic.

Authors:Tim Rolff, Jenny Gabel, Lauren Zerbin, Niklas Hypki, Susanne Schmidt, Markus Lappe, Frank Steinicke
Title: A Hands-free Spatial Selection and Interaction Technique using Gaze and Blink Input with Blink Prediction for Extended Reality
Abstract:
Gaze-based interaction techniques have created significant interest in the field of spatial interaction. Many of these methods require additional input modalities, such as hand gestures (e.g., gaze coupled with pinch). These can be uncomfortable and difficult to perform in public or limited spaces, and pose challenges for users who are unable to execute pinch gestures. To address these aspects, we propose a novel, hands-free Gaze+Blink interaction technique that leverages the user's gaze and intentional eye blinks. This technique enables users to perform selections by executing intentional blinks. It facilitates continuous interactions, such as scrolling or drag-and-drop, through eye blinks coupled with head movements. So far, this concept has not been explored for hands-free spatial interaction techniques. We evaluated the performance and user experience (UX) of our Gaze+Blink method in two user studies and compared it with Gaze+Pinch in a realistic user interface setup featuring common menu interaction tasks. Study 1 demonstrated that while Gaze+Blink achieved comparable selection speeds, it was prone to accidental selections resulting from unintentional blinks. In Study 2 we explored an enhanced technique employing a deep learning algorithm for filtering out unintentional blinks.
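Study 2 filters unintentional blinks with a deep model; a much simpler duration-threshold baseline illustrates the same filtering idea. The 300 ms threshold and function names are assumptions, not values from the paper:

```python
# Simplified blink filter: spontaneous blinks are typically brief,
# so treat long eyelid closures as intentional selections.

def is_intentional_blink(duration_ms: float, threshold_ms: float = 300.0) -> bool:
    """Classify a blink as intentional when closure duration is long."""
    return duration_ms >= threshold_ms

def select_targets(blinks):
    """Keep only the gaze targets whose blink passes the filter.
    blinks: list of (target, blink_duration_ms) pairs."""
    return [target for target, dur in blinks if is_intentional_blink(dur)]
```

A learned classifier would replace the fixed threshold with features such as blink velocity and surrounding gaze dynamics.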

Authors:Mahir Akgun, Sacip Toker
Title: Modeling Changes in Individuals' Cognitive Self-Esteem With and Without Access To Search Tools
Abstract:
Search engines, as cognitive partners, reshape how individuals evaluate their cognitive abilities. This study examines how search tool access influences cognitive self-esteem (CSE) -- users' self-perception of their cognitive abilities -- through the lens of transactive memory systems. Using a within-subject design with 164 participants, we found that CSE significantly inflates when users have access to search tools, driven by cognitive offloading. Participants with lower initial CSE exhibited greater shifts, highlighting individual differences. Search self-efficacy mediated the relationship between prior search experience and CSE, emphasizing the role of users' past interactions. These findings reveal opportunities for search engine design: interfaces that promote awareness of cognitive offloading and foster self-reflection can support accurate metacognitive evaluations, reducing overreliance on external tools. This research contributes to HCI by demonstrating how interactive systems shape cognitive self-perception, offering actionable insights for designing human-centered tools that balance user confidence and cognitive independence.

Authors:Yi-Chi Liao, Christian Holz
Title: Redefining Affordance via Computational Rationality
Abstract:
Affordances, a foundational concept in human-computer interaction and design, have traditionally been explained by direct-perception theories, which assume that individuals perceive action possibilities directly from the environment. However, these theories fall short of explaining how affordances are perceived, learned, refined, or misperceived, and how users choose between multiple affordances in dynamic contexts. This paper introduces a novel affordance theory grounded in Computational Rationality, positing that humans construct internal representations of the world based on bounded sensory inputs. Within these internal models, affordances are inferred through two core mechanisms: feature recognition and hypothetical motion trajectories. Our theory redefines affordance perception as a decision-making process, driven by two components: confidence (the perceived likelihood of successfully executing an action) and predicted utility (the expected value of the outcome). By balancing these factors, individuals make informed decisions about which actions to take. Our theory frames affordance perception as dynamic, continuously learned, and refined through reinforcement and feedback. We validate the theory via thought experiments and demonstrate its applicability across diverse types of affordances (e.g., physical, digital, social). Beyond clarifying and generalizing the understanding of affordances across contexts, our theory serves as a foundation for improving design communication and guiding the development of more adaptive and intuitive systems that evolve with user capabilities.

Authors:Xinying Hou, Zihan Wu, Xu Wang, Barbara J. Ericson
Title: Personalized Parsons Puzzles as Scaffolding Enhance Practice Engagement Over Just Showing LLM-Powered Solutions
Abstract:
As generative AI products can generate code and assist students with programming learning seamlessly, integrating AI into programming education contexts has drawn much attention. However, one emerging concern is that students might get answers without learning from the LLM-generated content. In this work, we deployed LLM-powered personalized Parsons puzzles as scaffolding for write-code practice in a Python learning classroom (PC condition) and conducted an 80-minute randomized between-subjects study. Both conditions received the same practice problems. The only difference was that, when requesting help, students in the control condition were shown a complete solution (CC condition), simulating the most traditional LLM output. Results indicated that students who received personalized Parsons puzzles as scaffolding engaged in practice significantly longer than those who received complete solutions when struggling.

Authors:Christoph Treude, Marco A. Gerosa
Title: How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering
Abstract:
Artificial intelligence (AI), including large language models and generative AI, is emerging as a significant force in software development, offering developers powerful tools that span the entire development lifecycle. Although software engineering research has extensively studied AI tools in software development, the specific types of interactions between developers and these AI-powered tools have only recently begun to receive attention. Understanding and improving these interactions has the potential to enhance productivity, trust, and efficiency in AI-driven workflows. In this paper, we propose a taxonomy of interaction types between developers and AI tools, identifying eleven distinct interaction types, such as auto-complete code suggestions, command-driven actions, and conversational assistance. Building on this taxonomy, we outline a research agenda focused on optimizing AI interactions, improving developer control, and addressing trust and usability challenges in AI-assisted development. By establishing a structured foundation for studying developer-AI interactions, this paper aims to stimulate research on creating more effective, adaptive AI tools for software development.

Authors:Jason T. Isa, Lillian J. Ratliff, Samuel A. Burden
Title: A Learning Algorithm That Attains the Human Optimum in a Repeated Human-Machine Interaction Game
Abstract:
When humans interact with learning-based control systems, a common goal is to minimize a cost function known only to the human. For instance, an exoskeleton may adapt its assistance in an effort to minimize the human's metabolic cost of transport. Conventional approaches to synthesizing the learning algorithm solve an inverse problem to infer the human's cost. However, these problems can be ill-posed, hard to solve, or sensitive to problem data. Here we present a game-theoretic learning algorithm that works solely by observing human actions to find the cost minimum, avoiding the need to solve an inverse problem. We evaluate the performance of our algorithm in an extensive set of human subjects experiments, demonstrating consistent convergence to the minimum of a prescribed human cost function in scalar and multidimensional instantiations of the game. We conclude by outlining future directions for theoretical and empirical extensions of our results.
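A toy stand-in for such an observation-only scheme (not the authors' game-theoretic algorithm) is derivative-free descent on an observable human response signal, here simulated as a quadratic around the human's hidden preferred setting; all values are illustrative:

```python
# Toy illustration: find the minimum of a hidden human cost using
# only observed responses, never solving an inverse problem.

def human_response(x, x_star=2.0):
    """Observable proxy for the human's hidden cost, minimized at x_star."""
    return (x - x_star) ** 2

def zeroth_order_min(f, x0=0.0, step=0.5, probe=0.1, iters=200):
    """Derivative-free descent: estimate the cost slope from two
    nearby observations and move against it."""
    x = x0
    for _ in range(iters):
        grad = (f(x + probe) - f(x - probe)) / (2 * probe)
        x -= step * grad
    return x
```

Because the only inputs are observed responses, the hidden cost itself is never reconstructed, which is the spirit of the inverse-problem-free approach the abstract describes.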

Authors:Xiaoshan Zhou, Carol M. Menassa, Vineet R. Kamat
Title: Towards Probabilistic Inference of Human Motor Intentions by Assistive Mobile Robots Controlled via a Brain-Computer Interface
Abstract:
Assistive mobile robots are a transformative technology that helps persons with disabilities regain the ability to move freely. Although autonomous wheelchairs significantly reduce user effort, they still require human input to allow users to maintain control and adapt to changing environments. The Brain-Computer Interface (BCI) stands out as a highly user-friendly option that does not require physical movement. Current BCI systems can understand whether users want to accelerate or decelerate, but they implement these changes in discrete speed steps rather than allowing for smooth, continuous velocity adjustments. This limitation prevents the systems from mimicking the natural, fluid speed changes seen in human self-paced motion. The authors aim to address this limitation by redesigning the perception-action cycle in a BCI-controlled robotic system: improving how the robotic agent interprets the user's motion intentions (world state) and implementing these actions in a way that better reflects the natural physical properties of motion, such as inertia and damping. The scope of this paper focuses on the perception aspect. We asked and answered a normative question: "what computation should the robotic agent carry out to optimally perceive incomplete or noisy sensory observations?" Empirical EEG data were collected, and probabilistic representations that served as world-state distributions were learned and evaluated in a Generative Adversarial Network framework. A ROS framework was established that connected to a Gazebo environment containing a digital twin of an indoor space and a virtual model of a robotic wheelchair. Signal processing and statistical analyses were implemented to identify the most discriminative features in the spatial-spectral-temporal dimensions, which are then used to construct the world model for the robotic agent to interpret user motion intentions as a Bayesian observer.
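The Bayesian-observer computation the paper asks about can be illustrated with a minimal discrete belief update over candidate motion intentions. The likelihood values below are assumptions for demonstration, not learned EEG models:

```python
# Minimal Bayesian observer: update a belief over discrete motion
# intentions (e.g., accelerate vs. decelerate) from noisy evidence.

def bayes_update(prior, likelihoods):
    """prior, likelihoods: dicts keyed by intention; returns the
    normalized posterior P(intention | observation)."""
    unnorm = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}
```

In the paper's setting, the likelihood terms would come from the learned world-state distributions rather than hand-set constants.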

Authors:Amy Koike, Yuki Okafuji, Kenya Hoshimure, Jun Baba
Title: What Drives You to Interact?: The Role of User Motivation for a Robot in the Wild
Abstract:
In this paper, we aim to understand how user motivation shapes human-robot interaction (HRI) in the wild. To explore this, we conducted a field study by deploying a fully autonomous conversational robot in a shopping mall over two days. Through sequential video analysis, we identified five patterns of interaction fluency (Smooth, Awkward, Active, Messy, and Quiet), four types of user motivation for interacting with the robot (Function, Experiment, Curiosity, and Education), and user positioning towards the robot. We further analyzed how these motivations and positioning influence interaction fluency. Our findings suggest that incorporating users' motivation types into the design of robot behavior can enhance interaction fluency, engagement, and user satisfaction in real-world HRI scenarios.

Authors:Ana M. Bernardos, Xian Wang, Luca Bergesio, Juan A. Besada, José R. Casar
Title: Assessing the Acceptance of a Mid-Air Gesture Syntax for Smart Space Interaction: An Empirical Study
Abstract:
This article explores the use of a location-aware mid-air gesture-based command triplet syntax to interact with a smart space. The syntax, inspired by human language, is built as a vocative case with an imperative structure. In a sentence like 'Light, please switch on', the object being activated is invoked by making a gesture that mimics its initial letter/acronym (vocative, coincident with the sentence's elliptical subject). A geometrical or directional gesture then identifies the action (imperative verb) and may include an object feature or a second object with which to network (complement), which is also represented by its initial or acronym letter. Technically, an interpreter relying on a trainable multidevice gesture recognition layer makes the pair/triplet syntax decoding possible. The recognition layer works on acceleration and position input signals from graspable (smartphone) and free-hand devices (smartwatch and external depth cameras), as well as a specific compiler. In a specific deployment at a Living Lab facility, the syntax has been instantiated via a lexicon derived from English (with respect to the initial letters and acronyms). A within-subject analysis with twelve users has enabled the analysis of the syntax's acceptance (in terms of usability, gesture agreement for actions over objects, and social acceptance) and of technology preference for the gesture syntax across its three device implementations (graspable, wearable, and device-free). Participants express consensus regarding the simplicity of learning the syntax and its potential effectiveness in managing smart resources. Socially, participants favoured the Watch for outdoor activities and the Phone for home and work settings, underscoring the importance of social context in technology design. The Phone emerged as the preferred option for gesture recognition due to its efficiency and familiarity.
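The vocative-imperative triplet decoding can be sketched as a tiny lookup-based interpreter. The lexicon and action entries below are illustrative assumptions rather than the study's actual vocabulary:

```python
# Illustrative decoder for the gesture triplet syntax:
# object initial (vocative), action gesture (imperative),
# optional complement (second object's initial).

LEXICON = {"L": "light", "T": "thermostat"}   # object initials
ACTIONS = {"up": "switch_on", "down": "switch_off"}  # directional gestures

def decode(gestures):
    """Map a recognized gesture pair/triplet to (object, action, complement)."""
    obj = LEXICON[gestures[0]]                 # vocative subject
    act = ACTIONS[gestures[1]]                 # imperative verb
    comp = LEXICON.get(gestures[2]) if len(gestures) > 2 else None
    return (obj, act, comp)
```

For example, the gestures "L" then "up" would read as 'Light, please switch on'.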

Authors:Yasith Samaradivakara, Asela Pathirage, Thavindu Ushan, Prasanth Sasikumar, Kasun Karunanayaka, Chamath Keppitiyagama, Suranga Nanayakkara
Title: Tailored Real-time AR Captioning Interface for Enhancing Learning Experience of Deaf and Hard-of-Hearing (DHH) Students
Abstract:
Deaf and hard-of-hearing (DHH) students face significant challenges in specialized educational settings, such as limited exposure to written and spoken language, a lack of tailored educational tools, and restricted access to resources, impacting their language literacy development and overall educational experience. We therefore employed a User-Centered Design (UCD) process, collaborating with 8 DHH students and 2 Teachers of the Deaf (ToDs) from a School of Deaf, to develop a real-time captioning augmented reality (AR) system tailored to their school settings, aiming to enhance their learning experience. A user study with 24 DHH participants revealed a strong preference (87.5%) for our system, underscoring its potential to enhance the learning experience. We present a comprehensive needs analysis, the UCD process, system implementation, and user feedback, showcasing the effectiveness of tailored AR caption interfaces for DHH students. We also discuss the implications for future development of educational technologies for DHH students.

Authors:Apaar Bawa, Ugur Kursuncu, Dilshod Achilov, Valerie L. Shalin, Nitin Agarwal, Esra Akbas
Title: Telegram as a Battlefield: Kremlin-related Communications during the Russia-Ukraine Conflict
Abstract:
Telegram emerged as a crucial platform for both parties during the conflict between Russia and Ukraine. Owing to its minimal content moderation policies, pro-Kremlin narratives and potential misinformation spread on Telegram, while anti-Kremlin narratives and related content, such as war footage, troop movements, maps of bomb shelters, and air raid warnings, were also propagated. This paper presents a dataset of posts from both pro-Kremlin and anti-Kremlin Telegram channels, collected over a period spanning a year before and a year after the Russian invasion. The dataset comprises 404 pro-Kremlin channels with 4,109,645 posts and 114 anti-Kremlin channels with 1,117,768 posts. We provide details on the data collection process, processing methods, and dataset characterization. Lastly, we discuss the potential research opportunities this dataset may enable for researchers across various disciplines.

Authors:Nilesh Kumar Sahu, Snehil Gupta, Haroon R. Lone
Title: Assessing HRV and HR Dynamics with Wearables During Socially Anxious Situations: Insights from a Controlled Study in a Low-Middle-Income Country
Abstract:
This paper investigates physiological markers of Social Anxiety Disorder (SAD) by examining the relationship between Electrocardiogram (ECG) measurements and speech, a known anxiety-inducing activity. Specifically, we analyze changes in heart rate variability (HRV) and heart rate (HR) during four distinct phases: baseline, anticipation, speech activity, and reflection. Our study, involving 51 participants (31 with SAD and 20 without), found that HRV decreased and HR increased during the anticipation and speech activity phases compared to baseline. In contrast, during the reflection phase, HRV increased and HR decreased. Additionally, participants with SAD exhibited lower HRV, higher HR, and reported greater self-perceived anxiety compared to those without SAD. These findings have implications for developing wearable technology to monitor SAD. We also provide our dataset, which captures anxiety across multiple stages, to support further research in this area.

Authors:Tianqi Song, Jack Jamieson, Tianwen Zhu, Naomi Yamashita, Yi-Chieh Lee
Title: From Interaction to Attitude: Exploring the Impact of Human-AI Cooperation on Mental Illness Stigma
Abstract:
AI conversational agents have demonstrated efficacy in social contact interventions for stigma reduction at a low cost. However, the underlying mechanisms of how interaction designs contribute to these effects remain unclear. This study investigates how participating in three human-chatbot interactions affects attitudes toward mental illness. We developed three chatbots capable of engaging in either one-way information dissemination from the chatbot to a human or two-way cooperation, in which the chatbot and a human exchange thoughts and work together on a cooperative task. We then conducted a two-week mixed-methods study to investigate variations over time and across different group memberships. The results indicate that human-AI cooperation can effectively reduce stigma toward individuals with mental illness by fostering relationships between humans and AI through social contact. Additionally, compared to a one-way chatbot, interacting with a cooperative chatbot led participants to perceive it as more competent and likable, promoting greater empathy during the conversation. However, despite the success in reducing stigma, inconsistencies between the chatbot's role and the mental health context raised concerns. We discuss the implications of our findings for human-chatbot interaction designs aimed at changing human attitudes.

Authors:Tiziano Piccardi, Robert West
Title: Navigating Knowledge: Patterns and Insights from Wikipedia Consumption
Abstract:
The Web has drastically simplified our access to knowledge and learning, and fact-checking online resources has become a part of our daily routine. Studying online knowledge consumption is thus critical for understanding human behavior and informing the design of future platforms. In this Chapter, we approach this subject by describing the navigation patterns of the readers of Wikipedia, the world's largest platform for open knowledge. We provide a comprehensive overview of what is known about the three steps that characterize navigation on Wikipedia: (1) how readers reach the platform, (2) how readers navigate the platform, and (3) how readers leave the platform. Finally, we discuss open problems and opportunities for future research in this field.

Authors:Mallika Garg, Debashis Ghosh, Pyari Mohan Pradhan
Title: Multiscaled Multi-Head Attention-based Video Transformer Network for Hand Gesture Recognition
Abstract:
Dynamic gesture recognition is a challenging research area due to variations in the pose, size, and shape of the signer's hand. In this letter, we propose the Multiscaled Multi-Head Attention Video Transformer Network (MsMHA-VTN) for dynamic hand gesture recognition. A pyramidal hierarchy of multiscale features is extracted using the transformer's multiscaled head attention. The proposed model employs a different attention dimension for each head of the transformer, enabling it to provide attention at multiple scales. Further, in addition to a single modality, recognition performance using multiple modalities is examined. Extensive experiments demonstrate the superior performance of the proposed MsMHA-VTN, with overall accuracies of 88.22% and 99.10% on the NVGesture and Briareo datasets, respectively.
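The core idea of per-head attention at different scales can be sketched as follows. This is a toy NumPy illustration under my own assumptions (random projections, a single attention layer), not the authors' MsMHA-VTN implementation: each head projects the input to a different dimension before scaled dot-product attention, and head outputs are concatenated.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multiscale_attention(x, head_dims, seed=0):
    """Toy multiscale multi-head attention: each head uses a different
    projection dimension (scale); outputs are concatenated feature-wise."""
    rng = np.random.default_rng(seed)
    T, D = x.shape
    outs = []
    for d in head_dims:
        # Random query/key/value projections stand in for learned weights.
        Wq, Wk, Wv = (rng.standard_normal((D, d)) / np.sqrt(D) for _ in range(3))
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        A = softmax(Q @ K.T / np.sqrt(d))  # (T, T) attention at this scale
        outs.append(A @ V)                 # (T, d)
    return np.concatenate(outs, axis=-1)   # (T, sum(head_dims))

x = np.ones((5, 16))  # 5 frames, 16-dim features (toy stand-in for video tokens)
y = multiscale_attention(x, head_dims=[4, 8, 16])
print(y.shape)  # (5, 28)
```

The output width is the sum of the per-head dimensions, which is how the heads contribute features at different granularities.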

Authors:Kateryna Melnyk, Lee Friedman, Dmytro Katrychuk, Oleg Komogortsev
Title: Gaze Prediction as a Function of Eye Movement Type and Individual Differences
Abstract:
Eye movement prediction is a promising area of research with the potential to improve performance and the user experience of systems based on eye-tracking technology. In this study, we analyze individual differences in gaze prediction performance. We use three fundamentally different models within the analysis: the lightweight Long Short-Term Memory network (LSTM), the transformer-based network for multivariate time series representation learning (TST), and the Oculomotor Plant Mathematical Model wrapped in the Kalman Filter framework (OPKF). Each solution was assessed on different eye-movement types. We show substantial subject-to-subject variation for all models and eye-movement types. We found that fixation noise is associated with poorer gaze prediction during fixation. For saccades, higher velocities are associated with poorer gaze prediction performance. We believe these individual differences are important and propose that future research should report statistics related to inter-subject variation. We also propose that future models should be designed to reduce subject-to-subject variation.

Authors:Juliana Gerard, Sahajpreet Singh, Morgan Macleod, Michael McKay, Antoine Rivoire, Tanmoy Chakraborty, Muskaan Singh
Title: AI Across Borders: Exploring Perceptions and Interactions in Higher Education
Abstract:
This study investigates students' perceptions of Generative Artificial Intelligence (GenAI), with a focus on Higher Education institutions in Northern Ireland and India. We collect quantitative Likert ratings and qualitative comments from 1211 students on their awareness and perceptions of AI and investigate variations in attitudes toward AI across institutions and subject areas, as well as interactions between these variables with demographic variables (focusing on gender). We found the following: (a) while perceptions varied across institutions, responses for Computer Sciences students were similar, both in terms of topics and degree of positivity; and (b) after controlling for institution and subject area, we observed no effect of gender. These results are consistent with previous studies, which find that students' perceptions are predicted by prior experience; crucially, however, the results of this study contribute to the literature by identifying important interactions between key factors that can influence experience, revealing a more nuanced picture of students' perceptions and the role of experience. We consider the implications of these relations, and further considerations for the role of experience.

Authors:Patricia Piedade, Peter A Hayton, Cynthia Bennett, Anna R L Carter, Clara Crivellaro, Alan Dix, Jess McGowan, Katta Spiel, Miriam Sturdee, Garreth W. Tigwell, Hugo Nicolau
Title: Access InContext: Futuring Accessible Prototyping Tools and Methods
Abstract:
The popularity of accessibility research has grown recently, improving digital inclusion for people with disabilities. However, as researchers, including those with disabilities, have worked to include people with disabilities in all aspects of design, they have identified a myriad of practical accessibility barriers posed by the tools and methods that human-computer interaction (HCI) researchers leverage during prototyping. To build a more inclusive technological landscape, we must question the effectiveness of existing prototyping tools and methods, repurpose/retrofit existing resources, and build new tools and methods to support the participation of both researchers and people with disabilities within the prototyping design process of novel technologies. This full-day workshop at CHI 2025 will provide a platform for HCI researchers, designers, and practitioners to discuss barriers and opportunities for creating accessible prototyping and promote hands-on ideation and fabrication exercises aimed at futuring accessible prototyping.

Authors:Patrick Stadler, Christopher Lazik, Christopher Katins, Thomas Kosch
Title: If You Had to Pitch Your Ideal Software -- Evaluating Large Language Models to Support User Scenario Writing for User Experience Experts and Laypersons
Abstract:
The process of requirements analysis requires an understanding of the end users of a system. Thus, expert stakeholders, such as User Experience (UX) designers, usually create various descriptions containing information about the users and their possible needs. In our paper, we investigate to what extent UX novices are able to write such user scenarios. We conducted a user study with 60 participants consisting of 30 UX experts and 30 novices who were asked to write a user scenario with or without the help of an LLM-supported writing assistant. Our findings show that LLMs empower laypersons to write reasonable user scenarios and provide first-hand insights for requirements analysis that are comparable to UX experts' in terms of structure and clarity, while especially excelling at audience-orientation. We present our qualitative and quantitative findings, including user scenario anatomies, potential influences, and differences in the way participants approached the task.

Authors:Maysara Alhindi, Joseph Hallett
Title: Not quite a piece of CHERI-cake: Are new digital security by design architectures usable?
Abstract:
A digital security-by-design computer architecture, like CHERI, lets you program without fear of buffer overflows or other memory safety errors, but CHERI also rewrites some of the assumptions about how C works and how fundamental types (such as pointers) are implemented in hardware. We conducted a usability study to examine how developers react to the changes required by CHERI when porting software to run on it. We find that developers struggle with CHERI's display of warnings and errors and a lack of diverse documentation.

Authors:Ehud Sharlin, Benjamin Watson, Steve Sutphen, Lili Liu, Robert Lederer, John Frazer
Title: A tangible user interface for assessing cognitive mapping ability
Abstract:
Wayfinding, the ability to recall the environment and navigate through it, is an essential cognitive skill relied upon almost every day in a person's life. A crucial component of wayfinding is the construction of cognitive maps, mental representations of the environments through which a person travels. Age, disease or injury can severely affect cognitive mapping, making assessment of this basic survival skill particularly important to clinicians and therapists. Cognitive mapping has also been the focus of decades of basic research by cognitive psychologists. Both communities have evolved a number of techniques for assessing cognitive mapping ability. We present the Cognitive Map Probe (CMP), a new computerized tool for assessment of cognitive mapping ability that increases consistency and promises improvements in flexibility, accessibility, sensitivity and control. The CMP uses a tangible user interface that affords spatial manipulation. We describe the design of the CMP and, in extensive experimental testing, find that it is sensitive to factors known to affect cognitive mapping performance.

Authors:Aimen Gaba, Emily Wall, Tejas Ramkumar Babu, Yuriy Brun, Kyle Hall, Cindy Xiong Bearfield
Title: Bias, Accuracy, and Trust: Gender-Diverse Perspectives on Large Language Models
Abstract:
Large language models (LLMs) are becoming increasingly ubiquitous in our daily lives, but numerous concerns about bias in LLMs exist. This study examines how gender-diverse populations perceive bias, accuracy, and trustworthiness in LLMs, specifically ChatGPT. Through 25 in-depth interviews with non-binary/transgender, male, and female participants, we investigate how gendered and neutral prompts influence model responses and how users evaluate these responses. Our findings reveal that gendered prompts elicit more identity-specific responses, with non-binary participants particularly susceptible to condescending and stereotypical portrayals. Perceived accuracy was consistent across gender groups, with errors most noted in technical topics and creative tasks. Trustworthiness varied by gender, with men showing higher trust, especially in performance, and non-binary participants demonstrating higher performance-based trust. Additionally, participants suggested improving the LLMs by diversifying training data, ensuring equal depth in gendered responses, and incorporating clarifying questions. This research contributes to the CSCW/HCI field by highlighting the need for gender-diverse perspectives in LLM development in particular and AI in general, to foster more inclusive and trustworthy systems.

Authors:Viktorija Paneva, Verena Winterhalter, Franziska Augustinowski, Florian Alt
Title: User Understanding of Privacy Permissions in Mobile Augmented Reality: Perceptions and Misconceptions
Abstract:
Mobile Augmented Reality (AR) applications leverage various sensors to provide immersive user experiences. However, their reliance on diverse data sources introduces significant privacy challenges. This paper investigates user perceptions and understanding of privacy permissions in mobile AR apps through an analysis of existing applications and an online survey of 120 participants. Findings reveal common misconceptions, including confusion about how permissions relate to specific AR functionalities (e.g., location and measurement of physical distances), and misinterpretations of permission labels (e.g., conflating camera and gallery access). We identify a set of actionable implications for designing more usable and transparent privacy mechanisms tailored to mobile AR technologies, including contextual explanations, modular permission requests, and clearer permission labels. These findings offer actionable guidance for developers, researchers, and policymakers working to enhance privacy frameworks in mobile AR.

Authors:Sarah Schömbs, Yan Zhang, Jorge Goncalves, Wafa Johal
Title: From Conversation to Orchestration: HCI Challenges and Opportunities in Interactive Multi-Agentic Systems
Abstract:
Recent advances in multi-agentic systems (e.g. AutoGen, OpenAI Swarm) allow users to interact with a group of specialised AI agents rather than a single general-purpose agent. Despite the promise of this new paradigm, the HCI community has yet to fully examine the opportunities, risks, and user-centred challenges it introduces. We contribute to research on multi-agentic systems by exploring their architectures and key features through a human-centred lens. While literature and use cases remain limited, we build on existing tools and frameworks available to developers to identify a set of overarching challenges, e.g. orchestration and conflict resolution, that can guide future research in HCI. We illustrate these challenges through examples, offer potential design considerations, and provide research opportunities to spark interdisciplinary conversation. Our work lays the groundwork for future exploration and offers a research agenda focused on user-centred design in multi-agentic systems.

Authors:Zheyuan Zhang, Jingjing Sun, Dorian Peters, Rafael A. Calvo
Title: Beyond Wellbeing Apps: Co-Designing Immersive, Embodied, and Collective Digital Wellbeing Interventions for Healthcare Professionals
Abstract:
Healthcare professionals (HCPs) face increasing levels of stress and burnout. Technological wellbeing interventions provide accessible and flexible support for HCPs. While most studies have focused on mobile- and web-based programs, alternative technologies like virtual reality (VR), augmented reality (AR), tangible interfaces, and embodied technologies are emerging as engaging and effective tools for wellbeing interventions. However, there is still a lack of research on how such technologies are perceived among HCPs. This study explored HCPs' perceptions of and preferences for various types of wellbeing technologies by conducting a 2-phase co-design study involving 26 HCPs in idea generation, concept evaluation, prototype testing, and design iteration. From our findings, HCPs highly valued the potential of technologies to support mental health with immersive, embodied, and collective experiences. Furthermore, we provide design recommendations for wellbeing technologies for HCPs that sustain user engagement by meeting their needs for autonomy, competence, and relatedness.

Authors:Timoteo Kelly, Abdulkadir Korkmaz, Samuel Mallet, Connor Souders, Sadra Aliakbarpour, Praveen Rao
Title: HARPT: A Corpus for Analyzing Consumers' Trust and Privacy Concerns in Electronic Health Apps
Abstract:
We present Health App Reviews for Privacy & Trust (HARPT), a large-scale annotated corpus of user reviews from Electronic Health (eHealth) applications (apps) aimed at advancing research in user privacy and trust. The dataset comprises 480K user reviews labeled in seven categories that capture critical aspects of trust in applications (TA), trust in providers (TP), and privacy concerns (PC). Our multistage strategy integrated keyword-based filtering, iterative manual labeling with review, targeted data augmentation, and weak supervision using transformer-based classifiers. In parallel, we manually annotated a curated subset of 7,000 reviews to support the development and evaluation of machine learning models. We benchmarked a broad range of models, providing a baseline for future work. HARPT is released under an open resource license to support reproducible research in usable privacy and trust in digital libraries and health informatics.

Authors:Marianne Bossema, Somaya Ben Allouch, Aske Plaat, Rob Saunders
Title: LLM-enhanced Interactions in Human-Robot Collaborative Drawing with Older Adults
Abstract:
The goal of this study is to identify factors that support and enhance older adults' creative experiences in human-robot co-creativity. Because research into the use of robots for creativity support with older adults remains limited, we carried out an exploratory case study. We took a participatory approach and collaborated with professional art educators to design a course, Drawing with Robots, for adults aged 65 and over. The course featured human-human and human-robot drawing activities with various types of robots. We observed collaborative drawing interactions, interviewed participants on their experiences, and analyzed the collected data. Findings show that participants preferred acting as curators, evaluating creative suggestions from the robot in a teacher or coach role. When we enhanced a robot with a multimodal Large Language Model (LLM), participants appreciated its spoken dialogue capabilities. They reported, however, that the robot's feedback sometimes lacked an understanding of context and sensitivity to their artistic goals and preferences. Our findings highlight the potential of LLM-enhanced robots to support creativity and offer future directions for advancing human-robot co-creativity with older adults.

Authors:Emerson Sie, Enguang Fan, Federico Cifuentes-Urtubey, Deepak Vasisht
Title: Crowdsourcing Ubiquitous Indoor Localization with Non-Cooperative Wi-Fi Ranging
Abstract:
Indoor localization opens the path to potentially transformative applications. Although many indoor localization methods have been proposed over the years, they remain too impractical for widespread deployment in the real world. In this paper, we introduce PeepLoc, a deployable and scalable Wi-Fi-based solution for indoor localization that relies only on pre-existing devices and infrastructure. Specifically, PeepLoc works on any mobile device with an unmodified Wi-Fi transceiver and in any indoor environment with a sufficient number of Wi-Fi access points (APs) and pedestrian traffic. At the core of PeepLoc is (a) a mechanism which allows any Wi-Fi device to obtain non-cooperative time-of-flight (ToF) to any Wi-Fi AP and (b) a novel bootstrapping mechanism that relies on pedestrian dead reckoning (PDR) and crowdsourcing to opportunistically initialize pre-existing APs as anchor points within an environment. We implement PeepLoc using commodity hardware and evaluate it extensively across 4 campus buildings. We show PeepLoc leads to a mean and median positional error of 3.41 m and 3.06 m respectively, which is superior to existing deployed indoor localization systems and is competitive with commodity GPS in outdoor environments.
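Once ToF ranges to several anchor APs are available, a position can be recovered by least-squares multilateration. The sketch below is a minimal, self-contained illustration of that final step only (PeepLoc's non-cooperative ranging and PDR bootstrapping are not modeled); the AP coordinates and ranges are synthetic.

```python
import numpy as np

def trilaterate(anchors, ranges):
    """Least-squares position from ranges to known anchors.
    Subtracting ||p - a_0||^2 = r_0^2 from each ||p - a_i||^2 = r_i^2
    yields a linear system in p."""
    a0, r0 = anchors[0], ranges[0]
    A = 2 * (anchors[1:] - a0)
    b = (r0**2 - ranges[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Four hypothetical APs at the corners of a 10 m x 10 m floor.
aps = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_pos = np.array([3.0, 4.0])
ranges = np.linalg.norm(aps - true_pos, axis=1)  # noiseless ToF ranges
print(trilaterate(aps, ranges))  # ≈ [3. 4.]
```

With noisy ranges, as in a real deployment, the same least-squares solve simply returns the best-fit position rather than the exact one.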

Authors:Ji-Youn Jung, Devansh Saxena, Minjung Park, Jini Kim, Jodi Forlizzi, Kenneth Holstein, John Zimmerman
Title: Making the Right Thing: Bridging HCI and Responsible AI in Early-Stage AI Concept Selection
Abstract:
AI projects often fail due to financial, technical, ethical, or user acceptance challenges -- failures frequently rooted in early-stage decisions. While HCI and Responsible AI (RAI) research emphasize this, practical approaches for identifying promising concepts early remain limited. Drawing on Research through Design, this paper investigates how early-stage AI concept sorting in commercial settings can reflect RAI principles. Through three design experiments -- including a probe study with industry practitioners -- we explored methods for evaluating risks and benefits using multidisciplinary collaboration. Participants demonstrated strong receptivity to addressing RAI concerns early in the process and effectively identified low-risk, high-benefit AI concepts. Our findings highlight the potential of a design-led approach to embed ethical and service design thinking at the front end of AI innovation. By examining how practitioners reason about AI concepts, our study invites HCI and RAI communities to see early-stage innovation as a critical space for engaging ethical and commercial considerations together.

Authors:Samuel Rhys Cox, Helena Bøjer Djernæs, Niels van Berkel
Title: Reflecting Human Values in XAI: Emotional and Reflective Benefits in Creativity Support Tools
Abstract:
In this workshop paper, we discuss the potential for measures of user-centric benefits (such as emotional well-being) that could be explored when evaluating explainable AI (XAI) systems within the arts. As a background to this, we draw from our recent review of creativity support tool (CST) evaluations, that found a paucity of studies evaluating CSTs for user-centric measures that benefit the user themselves. Specifically, we discuss measures of: (1) developing intrinsic abilities, (2) emotional well-being, (3) self-reflection, and (4) self-perception. By discussing these user-centric measures within the context of XAI and the arts, we wish to provoke discussion regarding the potential of such measures.

Authors:Zhiqing Wang, Haoxiang Fan, Shiwei Wu, Qiaoyi Chen, Yongqi Liang, Zhenhui Peng
Title: Exploring the Usage of Generative AI for Group Project-Based Offline Art Courses in Elementary Schools
Abstract:
The integration of Generative Artificial Intelligence (GenAI) in K-6 project-based art courses presents both opportunities and challenges for enhancing creativity, engagement, and group collaboration. This study introduces a four-phase field study, involving two experienced K-6 art teachers and 132 students across eight offline course sessions, to investigate the usage and impact of GenAI. Specifically, based on findings in Phases 1 and 2, we developed AskArt, an interactive interface that combines DALL-E and GPT and is tailored to support elementary school students in their art projects, and deployed it in Phases 3 and 4. Our findings revealed the benefits of GenAI in providing background information, inspiration, and personalized guidance. However, challenges in formulating queries to generate the expected content were also observed. Moreover, students employed varied collaboration strategies, and teachers noted increased engagement alongside concerns regarding misuse and interface suitability. This study offers insights into the effective integration of GenAI in elementary education, presents AskArt as a practical tool, and provides recommendations for educators and researchers to enhance project-based learning with GenAI technologies.

Authors:Qixin Wang, Songtao Zhou, Zeyu Jin, Chenglin Guo, Shikun Sun, Xiaoyu Qin
Title: V-CASS: Vision-context-aware Expressive Speech Synthesis for Enhancing User Understanding of Videos
Abstract:
Automatic video commentary systems are widely used on multimedia social media platforms to extract factual information about video content. However, current systems may overlook essential para-linguistic cues, including emotion and attitude, which are critical for fully conveying the meaning of visual content. The absence of these cues can limit user understanding or, in some cases, distort the video's original intent. Expressive speech effectively conveys these cues and enhances the user's comprehension of videos. Building on these insights, this paper explores the use of vision-context-aware expressive speech to enhance users' understanding of videos in video commentary systems. Firstly, our formative study indicates that semantic-only speech can lead to ambiguity, and misaligned emotions between speech and visuals may distort content interpretation. To address this, we propose a method called vision-context-aware speech synthesis (V-CASS). It analyzes para-linguistic cues from visuals using a vision-language model and leverages a knowledge-infused language model to guide the expressive speech model in generating context-aligned speech. User studies show that V-CASS enhances emotional and attitudinal resonance, as well as user audio-visual understanding and engagement, with 74.68% of participants preferring the system. Finally, we explore the potential of our method in helping blind and low-vision users navigate web videos, improving universal accessibility.

Authors:Jonathan Zong, Isabella Pedraza Pineros, Mengzhu Katie Chen, Daniel Hajas, Arvind Satyanarayan
Title: Semantic Scaffolding: Augmenting Textual Structures with Domain-Specific Groupings for Accessible Data Exploration
Abstract:
Drawing connections between interesting groupings of data and their real-world meaning is an important, yet difficult, part of encountering a new dataset. A lay reader might see an interesting visual pattern in a chart but lack the domain expertise to explain its meaning. Or, a reader might be familiar with a real-world concept but struggle to express it in terms of a dataset's fields. In response, we developed semantic scaffolding, a technique for using domain-specific information from large language models (LLMs) to identify, explain, and formalize semantically meaningful data groupings. We present groupings in two ways: as semantic bins, which segment a field into domain-specific intervals and categories; and data highlights, which annotate subsets of data records with their real-world meaning. We demonstrate and evaluate this technique in Olli, an accessible visualization tool that exemplifies tensions around explicitly defining groupings while respecting the agency of readers to conduct independent data exploration. We conducted a study with 15 blind and low-vision (BLV) users and found that readers used semantic scaffolds to quickly understand the meaning of the data, but were often also critically aware of its influence on their interpretation.
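The "semantic bins" idea, segmenting a field into domain-specific intervals rather than equal-width buckets, can be illustrated with a short sketch. This is my own toy example under assumed interval labels (the paper derives groupings from an LLM, which is not modeled here).

```python
import bisect

def semantic_bin(value, edges, labels):
    """Map a numeric value to the label of the domain interval it falls in.
    edges are the interval boundaries; labels has one entry per interval."""
    assert len(labels) == len(edges) + 1
    return labels[bisect.bisect_right(edges, value)]

# Hypothetical example: systolic blood pressure (mmHg) with clinically
# meaningful cut points instead of equal-width bins.
edges = [120, 130, 140]
labels = ["normal", "elevated", "stage 1 hypertension", "stage 2 hypertension"]
print(semantic_bin(118, edges, labels))  # normal
print(semantic_bin(135, edges, labels))  # stage 1 hypertension
```

The point of the technique is that the cut points and labels carry real-world meaning a lay reader can act on, unlike arbitrary histogram buckets.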

Authors:Isabella Pu, Prerna Ravi, Linh Dieu Dinh, Chelsea Joe, Caitlin Ogoe, Zixuan Li, Cynthia Breazeal, Anastasia K. Ostrowski
Title: "How can we learn and use AI at the same time?": Participatory Design of GenAI with High School Students
Abstract:
As generative AI (GenAI) emerges as a transformative force, clear understanding of high school students' perspectives is essential for GenAI's meaningful integration in high school environments. In this work, we draw insights from a participatory design workshop where we engaged 17 high school students -- a group rarely involved in prior research in this area -- through the design of novel GenAI tools and school policies addressing their key concerns. Students identified challenges and developed solutions outlining their ideal features in GenAI tools, appropriate school use, and regulations. These centered around the problem spaces of combating bias & misinformation, tackling crime & plagiarism, preventing over-reliance on AI, and handling false accusations of academic dishonesty. Building on our participants' underrepresented perspectives, we propose new guidelines targeted at educational technology designers for development of GenAI technologies in high schools. We also argue for further incorporation of student voices in development of AI policies in their schools.

Authors:Lucile Favero, Daniel Frases, Juan Antonio Pérez-Ortiz, Tanja Käser, Nuria Oliver
Title: ELLIS Alicante at CQs-Gen 2025: Winning the critical thinking questions shared task: LLM-based question generation and selection
Abstract:
The widespread adoption of chat interfaces based on Large Language Models (LLMs) raises concerns about promoting superficial learning and undermining the development of critical thinking skills. Instead of relying on LLMs purely for retrieving factual information, this work explores their potential to foster deeper reasoning by generating critical questions that challenge unsupported or vague claims in debate interventions. This study is part of a shared task of the 12th Workshop on Argument Mining, co-located with ACL 2025, focused on automatic critical question generation. We propose a two-step framework involving two small-scale open-source language models: a Questioner that generates multiple candidate questions and a Judge that selects the most relevant ones. Our system ranked first in the shared task competition, demonstrating the potential of the proposed LLM-based approach to encourage critical engagement with argumentative texts.

Authors:Anders Giovanni Møller, Daniel M. Romero, David Jurgens, Luca Maria Aiello
Title: The Impact of Generative AI on Social Media: An Experimental Study
Abstract:
Generative Artificial Intelligence (AI) tools are increasingly deployed across social media platforms, yet their implications for user behavior and experience remain understudied, particularly regarding two critical dimensions: (1) how AI tools affect the behaviors of content producers in a social media context, and (2) how content generated with AI assistance is perceived by users. To fill this gap, we conduct a controlled experiment with a representative sample of 680 U.S. participants in a realistic social media environment. The participants are randomly assigned to small discussion groups, each consisting of five individuals in one of five distinct experimental conditions: a control group and four treatment groups, each employing a unique AI intervention: chat assistance, conversation starters, feedback on comment drafts, and reply suggestions. Our findings highlight a complex duality: some AI tools increase user engagement and the volume of generated content, but at the same time decrease the perceived quality and authenticity of discussion and introduce a negative spill-over effect on conversations. Based on our findings, we propose four design principles and recommendations aimed at social media platforms, policymakers, and stakeholders: ensuring transparent disclosure of AI-generated content, designing tools with user-focused personalization, incorporating context-sensitivity to account for both topic and user intent, and prioritizing intuitive user interfaces. These principles aim to guide an ethical and effective integration of generative AI into social media.

Authors:Laura Aymerich-Franch, Tarek Taha, Takahiro Miyashita, Hiroko Kamide, Hiroshi Ishiguro, Paolo Dario
Title: Public Acceptance of Cybernetic Avatars in the service sector: Evidence from a Large-Scale Survey in Dubai
Abstract:
Cybernetic avatars are hybrid interaction robots or digital representations that combine autonomous capabilities with teleoperated control. This study investigates the acceptance of cybernetic avatars in the highly multicultural society of Dubai, with particular emphasis on robotic avatars for customer service. Specifically, we explore how acceptance varies as a function of robot appearance (e.g., android, robotic-looking, cartoonish), deployment settings (e.g., shopping malls, hotels, hospitals), and functional tasks (e.g., providing information, patrolling). To this end, we conducted a large-scale survey with over 1,000 participants. Overall, cybernetic avatars received a high level of acceptance, with physical robot avatars receiving higher acceptance than digital avatars. In terms of appearance, robot avatars with a highly anthropomorphic robotic appearance were the most accepted, followed by cartoonish designs and androids. Animal-like appearances received the lowest level of acceptance. Among the tasks, providing information and guidance was rated as the most valued. Shopping malls, airports, public transport stations, and museums were the settings with the highest acceptance, whereas healthcare-related spaces received lower levels of support. An analysis by community cluster revealed, among other findings, that Emirati respondents showed significantly greater acceptance of android appearances compared to the overall sample, while participants from the 'Other Asia' cluster were significantly more accepting of cartoonish appearances. Our study underscores the importance of incorporating citizen feedback into the design and deployment of cybernetic avatars from the early stages to enhance acceptance of this technology in society.

Authors:Bernhard Hilpert, Muhan Hou, Kim Baraka, Joost Broekens
Title: Can you see how I learn? Human observers' inferences about Reinforcement Learning agents' learning processes
Abstract:
Reinforcement Learning (RL) agents often exhibit learning behaviors that are not intuitively interpretable by human observers, which can result in suboptimal feedback in collaborative teaching settings. Yet, how humans perceive and interpret RL agents' learning behavior is largely unknown. In a bottom-up approach with two experiments, this work provides a data-driven understanding of the factors shaping human observers' understanding of agents' learning processes. A novel, observation-based paradigm to directly assess human inferences about agent learning was developed. In an exploratory interview study (N = 9), we identify four core themes in human interpretations: Agent Goals, Knowledge, Decision Making, and Learning Mechanisms. A second confirmatory study (N = 34) applied an expanded version of the paradigm across two tasks (navigation/manipulation) and two RL algorithms (tabular/function approximation). Analyses of 816 responses confirmed the reliability of the paradigm and refined the thematic framework, revealing how these themes evolve over time and interrelate. Our findings provide a human-centered understanding of how people make sense of agent learning, offering actionable insights for designing interpretable RL systems and improving transparency in Human-Robot Interaction.

Authors:Zhenning Yang, Archit Bhatnagar, Yiming Qiu, Tongyuan Miao, Patrick Tser Jern Kon, Yunming Xiao, Yibo Huang, Martin Casado, Ang Chen
Title: Cloud Infrastructure Management in the Age of AI Agents
Abstract:
Cloud infrastructure is the cornerstone of the modern IT industry. However, managing this infrastructure effectively requires considerable manual effort from the DevOps engineering team. We make a case for developing AI agents powered by large language models (LLMs) to automate cloud infrastructure management tasks. In a preliminary study, we investigate the potential for AI agents to use different cloud/user interfaces such as software development kits (SDK), command line interfaces (CLI), Infrastructure-as-Code (IaC) platforms, and web portals. We report takeaways on their effectiveness on different management tasks, and identify research challenges and potential solutions.

Authors:Ana Müller, Anja Richert
Title: The Space Between Us: A Methodological Framework for Researching Bonding and Proxemics in Situated Group-Agent Interactions
Abstract:
This paper introduces a multimethod framework for studying spatial and social dynamics in real-world group-agent interactions with socially interactive agents. Drawing on proxemics and bonding theories, the method combines subjective self-reports and objective spatial tracking. Applied in two field studies in a museum (N = 187) with a robot and a virtual agent, the paper addresses the challenges in aligning human perception and behavior. We focus on presenting an open source, scalable, and field-tested toolkit for future studies.

Authors:Gaspard Merten, Gilles Dejaegere, Mahmoud Sakr
Title: GeoPandas-AI: A Smart Class Bringing LLM as Stateful AI Code Assistant
Abstract:
Geospatial data analysis plays a crucial role in tackling intricate societal challenges such as urban planning and climate modeling. However, employing tools like GeoPandas, a prominent Python library for geospatial data manipulation, necessitates expertise in complex domain-specific syntax and workflows. GeoPandas-AI addresses this gap by integrating LLMs directly into the GeoPandas workflow, transforming the GeoDataFrame class into an intelligent, stateful class for both data analysis and geospatial code development. This paper formalizes the design of such a smart class and provides an open-source implementation of GeoPandas-AI as a PyPI package. Through its innovative combination of conversational interfaces and stateful exploitation of LLMs for code generation and data analysis, GeoPandas-AI introduces a new paradigm for code copilots and instantiates it for geospatial development.
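The stateful-class design described above can be illustrated with a minimal sketch. This is not the actual GeoPandas-AI API: the class name, the `chat` method, and the stubbed LLM call are all hypothetical, standing in for the pattern of a data object that carries its own conversation history into each code-generation request.

```python
from dataclasses import dataclass, field


def stub_llm(prompt: str, history: list) -> str:
    # Placeholder for a real LLM call; a real implementation would send
    # the prompt plus the accumulated history to a model endpoint.
    return f"# generated code for: {prompt}"


@dataclass
class SmartFrame:
    """A data-holding class that keeps conversation state across .chat() calls."""
    data: dict
    history: list = field(default_factory=list)

    def chat(self, prompt: str) -> str:
        # Prior turns give the LLM context, which is what makes the
        # class "stateful" rather than a one-shot code generator.
        reply = stub_llm(prompt, self.history)
        self.history.append((prompt, reply))
        return reply


frame = SmartFrame(data={"city": ["Brussels"], "population": [1_222_637]})
frame.chat("plot population by city")
frame.chat("now as a log scale")
print(len(frame.history))  # two turns retained as shared context
```

The key design choice is that follow-up prompts ("now as a log scale") only make sense because the object remembers the earlier exchange.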

Authors:Kevin Cogan, Vuong M. Ngo, Mark Roantree
Title: Developing a Dyslexia Indicator Using Eye Tracking
Abstract:
Dyslexia, affecting an estimated 10% to 20% of the global population, significantly impairs learning capabilities, highlighting the need for innovative and accessible diagnostic methods. This paper investigates the effectiveness of eye-tracking technology combined with machine learning algorithms as a cost-effective alternative for early dyslexia detection. By analyzing general eye movement patterns, including prolonged fixation durations and erratic saccades, we propose an enhanced method for deriving eye-tracking-based dyslexia features. A Random Forest Classifier was then employed to detect dyslexia, achieving an accuracy of 88.58%. Additionally, hierarchical clustering methods were applied to identify varying severity levels of dyslexia. The analysis incorporates diverse methodologies across various populations and settings, demonstrating the potential of this technology to identify individuals with dyslexia, including those with borderline traits, through non-invasive means. Integrating eye-tracking with machine learning represents a significant advancement in the diagnostic process, offering a highly accurate and accessible method in clinical research.
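The fixation-duration and saccade features the abstract mentions can be sketched in a few lines. The trace, the displacement threshold, and the two summary features below are illustrative assumptions, not the paper's actual pipeline.

```python
import math


def gaze_features(samples, fix_thresh=1.0):
    """Split a gaze trace into fixation runs vs saccades by inter-sample distance.

    samples: list of (x, y) gaze points at a fixed sampling rate.
    fix_thresh: max displacement (same units as samples) still counted as fixation.
    Returns (mean fixation run length, mean saccade amplitude).
    """
    fix_runs, saccades, run = [], [], 1
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        if d <= fix_thresh:
            run += 1          # still dwelling near the same point
        else:
            fix_runs.append(run)
            saccades.append(d)  # a jump: record its amplitude
            run = 1
    fix_runs.append(run)
    mean_fix = sum(fix_runs) / len(fix_runs)
    mean_sac = sum(saccades) / len(saccades) if saccades else 0.0
    return mean_fix, mean_sac


# A toy trace: two dwells of 3 and 2 samples separated by large jumps.
trace = [(0, 0), (0.2, 0.1), (0.3, 0.1), (5, 5), (5.1, 5.2), (9, 1)]
mean_fix, mean_sac = gaze_features(trace)
```

Features of this kind (long fixations, erratic high-amplitude saccades) are the inputs a classifier such as a Random Forest would consume.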

Authors:Ritik Batra, Lydia Kim, Ilan Mandel, Amritansh Kwatra, Jane L. E., Steven J. Jackson, Thijs Roumen
Title: (De)composing Craft: An Elementary Grammar for Sharing Expertise in Craft Workflows
Abstract:
Craft practices rely on evolving archives of skill and knowledge, developed through generations of craftspeople experimenting with designs, materials, and techniques. Better documentation of these practices enables the sharing of knowledge and expertise between sites and generations. However, most documentation focuses solely on the linear steps leading to final artifacts, neglecting the tacit knowledge necessary to improvise, or adapt workflows to meet the unique demands of each craft project. This omission limits knowledge sharing and reduces craft to a mechanical endeavor, rather than a sophisticated way of seeing, thinking, and doing. Drawing on expert interviews and literature from HCI, CSCW and the social sciences, we develop an elementary grammar to document improvisational actions of real-world craft practices. We demonstrate the utility of this grammar with an interface called CraftLink that can be used to analyze expert videos and semi-automatically generate documentation to convey material and contextual variations of craft practices. Our user study with expert crocheters (N=7) using this interface evaluates our grammar's effectiveness in capturing and sharing expert knowledge with other craftspeople, offering new pathways for computational systems to support collaborative archives of knowledge and practice within communities.

Authors:Aakriti Kumar, Nalin Poungpeth, Diyi Yang, Erina Farrell, Bruce Lambert, Matthew Groh
Title: When Large Language Models are Reliable for Judging Empathic Communication
Abstract:
Large language models (LLMs) excel at generating empathic responses in text-based conversations. But how reliably do they judge the nuances of empathic communication? We investigate this question by comparing how experts, crowdworkers, and LLMs annotate empathic communication across four evaluative frameworks drawn from psychology, natural language processing, and communications applied to 200 real-world conversations where one speaker shares a personal problem and the other offers support. Drawing on 3,150 expert annotations, 2,844 crowd annotations, and 3,150 LLM annotations, we assess inter-rater reliability between these three annotator groups. We find that expert agreement is high but varies across the frameworks' sub-components depending on their clarity, complexity, and subjectivity. We show that expert agreement offers a more informative benchmark for contextualizing LLM performance than standard classification metrics. Across all four frameworks, LLMs consistently approach this expert-level benchmark and exceed the reliability of crowdworkers. These results demonstrate how LLMs, when validated on specific tasks with appropriate benchmarks, can support transparency and oversight in emotionally sensitive applications including their use as conversational companions.

Authors:Qihan Yang, Xin Zhou, Adam J. Spiers
Title: Investigating the Perception of Translational Shape-Changing Haptic Interfaces
Abstract:
Shape-changing haptic interfaces (SCHIs) are a promising and emerging field. However, compared to more established stimulus modalities, such as vibration, there is sparse literature on the perception of dynamic shapes. Furthermore, the influence of properties such as grasp types and displacement magnitude/direction has not been formally evaluated. This work attempts to initiate a formal perceptual evaluation of SCHIs via a psychophysical user study involving a 1-DOF translational shape-changing interface that can move its body with 1.25-micrometer resolution. Participants completed a Method of Constant Stimuli study while holding the device with three different grasps. Stimulus direction varied both toward and away from the thumb, while the standard stimuli varied between small (0.48 mm) and large (6 mm). Our results indicate that translational SCHIs should maximize the translation magnitude rather than the number of fingers in contact. We also demonstrated how to apply our findings to real-world applications via a simple 'paddle game', where we compared conventional linear mapping with non-linear mapping derived from our perceptual experiment outcomes between the device position and its represented value. Results indicate that the non-linear mapping was more effective, with improved error distribution. We hope this work inspires further formal perceptual investigation into other SCHI morphologies.
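The paper's actual perceptually derived mapping is not given in the abstract. As an illustration of the idea, a compressive power-law mapping (a common psychophysical assumption; the exponent here is arbitrary) allocates more of the represented value range to small displacements, where sensitivity is finer, than a linear mapping does.

```python
def linear_map(x, x_max=6.0):
    """Linear mapping: represented value proportional to displacement (mm)."""
    return x / x_max


def power_map(x, x_max=6.0, gamma=0.6):
    """Compressive power-law mapping: expands small displacements and
    compresses large ones, mimicking diminishing perceptual sensitivity."""
    return (x / x_max) ** gamma


# A 0.5 mm displacement claims a much larger share of the value range
# under the power law than under the linear rule.
x = 0.5
print(linear_map(x))   # ~0.083
print(power_map(x))    # ~0.225
```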

Authors:Griffin Pitts, Sanaz Motamedi
Title: Understanding Human-AI Trust in Education
Abstract:
As AI chatbots become increasingly integrated in education, students are turning to these systems for guidance, feedback, and information. However, the anthropomorphic characteristics of these chatbots create ambiguity regarding whether students develop trust toward them as they would a human peer or instructor, based in interpersonal trust, or as they would any other piece of technology, based in technology trust. This ambiguity presents theoretical challenges, as interpersonal trust models may inappropriately ascribe human intentionality and morality to AI, while technology trust models were developed for non-social technologies, leaving their applicability to anthropomorphic systems unclear. To address this gap, we investigate how human-like and system-like trusting beliefs comparatively influence students' perceived enjoyment, trusting intention, behavioral intention to use, and perceived usefulness of an AI chatbot - factors associated with students' engagement and learning outcomes. Through partial least squares structural equation modeling, we found that human-like and system-like trust significantly influenced student perceptions, with varied effects. Human-like trust more strongly predicted trusting intention, while system-like trust better predicted behavioral intention and perceived usefulness. Both had similar effects on perceived enjoyment. Given the partial explanatory power of each type of trust, we propose that students develop a distinct form of trust with AI chatbots (human-AI trust) that differs from human-human and human-technology models of trust. Our findings highlight the need for new theoretical frameworks specific to human-AI trust and offer practical insights for fostering appropriately calibrated trust, which is critical for the effective adoption and pedagogical impact of AI in education.

Authors:Tanjil Hasan Sakib, Samia Jahan Mojumder, Rajan Das Gupta, Md Imrul Hasan Showmick, Md. Yeasin Rahat, Md. Jakir Hossen
Title: Real-Time Confidence Detection through Facial Expressions and Hand Gestures
Abstract:
Real-time face orientation recognition is a cutting-edge technology meant to track and analyze facial movements in virtual environments such as online interviews, remote meetings, and virtual classrooms. As the demand for virtual interactions grows, it becomes increasingly important to measure participant engagement, attention, and overall interaction. This research presents a novel solution that leverages the MediaPipe Face Mesh framework to identify facial landmarks and extract geometric data for calculating Euler angles, which determine head orientation in real time. The system tracks 3D facial landmarks and uses this data to compute head movements with a focus on accuracy and responsiveness. By studying Euler angles, the system can identify a user's head orientation with an accuracy of 90%, even at a distance of up to four feet. This capability offers significant enhancements for monitoring user interaction, allowing for more immersive and interactive virtual experiences. The proposed method shows its reliability in evaluating participant attentiveness during online assessments and meetings. Its application goes beyond engagement analysis, potentially providing a means for improving the quality of virtual communication, fostering better understanding between participants, and ensuring a higher level of interaction in digital spaces. This study offers a basis for future developments in enhancing virtual user experiences by integrating real-time facial tracking technologies, paving the way for more adaptive and interactive web-based platforms.
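The head-orientation computation can be sketched in miniature: given a few 3D landmarks, yaw and pitch follow from simple trigonometry. The landmark choice below is illustrative and not MediaPipe's actual mesh indices or the paper's exact geometry.

```python
import math


def head_yaw_pitch(nose, left_eye, right_eye):
    """Estimate yaw and pitch (degrees) from three 3D facial landmarks.

    Landmarks are (x, y, z) with z pointing toward the camera; the
    landmark choice here is illustrative, not MediaPipe's mesh indices.
    """
    # Midpoint between the eyes approximates the face centre.
    mx = (left_eye[0] + right_eye[0]) / 2
    my = (left_eye[1] + right_eye[1]) / 2
    mz = (left_eye[2] + right_eye[2]) / 2
    # Vector from the eye midpoint to the nose tip indicates facing direction.
    dx, dy, dz = nose[0] - mx, nose[1] - my, nose[2] - mz
    yaw = math.degrees(math.atan2(dx, dz))    # left/right rotation
    pitch = math.degrees(math.atan2(dy, dz))  # up/down rotation
    return yaw, pitch


# A face looking straight at the camera: the nose sits directly in front
# of the eye midpoint, so yaw and pitch are both 0.
yaw, pitch = head_yaw_pitch(nose=(0, 0, 1), left_eye=(-1, 0, 0), right_eye=(1, 0, 0))
```

A full Euler-angle solution would also recover roll and typically solves a perspective-n-point problem against a canonical face model; this sketch only shows the angle-from-vector step.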

Authors:Tauhid Tanjim, Jonathan St. George, Kevin Ching, Angelique Taylor
Title: Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams
Abstract:
The human-robot interaction (HRI) field has recognized the importance of enabling robots to interact with teams. Human teams rely on effective communication for successful collaboration in time-sensitive environments. Robots can play a role in enhancing team coordination through real-time assistance. Despite significant progress in human-robot teaming research, there remains an essential gap in how robots can effectively communicate with action teams using multimodal interaction cues in time-sensitive environments. This study addresses this knowledge gap in an experimental in-lab study to investigate how multimodal robot communication in action teams affects workload and human perception of robots. We explore team collaboration in a medical training scenario where a robotic crash cart (RCC) provides verbal and non-verbal cues to help users remember to perform iterative tasks and search for supplies. Our findings show that verbal cues for object search tasks and visual cues for task reminders reduce team workload and increase perceived ease of use and perceived usefulness more effectively than a robot with no feedback. Our work contributes to multimodal interaction research in the HRI field, highlighting the need for more human-robot teaming research to understand best practices for integrating collaborative robots in time-sensitive environments such as in hospitals, search and rescue, and manufacturing applications.

Authors:Tauhid Tanjim, Promise Ekpo, Huajie Cao, Jonathan St. George, Kevin Ching, Hee Rin Lee, Angelique Taylor
Title: Human-Robot Teaming Field Deployments: A Comparison Between Verbal and Non-verbal Communication
Abstract:
Healthcare workers (HCWs) encounter challenges in hospitals, such as retrieving medical supplies quickly from crash carts, which could potentially result in medical errors and delays in patient care. Robotic crash carts (RCCs) have shown promise in assisting healthcare teams during medical tasks through guided object searches and task reminders. Limited exploration has been done to determine what communication modalities are most effective and least disruptive to patient care in real-world settings. To address this gap, we conducted a between-subjects experiment comparing the RCC's verbal and non-verbal communication of object search with a standard crash cart in resuscitation scenarios to understand the impact of robot communication on workload and attitudes toward using robots in the workplace. Our findings indicate that verbal communication significantly reduced mental demand and effort compared to visual cues and to a traditional crash cart. Although frustration levels were slightly higher during collaborations with the robot compared to a traditional cart, these research insights provide valuable implications for human-robot teamwork in high-stakes environments.

Authors:Rajan Das Gupta, Ashikur Rahman, Md Imrul Hasan Showmick, Md. Yeasin Rahat, Md. Jakir Hossen
Title: Exploring the Convergence of HCI and Evolving Technologies in Information Systems
Abstract:
Modern technology-driven information systems are part of our daily lives. However, this deep integration poses new challenges to human-computer interaction (HCI) professionals. With the rapid growth of mobile and cloud computing and the Internet of Things (IoT), the demand for HCI specialists to design user-friendly and adaptable interfaces has never been more pressing, especially for diverse user groups such as children, the elderly, and people with disabilities, who need interfaces tailored to their needs regardless of time and location. This study reviewed 50 recent papers on HCI interface design for modern information systems. The goal is to see how well these methods address the demands of current technology. The findings show that most HCI design methods are still based on old desktop models and do not support mobile users and location-based services well. Most existing interface design guidelines do not align with the flexibility and dynamism of emerging technologies. The goal of this study is to improve interface design by combining agile methodologies with human-centered design principles. Future studies should also incorporate both qualitative and quantitative approaches, particularly in the context of cloud-based technologies and organizational information systems. This approach aims to bridge the gap between current interface design practices and the changing technological landscape.

Authors:Ruanqianqian Huang, Ayana Monroe, Peli de Halleux, Sorin Lerner, Nikolaj Bjørner
Title: Z3Guide: A Scalable, Student-Centered, and Extensible Educational Environment for Logic Modeling
Abstract:
Constraint-satisfaction problems (CSPs) are ubiquitous, ranging from budgeting for grocery shopping to verifying software behavior. Logic modeling helps solve CSPs programmatically using SMT solvers. Despite its importance in many Computer Science disciplines, resources for teaching and learning logic modeling are scarce and scattered, and challenges remain in designing educational environments for logic modeling that are accessible and meet the needs of teachers and students. This paper explores how to design such an environment and probes the impact of the design on the learning experience. From a need-finding interview study and a design iteration with teachers of logic modeling, we curated 10 design guidelines spanning three main requirements: providing easy access, supporting various educational modalities, and allowing extensions for customized pedagogical needs. We implemented nine guidelines in Z3Guide, an open-source browser-based tool. Using Z3Guide in a logic modeling learning workshop with more than 100 students, we gathered positive feedback on its support for learning and identified opportunities for future improvements.
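The grocery-budgeting example from the abstract is a convenient toy CSP. The sketch below solves it by brute force in plain Python purely to illustrate the problem shape (the prices and bounds are invented); in Z3Guide the same constraints would be written declaratively and handed to the Z3 SMT solver.

```python
from itertools import product

# Toy grocery-budget CSP: choose item quantities so that total cost stays
# within budget while buying at least a minimum number of items.
prices = {"bread": 3, "milk": 2, "eggs": 4}
budget, min_items = 12, 4

# Enumerate all quantity assignments 0..3 per item and keep those
# satisfying both constraints (an SMT solver searches this space smartly).
solutions = [
    combo
    for combo in product(range(4), repeat=len(prices))
    if sum(q * p for q, p in zip(combo, prices.values())) <= budget
    and sum(combo) >= min_items
]

# One possible objective: favour the fullest basket among feasible ones.
best = max(solutions, key=sum)
```

Brute force scales exponentially with the number of variables, which is exactly why logic modeling hands such problems to solvers like Z3 instead.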

Authors:Fan Yang, Yuan Tian, Jiansong Zhang
Title: Supporting Construction Worker Well-Being with a Multi-Agent Conversational AI System
Abstract:
The construction industry is characterized by both high physical and psychological risks, yet support for mental health remains limited. While advancements in artificial intelligence (AI), particularly large language models (LLMs), offer promising solutions, their potential in construction remains largely underexplored. To bridge this gap, we developed a conversational multi-agent system that addresses industry-specific challenges through an AI-driven approach integrated with domain knowledge. In parallel, it fulfills construction workers' basic psychological needs by enabling interactions with multiple agents, each with a distinct persona. This approach ensures that workers receive both practical problem-solving support and social engagement, ultimately contributing to their overall well-being. We evaluate its usability and effectiveness through a within-subjects user study with 12 participants. The results show that our system significantly outperforms the single-agent baseline, achieving improvements of 18% in usability, 40% in self-determination, 60% in social presence, and 60% in trust. These findings highlight the promise of LLM-driven AI systems in providing domain-specific support for construction workers.

Authors:Hritom Das, Imran Fahad, SNB Tushar, Sk Hasibul Alam, Graham Buchanan, Danny Scott, Garrett S. Rose, Sai Swaminathan
Title: In-Sensor Motion Recognition with Memristive System and Light Sensing Surfaces
Abstract:
In this paper, we introduce a novel device architecture that merges memristive devices with light-sensing surfaces, for energy-efficient motion recognition at the edge. Our light-sensing surface captures motion data through in-sensor computation. This data is then processed using a memristive system equipped with a HfO2-based synaptic device, coupled with a winner-take-all (WTA) circuit, tailored for low-power motion classification tasks. We validate our end-to-end system using four distinct human hand gestures - left-to-right, right-to-left, bottom-to-top, and top-to-bottom movements - to assess energy efficiency and classification robustness. Our experiments show that the system requires an average of only 4.17 nJ to map weights from the processed analog signal onto the memristive system, and 0.952 nJ per movement class during testing, achieving 97.22% accuracy even under 5% noise interference. A key advantage of our proposed architecture is its low energy requirement, enabling the integration of energy-harvesting solutions such as solar power for sustainable autonomous operation. Additionally, our approach enhances data privacy by processing data locally, reducing the need for external data transmission and storage.

Authors:Dipto Das, Shion Guha, Bryan Semaan
Title: How do datasets, developers, and models affect biases in a low-resourced language?
Abstract:
Sociotechnical systems, such as language technologies, frequently exhibit identity-based biases. These biases exacerbate the experiences of historically marginalized communities and remain understudied in low-resource contexts. While models and datasets specific to a language or with multilingual support are commonly recommended to address these biases, this paper empirically tests the effectiveness of such approaches in the context of gender, religion, and nationality-based identities in Bengali, a widely spoken but low-resourced language. We conducted an algorithmic audit of sentiment analysis models built on mBERT and BanglaBERT, which were fine-tuned using all Bengali sentiment analysis (BSA) datasets from Google Dataset Search. Our analyses showed that BSA models exhibit biases across different identity categories despite having similar semantic content and structure. We also examined the inconsistencies and uncertainties arising from combining pre-trained models and datasets created by individuals from diverse demographic backgrounds. We connected these findings to the broader discussions on epistemic injustice, AI alignment, and methodological decisions in algorithmic audits.

Authors:Dipto Das, Syed Ishtiaque Ahmed, Shion Guha
Title: BTPD: A Multilingual Hand-curated Dataset of Bengali Transnational Political Discourse Across Online Communities
Abstract:
Understanding political discourse in online spaces is crucial for analyzing public opinion and ideological polarization. While social computing and computational linguistics have explored such discussions in English, such research efforts are significantly limited in major yet under-resourced languages like Bengali due to the unavailability of datasets. In this paper, we present a multilingual dataset of Bengali transnational political discourse (BTPD) collected from three online platforms, each representing distinct community structures and interaction dynamics. Besides describing how we hand-curated the dataset through community-informed keyword-based retrieval, this paper also provides a general overview of its topics and multilingual content.

Authors:Imran Fahad, Danny Scott, Azizul Zahid, Matthew Bringle, Srinayana Patil, Ella Bevins, Carmen Palileo, Sai Swaminathan
Title: RadioGami: Batteryless, Long-Range Wireless Paper Sensors Using Tunnel Diodes
Abstract:
Paper-based interactive RF devices have opened new possibilities for wireless sensing, yet they are typically constrained by short operational ranges. This paper introduces RadioGami, a method for creating long-range, batteryless RF sensing surfaces on paper using low-cost, DIY materials like copper tape, paper, and off-the-shelf electronics paired with an affordable radio receiver (approx. $20). We explore the design space enabled by RadioGami, including sensing paper deformations like bending, tearing, and origami patterns (Miura, Kresling) at ranges up to 45.73 meters. RadioGami employs a novel ultra-low power (35 µW) switching circuit with a tunnel diode for wireless functionality. These surfaces can sustainably operate by harvesting energy using tiny photodiodes. We demonstrate applications that monitor object status, track user interactions (rotation, sliding), and detect environmental changes. We characterize performance, sensitivity, range, and power consumption with deployment studies. RadioGami advances sustainable, tangible, and batteryless interfaces for embodied interaction.

Authors:Weiyan Shi, Kenny Tsu Wei Choo
Title: Human-AI Alignment of Multimodal Large Language Models with Speech-Language Pathologists in Parent-Child Interactions
Abstract:
Joint attention is a critical marker of early social-communicative development, yet remains difficult for caregivers to assess without expert guidance. In this work, we explore how multimodal large language models (MLLMs) can be aligned with the reasoning processes of speech-language pathologists (SLPs) to support the interpretation of everyday parent-child interactions. We conducted in-depth interviews and video annotation studies with three experienced SLPs to uncover how they evaluate joint attention based on three core behavioural cues: gaze, action, and vocalisation. Using these insights, we developed a two-stage MLLM-based system that first extracts fine-grained behavioural descriptions from video segments and then judges joint attention quality using expert-aligned prompts. Our evaluation across 26 parent-child interaction videos shows that MLLMs can achieve up to 85% accuracy in perceptual cue extraction and over 75% average precision in simulating expert judgement. We further propose design guidelines for building MLLM-based behaviour observation-judgement systems that align with SLPs, emphasising the structuring of behavioural cues, the construction of exemplar libraries grounded in expert annotations, and the need to personalise system responses based on developmental stage and neurotypical or atypical presentation. This work provides structured behavioural cues derived from SLP expertise, demonstrates the feasibility of aligning SLPs' observation and judgement using MLLMs, and offers practical design guidelines for building aligned systems to support parent-child interaction analysis.

Authors:Botao Amber Hu, Helena Rong
Title: Spore in the Wild: A Case Study of Spore.fun as an Open-Environment Evolution Experiment with Sovereign AI Agents on TEE-Secured Blockchains
Abstract:
In Artificial Life (ALife) research, replicating Open-Ended Evolution (OEE)-the continuous emergence of novelty observed in biological life-has usually been pursued within isolated, closed-system simulations, such as Tierra and Avida, which have typically plateaued after an initial burst of novelty, failing to achieve sustained OEE. Scholars suggest that OEE requires an open-environment system that continually exchanges information or energy with its environment. A recent technological innovation in Decentralized Physical Infrastructure Network (DePIN), which provides permissionless computational substrates, enables the deployment of Large Language Model-based AI agents on blockchains integrated with Trusted Execution Environments (TEEs). This enables on-chain agents to operate autonomously "in the wild," achieving self-sovereignty without human oversight. These agents can control their own social media accounts and cryptocurrency wallets, allowing them to interact directly with blockchain-based financial networks and broader human social media. Building on this new paradigm of on-chain agents, Spore.fun is a recent real-world AI evolution experiment that enables autonomous breeding and evolution of new on-chain agents. This paper presents a detailed case study of Spore.fun, examining agent behaviors and their evolutionary trajectories through digital ethology. We aim to spark discussion about whether open-environment ALife systems "in the wild," based on permissionless computational substrates and driven by economic incentives to interact with their environment, could finally achieve the long-sought goal of OEE.

Authors:Junling Wang, Anna Rutkiewicz, April Yi Wang, Mrinmaya Sachan
Title: Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models
Abstract:
Visuals are valuable tools for teaching math word problems (MWPs), helping young learners translate textual descriptions into mathematical expressions before solving them. However, creating such visuals is labor-intensive and there is a lack of automated methods to support this process. In this paper, we present Math2Visual, an automatic framework for generating pedagogically meaningful visuals from MWP text descriptions. Math2Visual leverages a pre-defined visual language and a design space grounded in interviews with math teachers, to illustrate the core mathematical relationships in MWPs. Using Math2Visual, we construct an annotated dataset of 1,903 visuals and evaluate Text-to-Image (TTI) models for their ability to generate visuals that align with our design. We further fine-tune several TTI models with our dataset, demonstrating improvements in educational visual generation. Our work establishes a new benchmark for automated generation of pedagogically meaningful visuals and offers insights into key challenges in producing multimodal educational content, such as the misrepresentation of mathematical relationships and the omission of essential visual elements.

Authors:Hannah Vy Nguyen, Yu-Chun Grace Yen, Omar Shakir, Hang Huynh, Sebastian Gutierrez, June A. Smith, Sheila Jimenez, Salma Abdelgelil, Stephen MacNeil
Title: Feedstack: Layering Structured Representations over Unstructured Feedback to Scaffold Human AI Conversation
Abstract:
Many conversational user interfaces facilitate linear conversations with turn-based dialogue, similar to face-to-face conversations between people. However, digital conversations can afford more than simple back-and-forth; they can be layered with interaction techniques and structured representations that scaffold exploration, reflection, and shared understanding between users and AI systems. We introduce Feedstack, a speculative interface that augments feedback conversations with layered affordances for organizing, navigating, and externalizing feedback. These layered structures serve as a shared representation of the conversation that can surface user intent and reveal underlying design principles. This work represents an early exploration of this vision using a research-through-design approach. We describe system features and design rationale, and present insights from two formative (n=8, n=8) studies to examine how novice designers engage with these layered supports. Rather than presenting a conclusive evaluation, we reflect on Feedstack as a design probe that opens up new directions for conversational feedback systems.

Authors:Bo Peng, Zhiheng Wang, Heyang Gong, Chaochao Lu
Title: IP-Dialog: Evaluating Implicit Personalization in Dialogue Systems with Synthetic Data
Abstract:
In modern dialogue systems, the ability to implicitly infer user backgrounds from conversations and leverage this information for personalized assistance is crucial. However, the scarcity of high-quality data remains a fundamental challenge to evaluating and improving this capability. Traditional dataset construction methods are labor-intensive, resource-demanding, and raise privacy concerns. To address these issues, we propose a novel approach for automatic synthetic data generation and introduce the Implicit Personalized Dialogue (IP-Dialog) benchmark along with a training dataset, covering 10 tasks and 12 user attribute types. Additionally, we develop a systematic evaluation framework with four metrics to assess both attribute awareness and reasoning capabilities. We further propose five causal graphs to elucidate models' reasoning pathways during implicit personalization. Extensive experiments yield insightful observations and prove the reliability of our dataset.

Authors:Émilie Fabre, Katie Seaborn, Shuta Koiwai, Mizuki Watanabe, Paul Riesch
Title: More-than-Human Storytelling: Designing Longitudinal Narrative Engagements with Generative AI
Abstract:
Longitudinal engagement with generative AI (GenAI) storytelling agents is a timely but less charted domain. We explored multi-generational experiences with "Dreamsmithy," a daily dream-crafting app, where participants (N = 28) co-created stories with AI narrator "Makoto" every day. Reflections and interactions were captured through a two-week diary study. Reflexive thematic analysis revealed themes like "oscillating ambivalence" and "socio-chronological bonding," highlighting the complex dynamics that emerged between individuals and the AI narrator over time. Findings suggest that while people appreciated the personal notes, opportunities for reflection, and AI creativity, limitations in narrative coherence and control occasionally caused frustration. The results underscore the potential of GenAI for longitudinal storytelling, but also raise critical questions about user agency and ethics. We contribute initial empirical insights and design considerations for developing adaptive, more-than-human storytelling systems.

Authors:Jun-Hsiang Yao, Mingzheng Li, Jiayi Liu, Yuxiao Li, Jielin Feng, Jun Han, Qibao Zheng, Jianfeng Feng, Siming Chen
Title: DTBIA: An Immersive Visual Analytics System for Brain-Inspired Research
Abstract:
The Digital Twin Brain (DTB) is an advanced artificial intelligence framework that integrates spiking neurons to simulate complex cognitive functions and collaborative behaviors. For domain experts, visualizing the DTB's simulation outcomes is essential to understanding complex cognitive activities. However, this task poses significant challenges due to DTB data's inherent characteristics, including its high-dimensionality, temporal dynamics, and spatial complexity. To address these challenges, we developed DTBIA, an Immersive Visual Analytics System for Brain-Inspired Research. In collaboration with domain experts, we identified key requirements for effectively visualizing spatiotemporal and topological patterns at multiple levels of detail. DTBIA incorporates a hierarchical workflow - ranging from brain regions to voxels and slice sections - along with immersive navigation and a 3D edge bundling algorithm to enhance clarity and provide deeper insights into both functional (BOLD) and structural (DTI) brain data. The utility and effectiveness of DTBIA are validated through two case studies involving brain research experts. The results underscore the system's role in enhancing the comprehension of complex neural behaviors and interactions.

Authors:Raffles Xingqi Zhu, Charlie S. Burlingham, Olivier Mercier, Phillip Guan
Title: Errors in Stereo Geometry Induce Distance Misperception
Abstract:
Stereoscopic head-mounted displays (HMDs) render and present binocular images to create an egocentric, 3D percept to the HMD user. Within this render and presentation pipeline there are potential rendering camera and viewing position errors that can induce deviations in the depth and distance that a user perceives compared to the underlying intended geometry. For example, rendering errors can arise when HMD render cameras are incorrectly positioned relative to the assumed centers of projections of the HMD displays and viewing errors can arise when users view stereo geometry from the incorrect location in the HMD eyebox. In this work we present a geometric framework that predicts errors in distance perception arising from inaccurate HMD perspective geometry and build an HMD platform to reliably simulate render and viewing error in a Quest 3 HMD with eye tracking to experimentally test these predictions. We present a series of five experiments to explore the efficacy of this geometric framework and show that errors in perspective geometry can induce both under- and over-estimations in perceived distance. We further demonstrate how real-time visual feedback can be used to dynamically recalibrate visuomotor mapping so that an accurate reach distance is achieved even if the perceived visual distance is negatively impacted by geometric error.

Authors:Tolulope Oshinowo, Sohyeon Hwang, Amy X. Zhang, Andrés Monroy-Hernández
Title: Seeing the Politics of Decentralized Social Media Protocols
Abstract:
Calls to decentralize feed-based social media have been driven by concerns about the concentrated power of centralized platforms and their societal impact. In response, numerous decentralized social media protocols have emerged, each interpreting "decentralization" in different ways. We analyze four such protocols -- ActivityPub, AT Protocol, Nostr, and Farcaster -- to develop a novel conceptual framework for understanding how protocols operationalize decentralization. Drawing from protocol documentation, media coverage, and first-hand interviews with protocol developers and experts, we contextualize each protocol's approach within their respective socio-technical goals. Our framework highlights how control over key components is distributed differently across each protocol, shaping who holds power over what kinds of decisions. How components are arranged in relation to one another further impacts how component owners might offset each other's power in shaping social media. We argue that examining protocols as artifacts reveals how values shape infrastructure and power dynamics -- and that with a holistic framework as a guide, we can more effectively evaluate and design decentralized platforms aligned with the social and political futures we envision.

Authors:Tawfiq Ammari, Anna Gutowska, Jacob Ziff, Casey Randazzo, Harihan Subramonyam
Title: Retweets, Receipts, and Resistance: Discourse, Sentiment, and Credibility in Public Health Crisis Twitter
Abstract:
As the COVID-19 pandemic evolved, the Centers for Disease Control and Prevention (CDC) used Twitter to disseminate safety guidance and updates, reaching millions of users. This study analyzes two years of tweets from, to, and about the CDC using a mixed methods approach to examine discourse characteristics, credibility, and user engagement. We found that the CDC's communication remained largely one-directional and did not foster reciprocal interaction, while discussions around COVID-19 were deeply shaped by political and ideological polarization. Users frequently cited earlier CDC messages to critique new and sometimes contradictory guidance. Our findings highlight the role of sentiment, media richness, and source credibility in shaping the spread of public health messages. We propose design strategies to help the CDC tailor communications to diverse user groups and manage misinformation more effectively during high-stakes health crises.

Authors:Arooj Zaidi, Giulia Barbareschi, Kai Kunze, Yun Suen Pai, Junichi Yamaoka
Title: TIEboard: A Digital Educational Tool for Kids Geometric Learning
Abstract:
Tangible User Interfaces have shown potential in supporting the acquisition of key concepts in computing and mathematics while fostering engagement in young learners, but these approaches are less commonly utilised in the context of geometry. In this paper we introduce TIEboard, an interactive device to promote early learning of basic geometry concepts. TIEboard draws inspiration from traditional geoboards and lacing toys to leverage children's familiarity with these traditional tools. It employs instructional lights to guide children in creating shapes using colourful threads of optical fiber. The use of conductive materials allows the system to detect lacing activity and provide feedback in real-time. TIEboard incorporates six interaction modes of varying difficulty based on an incremental learning framework. A study with 16 children aged 5-9 evaluated TIEboard's effectiveness in supporting early geometric learning, facilitating creativity, and promoting collaboration.

Authors:Kristina Radivojevic, Caleb Reinking, Shaun Whitfield, Paul Brenner
Title: Public Discourse Sandbox: Facilitating Human and AI Digital Communication Research
Abstract:
Social media serves as a primary communication and information dissemination platform for major global events, entertainment, and niche or topically focused community discussions. Therefore, it represents a valuable resource for researchers who aim to answer a wide range of research questions. However, obtaining data can be difficult, expensive, and often unreliable due to the presence of bots, fake accounts, and manipulated content. Additionally, there are ethical concerns if researchers decide to conduct an online experiment without explicitly notifying social media users about their intent. There is a need for more controlled and scalable mechanisms to evaluate the impacts of digital discussion interventions on audiences. We introduce the Public Discourse Sandbox (PDS), which serves as a digital discourse research platform for human-AI as well as AI-AI discourse research, testing, and training. PDS provides a safe and secure space for research experiments that are not viable on public, commercial social media platforms. Its main purpose is to enable the understanding of AI behaviors and the impacts of customized AI participants via techniques such as prompt engineering, retrieval-augmented generation (RAG), and fine-tuning. We provide a hosted live version of the sandbox to support researchers as well as the open-sourced code on GitHub for community collaboration and contribution.

Authors:Ding Xia, Xinyue Gui, Fan Gao, Dongyuan Li, Mark Colley, Takeo Igarashi
Title: Automating eHMI Action Design with LLMs for Automated Vehicle Communication
Abstract:
The absence of explicit communication channels between automated vehicles (AVs) and other road users requires the use of external Human-Machine Interfaces (eHMIs) to convey messages effectively in uncertain scenarios. Currently, most eHMI studies employ predefined text messages and manually designed actions to perform these messages, which limits the real-world deployment of eHMIs, where adaptability in dynamic scenarios is essential. Given the generalizability and versatility of large language models (LLMs), they could potentially serve as automated action designers for the message-action design task. To validate this idea, we make three contributions: (1) We propose a pipeline that integrates LLMs and 3D renderers, using LLMs as action designers to generate executable actions for controlling eHMIs and rendering action clips. (2) We collect a user-rated Action-Design Scoring dataset comprising a total of 320 action sequences for eight intended messages and four representative eHMI modalities. The dataset validates that LLMs can translate intended messages into actions close to a human level, particularly for reasoning-enabled LLMs. (3) We introduce two automated raters, Action Reference Score (ARS) and Vision-Language Models (VLMs), to benchmark 18 LLMs, finding that the VLM aligns with human preferences yet varies across eHMI modalities.

Authors:Samuel Rhys Cox, Rune Møberg Jacobsen, Niels van Berkel
Title: The Impact of a Chatbot's Ephemerality-Framing on Self-Disclosure Perceptions
Abstract:
Self-disclosure, the sharing of one's thoughts and feelings, is affected by the perceived relationship between individuals. While chatbots are increasingly used for self-disclosure, the impact of a chatbot's framing on users' self-disclosure remains under-explored. We investigated how a chatbot's description of its relationship with users, particularly in terms of ephemerality, affects self-disclosure. Specifically, we compared a Familiar chatbot, presenting itself as a companion remembering past interactions, with a Stranger chatbot, presenting itself as a new, unacquainted entity in each conversation. In a mixed factorial design, participants engaged with either the Familiar or Stranger chatbot in two sessions across two days, with one conversation focusing on Emotional- and another Factual-disclosure. When Emotional-disclosure was sought in the first chatting session, Stranger-condition participants felt more comfortable self-disclosing. However, when Factual-disclosure was sought first, these differences were replaced by more enjoyment among Familiar-condition participants. Qualitative findings showed Stranger afforded anonymity and reduced judgement, whereas Familiar sometimes felt intrusive unless rapport was built via low-risk Factual-disclosure.

Authors:Bhanuka Gamage, Thanh-Toan Do, Nicholas Seow Chiang Price, Arthur Lowery, Kim Marriott
Title: What do Blind and Low-Vision People Really Want from Assistive Smart Devices? Comparison of the Literature with a Focus Study
Abstract:
Over the last decade there has been considerable research into how artificial intelligence (AI), specifically computer vision, can assist people who are blind or have low vision (BLV) to understand their environment. However, there has been almost no research into whether the tasks (object detection, image captioning, text recognition etc.) and devices (smartphones, smart-glasses etc.) investigated by researchers align with the needs and preferences of BLV people. We identified 646 studies published in the last two and a half years that have investigated such assistive AI techniques. We analysed these papers to determine the task, device and participation by BLV individuals. We then interviewed 24 BLV people and asked for their top five AI-based applications and to rank the applications found in the literature. We found only a weak positive correlation between BLV participants' perceived importance of tasks and researchers' focus, and that participants prefer a conversational agent interface and head-mounted devices.

Authors:Ugur Kursuncu, Trilok Padhi, Gaurav Sinha, Abdulkadir Erol, Jaya Krishna Mandivarapu, Christopher R. Larrison
Title: From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data
Abstract:
The growing demand for accessible mental health support, compounded by workforce shortages and logistical barriers, has led to increased interest in utilizing Large Language Models (LLMs) for scalable and real-time assistance. However, their use in sensitive domains such as anxiety support remains underexamined. This study presents a systematic evaluation of LLMs (GPT and Llama) for their potential utility in anxiety support by using real user-generated posts from the r/Anxiety subreddit for both prompting and fine-tuning. Our approach utilizes a mixed-method evaluation framework incorporating three main categories of criteria: (i) linguistic quality, (ii) safety and trustworthiness, and (iii) supportiveness. Results show that fine-tuning LLMs with naturalistic anxiety-related data enhanced linguistic quality but increased toxicity and bias, and diminished emotional responsiveness. While LLMs exhibited limited empathy, GPT was evaluated as more supportive overall. Our findings highlight the risks of fine-tuning LLMs on unprocessed social media content without mitigation strategies.

Authors:Jeba Rezwana, Corey Ford
Title: Human-Centered AI Communication in Co-Creativity: An Initial Framework and Insights
Abstract:
Effective communication between AI and humans is essential for successful human-AI co-creation. However, many current co-creative AI systems lack effective communication, which limits their potential for collaboration. This paper presents the initial design of the Framework for AI Communication (FAICO) for co-creative AI, developed through a systematic review of 107 full-length papers. FAICO presents key aspects of AI communication and their impact on user experience, offering preliminary guidelines for designing human-centered AI communication. To improve the framework, we conducted a preliminary study with two focus groups involving skilled individuals in AI, HCI, and design. These sessions sought to understand participants' preferences for AI communication, gather their perceptions of the framework, collect feedback for refinement, and explore its use in co-creative domains like collaborative writing and design. Our findings reveal a preference for a human-AI feedback loop over linear communication and emphasize the importance of context in fostering mutual understanding. Based on these insights, we propose actionable strategies for applying FAICO in practice and future directions, marking the first step toward developing comprehensive guidelines for designing effective human-centered AI communication in co-creation.

Authors:Charles Kiene, Sohyeon Hwang, Nathan TeBlunthuis, Carl Colglazier, Aaron Shaw, Benjamin Mako Hill
Title: The Relational Origins of Rules in Online Communities
Abstract:
Where do rules come from in online communities? While prior studies of online community governance in social computing have sought to characterize rules by their functions within communities and documented practices of rule enforcement, they have largely overlooked rule adoption and change. This study investigates how and why online communities adopt and change their rules. We conducted a grounded theory-based analysis of 40 in-depth interviews with community leaders from subreddits, Fandom wikis, and Fediverse servers, and identified seven processes involved in the adoption of online community rules. Our findings reveal that, beyond regulating behavior and solving functional intra-community problems, rules are also adopted and changed for relational reasons, such as signaling or reinforcing community legitimacy and identity to other communities. While rule change was often prompted by challenges during community growth or decline, change also depended on volunteer leaders' work capacity, the presence of member feedback mechanisms, and relational dynamics between leaders and members. The findings extend prior theories from social computing and organizational research, illustrating how institutionalist and ecological explanations of the relational origins of rules complement more functional accounts. The results also support design recommendations that integrate the relational aspects of rules and rulemaking to facilitate successful governance across communities' lifecycles.

Authors:Olivier Toubia, George Z. Gui, Tianyi Peng, Daniel J. Merlau, Ang Li, Haozhe Chen
Title: Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions
Abstract:
LLM-based digital twin simulation, where large language models are used to emulate individual human behavior, holds great promise for research in AI, social science, and digital experimentation. However, progress in this area has been hindered by the scarcity of real, individual-level datasets that are both large and publicly available. This lack of high-quality ground truth limits both the development and validation of digital twin methodologies. To address this gap, we introduce a large-scale, public dataset designed to capture a rich and holistic view of individual human behavior. We survey a representative sample of $N = 2,058$ participants (average 2.42 hours per person) in the US across four waves with 500 questions in total, covering a comprehensive battery of demographic, psychological, economic, personality, and cognitive measures, as well as replications of behavioral economics experiments and a pricing survey. The final wave repeats tasks from earlier waves to establish a test-retest accuracy baseline. Initial analyses suggest the data are of high quality and show promise for constructing digital twins that predict human behavior well at the individual and aggregate levels. By making the full dataset publicly available, we aim to establish a valuable testbed for the development and benchmarking of LLM-based persona simulations. Beyond LLM applications, due to its unique breadth and scale the dataset also enables broad social science research, including studies of cross-construct correlations and heterogeneous treatment effects.

Authors:Haoming Huang, Musen Zhang, Jianxin Yang, Zhen Li, Jinkai Li, Yao Guo
Title: MAGE: A Multi-task Architecture for Gaze Estimation with an Efficient Calibration Module
Abstract:
Eye gaze can provide rich information on human psychological activities, and has garnered significant attention in the field of Human-Robot Interaction (HRI). However, existing gaze estimation methods merely predict either the gaze direction or the Point-of-Gaze (PoG) on the screen, failing to provide sufficient information for a comprehensive six Degree-of-Freedom (DoF) gaze analysis in 3D space. Moreover, the variations of eye shape and structure among individuals also impede the generalization capability of these methods. In this study, we propose MAGE, a Multi-task Architecture for Gaze Estimation with an efficient calibration module, to predict the 6-DoF gaze information that is applicable to real-world HRI. Our basic model encodes both the directional and positional features from facial images, and predicts gaze results with dedicated information flow and multiple decoders. To reduce the impact of individual variations, we propose a novel calibration module, namely Easy-Calibration, to fine-tune the basic model with subject-specific data, which is efficient to implement without the need for a screen. Experimental results demonstrate that our method achieves state-of-the-art performance on the public MPIIFaceGaze and EYEDIAP datasets and our self-built IMRGaze dataset.

Authors:Minjung Park, Jodi Forlizzi, John Zimmerman
Title: Exploring the Innovation Opportunities for Pre-trained Models
Abstract:
Innovators transform the world by understanding where services are successfully meeting customers' needs and then using this knowledge to identify failsafe opportunities for innovation. Pre-trained models have changed the AI innovation landscape, making it faster and easier to create new AI products and services. Understanding where pre-trained models are successful is critical for supporting AI innovation. Unfortunately, the hype cycle surrounding pre-trained models makes it hard to know where AI can really be successful. To address this, we investigated pre-trained model applications developed by HCI researchers as a proxy for commercially successful applications. The research applications demonstrate technical capabilities, address real user needs, and avoid ethical challenges. Using an artifact analysis approach, we categorized capabilities, opportunity domains, data types, and emerging interaction design patterns, uncovering some of the opportunity space for innovation with pre-trained models.

Authors:Mohammad Belal, Nguyen Luong, Talayeh Aledavood, Juhi Kulshrestha
Title: Stress Bytes: Decoding the Associations between Internet Use and Perceived Stress
Abstract:
In today's digital era, the internet plays a pervasive role in our lives, influencing everyday activities such as communication, work, and leisure. This online engagement intertwines with offline experiences, shaping individuals' overall well-being. Despite its significance, existing research often falls short in capturing the relationship between internet use and well-being, relying primarily on isolated studies and self-reported data. One of the major contributors to deteriorated well-being - both physical and mental - is stress. While some research has examined the relationship between internet use and stress, both positive and negative associations have been reported. Our primary goal in this work is to identify the associations between an individual's internet use and their stress. To achieve this goal, we conducted a longitudinal multimodal study that spanned seven months. We combined fine-grained URL-level web browsing traces of 1490 German internet users with their sociodemographics and monthly measures of stress. Further, we developed a conceptual framework that allows us to simultaneously explore different contextual dimensions, including how, where, when, and by whom the internet is used. Our analysis revealed several associations between internet use and stress that vary by context. Social media, entertainment, online shopping, and gaming were positively associated with stress, while productivity, news, and adult content use were negatively associated. In the future, the behavioral markers we identified can pave the way for designing individualized tools for people to self-monitor and self-moderate their online behaviors to enhance their well-being, reducing the burden on already overburdened mental health services.

Authors:Hannah R. Nolasco, Andrew Vargo, Koichi Kise
Title: AI Solutionism and Digital Self-Tracking with Wearables
Abstract:
Self-tracking technologies and wearables automate the process of data collection and insight generation with the support of artificial intelligence systems, with many emerging studies exploring ways to evolve these features further through large-language models (LLMs). This is done with the intent to reduce capture burden and the cognitive stress of health-based decision making, but studies neglect to consider how automation has stymied the agency and independent reflection of users of self-tracking interventions. In this position paper, we explore the consequences of automation in self-tracking by relating it to our experiences with investigating the Oura Ring, a sleep wearable, and navigate potential remedies.

Authors:Ian Steenstra, Timothy W. Bickmore
Title: A Risk Ontology for Evaluating AI-Powered Psychotherapy Virtual Agents
Abstract:
The proliferation of Large Language Models (LLMs) and Intelligent Virtual Agents acting as psychotherapists presents significant opportunities for expanding mental healthcare access. However, their deployment has also been linked to serious adverse outcomes, including user harm and suicide, facilitated by a lack of standardized evaluation methodologies capable of capturing the nuanced risks of therapeutic interaction. Current evaluation techniques lack the sensitivity to detect subtle changes in patient cognition and behavior during therapy sessions that may lead to subsequent decompensation. We introduce a novel risk ontology specifically designed for the systematic evaluation of conversational AI psychotherapists. Developed through an iterative process including review of the psychotherapy risk literature, qualitative interviews with clinical and legal experts, and alignment with established clinical criteria (e.g., DSM-5) and existing assessment tools (e.g., NEQ, UE-ATR), the ontology aims to provide a structured approach to identifying and assessing user/patient harms. We provide a high-level overview of this ontology, detailing its grounding, and discuss potential use cases. We discuss four use cases in detail: monitoring real user interactions, evaluation with simulated patients, benchmarking and comparative analysis, and identifying unexpected outcomes. The proposed ontology offers a foundational step towards establishing safer and more responsible innovation in the domain of AI-driven mental health support.

Authors:Mai Lee Chang, Samantha Reig, Alicia Lee, Anna Huang, Hugo Simão, Nara Han, Neeta M Khanuja, Abdullah Ubed Mohammad Ali, Rebekah Martinez, John Zimmerman, Jodi Forlizzi, Aaron Steinfeld
Title: Unremarkable to Remarkable AI Agent: Exploring Boundaries of Agent Intervention for Adults With and Without Cognitive Impairment
Abstract:
As the population of older adults increases, there is a growing need for support for them to age in place. This is exacerbated by the growing number of individuals struggling with cognitive decline and shrinking number of youth who provide care for them. Artificially intelligent agents could provide cognitive support to older adults experiencing memory problems, and they could help informal caregivers with coordination tasks. To better understand this possible future, we conducted a speed dating with storyboards study to reveal invisible social boundaries that might keep older adults and their caregivers from accepting and using agents. We found that healthy older adults worry that accepting agents into their homes might increase their chances of developing dementia. At the same time, they want immediate access to agents that know them well if they should experience cognitive decline. Older adults in the early stages of cognitive decline expressed a desire for agents that can ease the burden they saw themselves becoming for their caregivers. They also speculated that an agent who really knew them well might be an effective advocate for their needs when they were less able to advocate for themselves. That is, the agent may need to transition from being unremarkable to remarkable. Based on these findings, we present design opportunities and considerations for agents and articulate directions of future research.

Authors:Maggie Hughes, Cassandra Overney, Ashima Kamra, Jasmin Tepale, Elizabeth Hamby, Mahmood Jasim, Deb Roy
Title: Voice to Vision: Enhancing Civic Decision-Making through Co-Designed Data Infrastructure
Abstract:
Trust and transparency in civic decision-making processes, like neighborhood planning, are eroding as community members frequently report sending feedback "into a void" without understanding how, or whether, their input influences outcomes. To address this gap, we introduce Voice to Vision, a sociotechnical system that bridges community voices and planning outputs through a structured yet flexible data infrastructure and complementary interfaces for both community members and planners. Through a five-month iterative design process with 21 stakeholders and subsequent field evaluation involving 24 participants, we examine how this system facilitates shared understanding across the civic ecosystem. Our findings reveal that while planners value systematic sensemaking tools that find connections across diverse inputs, community members prioritize seeing themselves reflected in the process, discovering patterns within feedback, and observing the rigor behind decisions, while emphasizing the importance of actionable outcomes. We contribute insights into participatory design for civic contexts, a complete sociotechnical system with an interoperable data structure for civic decision-making, and empirical findings that inform how digital platforms can promote shared understanding among elected or appointed officials, planners, and community members by enhancing transparency and legitimacy.

Authors:Mengyao Guo, Jinda Han, Ze Gao, Yuan Zhuang, Xingting Wu
Title: Human and Machine as Seen at the Co-Creation Age: A Co-Word Analysis in Human Machine Co-creation (2014-2024)
Abstract:
This paper explores the evolving landscape of human-machine co-creation, focusing on its development in the context of the ACM Conference on Human Factors in Computing Systems (CHI) from 2014 to 2024. We employ co-word analysis to identify emerging trends, central themes, and the intellectual trajectory of this field. The study highlights the shift from viewing machines as mere tools to recognizing them as collaborative partners in creative processes. By understanding these dynamics, we aim to provide insights into the implications of this paradigm shift for creativity, innovation, and societal impact, ultimately fostering a more inclusive and effective approach to human-machine interaction in various domains.

Authors:Oier Mentxaka, Natalia Díaz-Rodríguez, Mark Coeckelbergh, Marcos López de Prado, Emilia Gómez, David Fernández Llorca, Enrique Herrera-Viedma, Francisco Herrera
Title: Aligning Trustworthy AI with Democracy: A Dual Taxonomy of Opportunities and Risks
Abstract:
Artificial Intelligence (AI) poses both significant risks and valuable opportunities for democratic governance. This paper introduces a dual taxonomy to evaluate AI's complex relationship with democracy: the AI Risks to Democracy (AIRD) taxonomy, which identifies how AI can undermine core democratic principles such as autonomy, fairness, and trust; and the AI's Positive Contributions to Democracy (AIPD) taxonomy, which highlights AI's potential to enhance transparency, participation, efficiency, and evidence-based policymaking. Grounded in the European Union's approach to ethical AI governance, and particularly the seven Trustworthy AI requirements proposed by the European Commission's High-Level Expert Group on AI, each identified risk is aligned with mitigation strategies based on EU regulatory and normative frameworks. Our analysis underscores the transversal importance of transparency and societal well-being across all risk categories and offers a structured lens for aligning AI systems with democratic values. By integrating democratic theory with practical governance tools, this paper offers a normative and actionable framework to guide research, regulation, and institutional design to support trustworthy, democratic AI. It provides scholars with a conceptual foundation to evaluate the democratic implications of AI, equips policymakers with structured criteria for ethical oversight, and helps technologists align system design with democratic principles. In doing so, it bridges the gap between ethical aspirations and operational realities, laying the groundwork for more inclusive, accountable, and resilient democratic systems in the algorithmic age.

Authors:Max Grobbel, Daniel Flögel, Philipp Rigoll, Sören Hohmann
Title: Disentangling Coordinate Frames for Task Specific Motion Retargeting in Teleoperation using Shared Control and VR Controllers
Abstract:
Task performance in teleoperation, measured by task completion time, still lags far behind that of humans conducting tasks directly. One large identified factor is the human capability to perform transformations and alignments, which is directly influenced by the point of view and the motion retargeting strategy. In modern teleoperation systems, motion retargeting is usually implemented through a one-time calibration or switching modes. Complex tasks, like concatenated screwing, can be difficult because the operator has to align (e.g., mirror) rotational and translational input commands. Recent research has shown that separating translation and rotation leads to increased task performance. This work proposes a formal motion retargeting method that separates translational and rotational input commands. The method is then included in an optimal-control-based trajectory planner and shown to work on a UR5e manipulator.
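The separation idea above can be sketched numerically: translational and rotational input commands are mapped through independently chosen coordinate frames rather than one rigid calibration. The frames and inputs below are illustrative assumptions, not the paper's actual task frames or planner.

```python
import numpy as np

# Sketch of disentangled motion retargeting: translation is expressed in
# one frame (e.g. the operator's view frame) while rotation is retargeted
# through a separately chosen frame (e.g. the tool frame).
# All frames and input values here are toy assumptions.

def rot_z(theta):
    """Rotation matrix about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Operator input: move "forward" 0.1 m and twist 15 degrees about z.
d_trans = np.array([0.1, 0.0, 0.0])
d_rot = rot_z(np.deg2rad(15.0))

# Disentangled mapping: each command gets its own frame.
R_view = rot_z(np.deg2rad(90.0))  # assumed camera/view frame
R_tool = np.eye(3)                # assumed tool frame

cmd_trans = R_view @ d_trans          # translation along the view axis
cmd_rot = R_tool @ d_rot @ R_tool.T   # rotation retargeted independently

print(np.round(cmd_trans, 3))
```

Because the two frames are chosen per task, the operator's "forward" motion and wrist twist no longer have to share one calibration, which is the core of the separation the abstract describes.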

Authors:Axel Abels, Tom Lenaerts
Title: Wisdom from Diversity: Bias Mitigation Through Hybrid Human-LLM Crowds
Abstract:
Despite their performance, large language models (LLMs) can inadvertently perpetuate biases found in the data they are trained on. By analyzing LLM responses to bias-eliciting headlines, we find that these models often mirror human biases. To address this, we explore crowd-based strategies for mitigating bias through response aggregation. We first demonstrate that simply averaging responses from multiple LLMs, intended to leverage the "wisdom of the crowd", can exacerbate existing biases due to the limited diversity within LLM crowds. In contrast, we show that locally weighted aggregation methods more effectively leverage the wisdom of the LLM crowd, achieving both bias mitigation and improved accuracy. Finally, recognizing the complementary strengths of LLMs (accuracy) and humans (diversity), we demonstrate that hybrid crowds containing both significantly enhance performance and further reduce biases across ethnic and gender-related contexts.
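As a toy illustration of the aggregation strategies contrasted above, the sketch below compares plain averaging of a biased "crowd" with reliability-weighted averaging. All responder values and the weighting scheme are invented for illustration; the paper's locally weighted method is more involved (weights are estimated locally per input, not from the ground truth).

```python
import numpy as np

# Toy contrast: simple averaging vs. reliability-weighted aggregation.
# Three responders share a common bias (analogous to a homogeneous LLM
# crowd); one is accurate. Numbers are illustrative assumptions.

truth = 10.0
responses = np.array([13.0, 13.5, 12.5, 10.2])

# Stand-in reliability estimate, e.g. error measured on held-out items.
errors = np.abs(responses - truth)
weights = 1.0 / (errors + 1e-6)
weights /= weights.sum()

plain = responses.mean()              # "wisdom of the crowd" average
weighted = float(weights @ responses) # reliability-weighted average

print(round(plain, 2), round(weighted, 2))
```

The plain average inherits the shared bias of the majority, while down-weighting unreliable responders pulls the aggregate back toward the true value, mirroring why a homogeneous LLM crowd can exacerbate bias and a weighted or hybrid crowd can mitigate it.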

Authors:Yihe Yan, Zhenguo Shi, Yanxiang Wang, Cheng Jiang, Chun Tung Chou, Wen Hu
Title: mmMirror: Device Free mmWave Indoor NLoS Localization Using Van-Atta-Array IRS
Abstract:
Industry 4.0 is transforming manufacturing and logistics by integrating robots into shared human environments, such as factories, warehouses, and healthcare facilities. However, the risk of human-robot collisions, especially in Non-Line-of-Sight (NLoS) scenarios like around corners, remains a critical challenge. Existing solutions, such as vision-based and LiDAR systems, often fail under occlusion, lighting constraints, or privacy concerns, while RF-based systems are limited by range and accuracy. To address these limitations, we propose mmMirror, a novel system leveraging a Van Atta Array-based millimeter-wave (mmWave) reconfigurable intelligent reflecting surface (IRS) for precise, device-free NLoS localization. mmMirror integrates seamlessly with existing frequency-modulated continuous-wave (FMCW) radars and offers: (i) robust NLoS localization with centimeter-level accuracy at ranges up to 3 m, (ii) seamless uplink and downlink communication between radar and IRS, (iii) support for multi-radar and multi-target scenarios via dynamic beam steering, and (iv) reduced scanning latency through adaptive time slot allocation. Implemented using commodity 24 GHz radars and a PCB-based IRS prototype, mmMirror demonstrates its potential in enabling safe human-robot interactions in dynamic and complex environments.

Authors:Anja Heim, Thomas Lang, Christoph Heinzl
Title: Exploring Large Quantities of Secondary Data from High-Resolution Synchrotron X-ray Computed Tomography Scans Using AccuStripes
Abstract:
The analysis of secondary quantitative data extracted from high-resolution synchrotron X-ray computed tomography scans represents a significant challenge for users. While a number of methods have been introduced for processing large three-dimensional images in order to generate secondary data, only a few techniques are available for simple and intuitive visualization of such data in their entirety. This work employs the AccuStripes visualization technique for that purpose, which enables the visual analysis of secondary data represented by an ensemble of univariate distributions. It supports different schemes for adaptive histogram binning in combination with several ways of rendering aggregated data, and it allows the interactive selection of optimal visual representations depending on the data and the use case. We demonstrate the usability of AccuStripes on a high-resolution synchrotron scan of a particle-reinforced metal matrix composite sample containing more than 20 million particles. Through AccuStripes, detailed insights into the distributions of derived particle characteristics of the entire sample are facilitated. Furthermore, research questions such as what the overall shape of the particles is, or how homogeneously they are distributed across the sample, can be answered.

Authors:Botao Amber Hu, Yuhan Liu, Helena Rong
Title: Trustless Autonomy: Understanding Motivations, Benefits, and Governance Dilemmas in Self-Sovereign Decentralized AI Agents
Abstract:
The recent trend of self-sovereign Decentralized AI Agents (DeAgents) combines Large Language Model (LLM)-based AI agents with decentralization technologies such as blockchain smart contracts and trusted execution environments (TEEs). These tamper-resistant, trustless substrates allow agents to achieve self-sovereignty through ownership of cryptowallet private keys and control of digital assets and social media accounts. DeAgents eliminate centralized control and reduce human intervention, addressing key trust concerns inherent in centralized AI systems. This contributes to social computing by enabling a new human cooperative paradigm, "intelligence as commons." However, given ongoing challenges in LLM reliability such as hallucinations, this creates a paradoxical tension between trustlessness and unreliable autonomy. This study addresses this empirical research gap through interviews with DeAgents stakeholders (experts, founders, and developers) to examine their motivations, benefits, and governance dilemmas. The findings will guide future DeAgents system and protocol design and inform discussions about governance in sociotechnical AI systems in the future agentic web.

Authors:Yumou Wei, Paulo Carvalho, John Stamper
Title: Small but Significant: On the Promise of Small Language Models for Accessible AIED
Abstract:
GPT has become nearly synonymous with large language models (LLMs), an increasingly popular term in AIED proceedings. A simple keyword-based search reveals that 61% of the 76 long and short papers presented at AIED 2024 describe novel solutions using LLMs to address some of the long-standing challenges in education, and 43% specifically mention GPT. Although LLMs pioneered by GPT create exciting opportunities to strengthen the impact of AI on education, we argue that the field's predominant focus on GPT and other resource-intensive LLMs (with more than 10B parameters) risks neglecting the potential impact that small language models (SLMs) can make in providing resource-constrained institutions with equitable and affordable access to high-quality AI tools. Supported by positive results on knowledge component (KC) discovery, a critical challenge in AIED, we demonstrate that SLMs such as Phi-2 can produce an effective solution without elaborate prompting strategies. Hence, we call for more attention to developing SLM-based AIED approaches.

Authors:Jessica Y. Bo, Majeed Kazemitabaar, Emma Zhuang, Ashton Anderson
Title: Who's the Leader? Analyzing Novice Workflows in LLM-Assisted Debugging of Machine Learning Code
Abstract:
While LLMs are often touted as tools for democratizing specialized knowledge to beginners, their actual effectiveness for improving task performance and learning is still an open question. It is known that novices engage with LLMs differently from experts, with prior studies reporting meta-cognitive pitfalls that affect novices' ability to verify outputs and prompt effectively. We focus on a task domain, machine learning (ML), which embodies both high complexity and low verifiability, to understand the impact of LLM assistance on novices. Provided a buggy ML script and open access to ChatGPT, we conduct a formative study with eight novice ML engineers to understand their reliance on, interactions with, and perceptions of the LLM. We find that user actions can be roughly categorized into leading the LLM and being led by the LLM, and we further investigate how these modes affect reliance outcomes like over- and under-reliance. These results have implications for novices' cognitive engagement in LLM-assisted tasks and potential negative effects on downstream learning. Lastly, we pose potential augmentations to the novice-LLM interaction paradigm to promote cognitive engagement.

Authors:Djamel Laps-Bouraba, Markus Zajac, Uta Störl
Title: QC-Adviser: Quantum Hardware Recommendations for Solving Industrial Optimization Problems
Abstract:
The availability of quantum hardware via the cloud offers opportunities for new approaches to computing optimization problems in an industrial environment. However, selecting the right quantum hardware is difficult for non-experts due to its technical characteristics. In this paper, we present the QC-Adviser prototype, which supports users in selecting suitable quantum annealer hardware without requiring quantum computing knowledge.

Authors:Kayhan Latifzadeh, Luis A. Leiva
Title: Thalamus: A User Simulation Toolkit for Prototyping Multimodal Sensing Studies
Abstract:
Conducting user studies that involve physiological and behavioral measurements is very time-consuming and expensive, as it requires not only careful experiment design and device calibration but also careful software testing. We propose Thalamus, a software toolkit for collecting and simulating multimodal signals that helps experimenters prepare in advance for unexpected situations, before reaching out to the actual study participants and even before having to install or purchase a specific device. Among other features, Thalamus allows the experimenter to modify, synchronize, and broadcast physiological signals (as coming from various data streams) from different devices simultaneously, not necessarily located in the same place. Thalamus is cross-platform, cross-device, and simple to use, making it a valuable asset for HCI research.

Authors:Tong Zhang, Fenghua Shao, Runsheng Zhang, Yifan Zhuang, Liuqingqing Yang
Title: DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems
Abstract:
Based on the DeepSORT algorithm, this study explores the application of visual tracking technology in intelligent human-computer interaction, especially in the field of gesture recognition and tracking. With the rapid development of artificial intelligence and deep learning technology, vision-based interaction has gradually replaced traditional input devices and become an important way for intelligent systems to interact with users. The DeepSORT algorithm achieves accurate target tracking in dynamic environments by combining Kalman filters with deep learning feature extraction, making it especially suitable for complex scenes with multi-target tracking and fast movements. This study experimentally verifies the superior performance of DeepSORT in gesture recognition and tracking: it accurately captures and tracks the user's gesture trajectory and outperforms traditional tracking methods in real-time performance and accuracy. In addition, this study combines gesture recognition experiments to evaluate the recognition ability and feedback response of the DeepSORT algorithm under different gestures (such as sliding, clicking, and zooming). The experimental results show that DeepSORT not only effectively handles target occlusion and motion blur but also tracks stably in multi-target environments, achieving a smooth user interaction experience. Finally, this paper looks ahead to future intelligent human-computer interaction systems based on visual tracking and proposes future research focuses such as algorithm optimization, data fusion, and multimodal interaction to promote a more intelligent and personalized interactive experience. Keywords: DeepSORT, visual tracking, gesture recognition, human-computer interaction
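The Kalman-filtering component mentioned above can be illustrated with a minimal constant-velocity filter. The matrices and noise levels below are assumed toy values; real DeepSORT tracks a richer state (bounding-box position, aspect ratio, and their velocities) and additionally matches detections with learned appearance features.

```python
import numpy as np

# Minimal 1D constant-velocity Kalman filter, the state-estimation core
# that DeepSORT-style trackers pair with appearance matching.
# All matrices and noise values are illustrative assumptions.

F = np.array([[1.0, 1.0],   # state transition: x' = x + v
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])  # only position is measured
Q = np.eye(2) * 1e-3        # process noise (assumed)
R = np.array([[0.5]])       # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle for measurement z."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                   # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Track a target moving at ~1 unit/frame.
x = np.array([0.0, 0.0])
P = np.eye(2)
for t in range(1, 20):
    z = np.array([t * 1.0 + 0.1])   # measurement of the moving target
    x, P = kalman_step(x, P, z)
print(round(float(x[1]), 2))        # estimated velocity, should approach 1.0
```

The filter's predicted positions are what let a tracker bridge short occlusions and motion blur: between detections, the motion model keeps propagating each track's state.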

Authors:Yumou Wei, Paulo Carvalho, John Stamper
Title: KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery
Abstract:
Educators evaluate student knowledge using knowledge component (KC) models that map assessment questions to KCs. Still, designing KC models for large question banks remains an insurmountable challenge for instructors who need to analyze each question by hand. The growing use of Generative AI in education is expected only to aggravate this chronic deficiency of expert-designed KC models, as course engineers designing KCs struggle to keep up with the pace at which questions are generated. In this work, we propose KCluster, a novel KC discovery algorithm based on identifying clusters of congruent questions according to a new similarity metric induced by a large language model (LLM). We demonstrate in three datasets that an LLM can create an effective metric of question similarity, which a clustering algorithm can use to create KC models from questions with minimal human effort. Combining the strengths of LLM and clustering, KCluster generates descriptive KC labels and discovers KC models that predict student performance better than the best expert-designed models available. In anticipation of future work, we illustrate how KCluster can reveal insights into difficult KCs and suggest improvements to instruction.
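The core mechanism above, clustering questions into knowledge components from a pairwise similarity matrix, can be sketched as follows. In the paper the metric is induced by an LLM; here the matrix is a hand-made toy stand-in, and the single-link threshold clustering is an illustrative choice, not necessarily the paper's algorithm.

```python
import numpy as np

# Sketch of KC discovery from question similarity: questions whose
# pairwise similarity exceeds a threshold are grouped into one KC.
# The similarity values below are assumed toy numbers standing in for
# an LLM-derived metric.

questions = ["add fractions", "sum of two fractions",
             "area of a circle", "circle area formula",
             "solve x + 2 = 5"]

S = np.array([
    [1.0, 0.9, 0.1, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.2, 0.2],
    [0.1, 0.1, 1.0, 0.9, 0.1],
    [0.1, 0.2, 0.9, 1.0, 0.1],
    [0.2, 0.2, 0.1, 0.1, 1.0],
])

def threshold_cluster(S, tau=0.5):
    """Group items with similarity > tau (single link, via union-find)."""
    n = len(S)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if S[i, j] > tau:
                parent[find(i)] = find(j)
    labels = [find(i) for i in range(n)]
    remap = {r: k for k, r in enumerate(dict.fromkeys(labels))}
    return [remap[l] for l in labels]

labels = threshold_cluster(S, tau=0.5)
print(labels)  # -> [0, 0, 1, 1, 2]: two shared KCs plus a singleton
```

Each resulting cluster would then be given a descriptive KC label (in KCluster, generated by the LLM) and evaluated against student performance data.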

Authors:Xiangzhe Yuan, Jiajun Wang, Qian Wan, Siying Hu
Title: A Day in Their Shoes: Using LLM-Based Perspective-Taking Interactive Fiction to Reduce Stigma Toward Dirty Work
Abstract:
Occupations referred to as "dirty work" often face entrenched social stigma, which adversely affects the mental health of workers in these fields and impedes occupational equity. In this study, we propose a novel Interactive Fiction (IF) framework powered by Large Language Models (LLMs) to encourage perspective-taking and reduce biases against these stigmatized yet essential roles. Through an experiment with participants (n = 100) across four such occupations, we observed a significant increase in participants' understanding of these occupations, as well as a high level of empathy and a strong sense of connection to individuals in these roles. Additionally, qualitative interviews with participants (n = 15) revealed that the LLM-based perspective-taking IF enhanced immersion, deepened emotional resonance and empathy toward "dirty work," and allowed participants to experience a sense of professional fulfillment in these occupations. However, participants also highlighted ongoing challenges, such as limited contextual details generated by the LLM and the unintentional reinforcement of existing stereotypes. Overall, our findings underscore that an LLM-based perspective-taking IF framework offers a promising and scalable strategy for mitigating stigma and promoting social equity in marginalized professions.

Authors:Agnese Chiatti, Sara Bernardini, Lara Shibelski Godoy Piccolo, Viola Schiaffonati, Matteo Matteucci
Title: Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
Abstract:
The rapid adoption of Vision Language Models (VLMs), pre-trained on large image-text and video-text datasets, calls for protecting and informing users about when to trust these systems. This survey reviews studies on trust dynamics in user-VLM interactions, through a multi-disciplinary taxonomy encompassing different cognitive science capabilities, collaboration modes, and agent behaviours. Literature insights and findings from a workshop with prospective VLM users inform preliminary requirements for future VLM trust studies.

Authors:Elizabeth Ankrah, Stephanie Nyairo, Mercy Muchai, Kagonya Awori, Millicent Ochieng, Mark Kariuki, Jacki O'Neill
Title: Dukawalla: Voice Interfaces for Small Businesses in Africa
Abstract:
Small and medium-sized businesses often struggle with data-driven decision making due to a lack of advanced analytics tools, especially in African countries, where they make up a majority of the workforce. Though many tools exist, they are not designed to fit the ways of working of SMB workers, who are mobile-first, have limited time to learn new workflows, and for whom social and business are tightly coupled. To address this, the Dukawalla prototype was created. This intelligent assistant bridges the gap between raw business data and actionable insights by leveraging voice interaction and the power of generative AI. Dukawalla provides an intuitive way for business owners to interact with their data, aiding informed decision making. This paper examines Dukawalla's deployment across SMBs in Nairobi, focusing on their experiences using this voice-based assistant to streamline data collection and provide business insights.

Authors:Jessica Y. Bo, Tianyu Xu, Ishan Chatterjee, Katrina Passarella-Ward, Achin Kulshrestha, D Shin
Title: Steerable Chatbots: Personalizing LLMs with Preference-Based Activation Steering
Abstract:
As large language models (LLMs) improve in their capacity to serve as personal AI assistants, their ability to output uniquely tailored, personalized responses that align with the soft preferences of their users is essential for enhancing user satisfaction and retention. However, untrained lay users have poor prompt specification abilities and often struggle with conveying their latent preferences to AI assistants. To address this, we leverage activation steering to guide LLMs to align with interpretable preference dimensions during inference. In contrast to memory-based personalization methods that require a longer user history, steering is extremely lightweight and can be easily controlled by the user via a linear strength factor. We embed steering into three different interactive chatbot interfaces and conduct a within-subjects user study (n=14) to investigate how end users prefer to personalize their conversations. The results demonstrate the effectiveness of preference-based steering for aligning real-world conversations with hidden user preferences, and highlight further insights on how diverse values around control, usability, and transparency lead users to prefer different interfaces.
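The steering mechanism above has a simple mathematical core: a preference direction vector is added to a hidden-layer activation, scaled by the user's linear strength factor. The sketch below uses random stand-in vectors; real steering vectors are derived from model activations (e.g., contrastive activation differences), which this toy omits.

```python
import numpy as np

# Toy sketch of activation steering: h' = h + alpha * v, where v is a
# preference direction and alpha a user-controlled strength factor.
# Vectors here are random stand-ins, not real model activations.

rng = np.random.default_rng(0)
hidden = rng.normal(size=16)       # stand-in hidden-layer activation
steer = rng.normal(size=16)
steer /= np.linalg.norm(steer)     # unit preference direction

def apply_steering(h, v, alpha):
    """Shift activation h along direction v with strength alpha."""
    return h + alpha * v

def alignment(h, v):
    """Cosine similarity between activation and preference direction."""
    return float(h @ v / (np.linalg.norm(h) * np.linalg.norm(v)))

for alpha in (0.0, 1.0, 4.0):
    steered = apply_steering(hidden, steer, alpha)
    print(alpha, round(alignment(steered, steer), 3))
# Larger alpha moves the activation monotonically toward the
# preference direction (cosine similarity approaches 1).
```

The linear strength factor is what the chatbot interfaces in the study expose to end users, letting them dial a preference up or down without writing prompts.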

Authors:Tobias Hallmen, Kathrin Gietl, Karoline Hillesheim, Moritz Bauermann, Annemarie Friedrich, Elisabeth André
Title: AI-Based Feedback in Counselling Competence Training of Prospective Teachers
Abstract:
This study explores the use of AI-based feedback to enhance the counselling competence of prospective teachers. An iterative block seminar was designed, incorporating theoretical foundations, practical applications, and AI tools for analysing verbal, paraverbal, and nonverbal communication. The seminar included recorded simulated teacher-parent conversations, followed by AI-based feedback and qualitative interviews with students. The study investigated correlations between communication characteristics and conversation quality, student perceptions of AI-based feedback, and the training of AI models to identify conversation phases and techniques. Results indicated significant correlations between nonverbal and paraverbal features and conversation quality, and students positively perceived the AI feedback. The findings suggest that AI-based feedback can provide objective, actionable insights to improve teacher training programs. Future work will focus on refining verbal skill annotations, expanding the dataset, and exploring additional features to enhance the feedback system.

Authors:Yiheng Bian, Yunpeng Song, Guiyu Ma, Rongrong Zhu, Zhongmin Cai
Title: DroidRetriever: An Autonomous Navigation and Information Integration System Facilitating Mobile Sensemaking
Abstract:
Users regularly rely on mobile applications for their daily information needs, and mobile sensemaking is prevalent in various domains such as education, healthcare, business intelligence, and emergency response, where timely and context-aware information processing and decision-making are critical. However, valuable information is often scattered across the closed ecosystems of various applications, posing challenges for traditional search engines to retrieve data openly and in real time. Additionally, due to limitations such as mobile device screen sizes, language differences, and unfamiliarity with specific applications and domain knowledge, users have to frequently switch between multiple applications and spend substantial time locating and integrating the information. To address these challenges, we present DroidRetriever, a system for cross-application information retrieval to facilitate mobile sensemaking. DroidRetriever can automatically navigate to relevant interfaces based on users' natural language commands, capture screenshots, extract and integrate information, and finally present the results. Our experimental results demonstrate that DroidRetriever can extract and integrate information with near-human accuracy while significantly reducing processing time. Furthermore, with minimal user intervention, DroidRetriever effectively corrects and completes various information retrieval tasks, substantially reducing the user's workload. Our summary of the motivations for intervention and the discussion of their necessity provide valuable implications for future research. We will open-source our code upon acceptance of the paper.

Authors:Zihao Zhu, Ao Yu, Xin Tong, Pan Hui
Title: Exploring LLM-Powered Role and Action-Switching Pedagogical Agents for History Education in Virtual Reality
Abstract:
Multi-role pedagogical agents can create engaging and immersive learning experiences, helping learners better understand knowledge in history learning. However, existing pedagogical agents often struggle with multi-role interactions due to complex controls, limited feedback forms, and difficulty dynamically adapting to user inputs. In this study, we developed a VR prototype with LLM-powered adaptive role-switching and action-switching pedagogical agents to help users learn about the history of the Pavilion of Prince Teng. A 2 x 2 between-subjects study was conducted with 84 participants to assess how adaptive role-switching and action-switching affect participants' learning outcomes and experiences. The results suggest that adaptive role-switching enhances participants' perception of the pedagogical agent's trustworthiness and expertise but may lead to inconsistent learning experiences. Adaptive action-switching increases participants' perceived social presence, expertise, and humanness. The study did not uncover any effects of role-switching and action-switching on usability, learning motivation, and cognitive load. Based on the findings, we proposed five design implications for incorporating adaptive role-switching and action-switching into future VR history education tools.

Authors:Valentin Foucher, Santiago de Leon-Martinez, Robert Moro
Title: Eye Movements as Indicators of Deception: A Machine Learning Approach
Abstract:
Gaze may enhance the robustness of lie detectors but remains under-studied. This study evaluated the efficacy of AI models (using fixations, saccades, blinks, and pupil size) for detecting deception in Concealed Information Tests across two datasets. The first, collected with an EyeLink 1000, contains gaze data from a computerized experiment in which 87 participants revealed, concealed, or faked the value of a previously selected card. The second, collected with Pupil Neon, involved 36 participants performing a similar task but facing an experimenter. XGBoost achieved accuracies up to 74% in a binary classification task (Revealing vs. Concealing) and 49% in a more challenging three-class task (Revealing vs. Concealing vs. Faking). Feature analysis identified saccade number, duration, amplitude, and maximum pupil size as the most important features for deception prediction. These results demonstrate the feasibility of using gaze and AI to enhance lie detectors and encourage future research to improve on these results.

Authors:Ryan Louie, Ifdita Hasan Orney, Juan Pablo Pacheco, Raj Sanjay Shah, Emma Brunskill, Diyi Yang
Title: Can LLM-Simulated Practice and Feedback Upskill Human Counselors? A Randomized Study with 90+ Novice Counselors
Abstract:
Training more counselors, from clinical students to peer supporters, can help meet the demand for accessible mental health support; however, current training approaches remain resource-intensive and difficult to scale effectively. Large Language Models (LLMs) offer promising solutions for growing counseling skills training through simulated practice and automated feedback. Despite successes in aligning LLMs with expert-counselor annotations, we do not know whether LLM-based counseling training tools -- such as AI patients that simulate real-world challenges and generative AI feedback with suggested alternatives and rationales -- actually lead to improvements in novice counselor skill development. We develop CARE, an LLM-simulated practice and feedback system, and randomize 94 novice counselors to practice using an AI patient, either alone or with AI feedback, measuring changes in their behavioral performance, self-assessments, and qualitative learning takeaways. Our results show the practice-and-feedback group improved in their use of reflections and questions (d=0.32-0.39, p<0.05). In contrast, the group that practiced with an AI patient alone did not show improvements, and in the case of empathy, actually had worse uses across time (d=-0.52, p=0.001) and when compared against the practice-and-feedback group (d=0.72, p=0.001). Participants' qualitative self-reflections revealed key differences: the practice-and-feedback group adopted a client-centered approach involving listening to and validating feelings, while the practice-alone group remained solution-oriented but delayed offering suggestions until gathering more information. Overall, these results suggest that LLM-based training systems can promote effective skill development, but that combining both simulated practice and structured feedback is critical.

Authors:Hirotaka Hiraki, Jun Rekimoto
Title: MaskClip: Detachable Clip-on Piezoelectric Sensing of Mask Surface Vibrations for Real-time Noise-Robust Speech Input
Abstract:
Masks are essential in medical settings and during infectious outbreaks but significantly impair speech communication, especially in environments with background noise. Existing solutions often require substantial computational resources or compromise hygiene and comfort. We propose a novel sensing approach that captures only the wearer's voice by detecting mask surface vibrations using a piezoelectric sensor. Our developed device, MaskClip, employs a stainless steel clip with an optimally positioned piezoelectric sensor to selectively capture speech vibrations while inherently filtering out ambient noise. Evaluation experiments demonstrated superior performance with a low Character Error Rate of 6.1% in noisy environments compared to conventional microphones. Subjective evaluations by 102 participants also showed high satisfaction scores. This approach shows promise for applications in settings where clear voice communication must be maintained while wearing protective equipment, such as medical facilities, cleanrooms, and industrial environments.

Authors:Samuel Rhys Cox, Helena Bøjer Djernæs, Niels van Berkel
Title: Beyond Productivity: Rethinking the Impact of Creativity Support Tools
Abstract:
Creativity Support Tools (CSTs) are widely used across diverse creative domains, with generative AI recently increasing the abilities of CSTs. To better understand how the success of CSTs is determined in the literature, we conducted a review of outcome measures used in CST evaluations. Drawing from (n=173) CST evaluations in the ACM Digital Library, we identified the metrics commonly employed to assess user interactions with CSTs. Our findings reveal prevailing trends in current evaluation practices, while exposing underexplored measures that could broaden the scope of future research. Based on these results, we argue for a more holistic approach to evaluating CSTs, encouraging the HCI community to consider not only user experience and the quality of the generated output, but also user-centric aspects such as self-reflection and well-being as critical dimensions of assessment. We also highlight a need for validated measures specifically suited to the evaluation of generative AI in CSTs.

Authors:Sachin R. Pendse, Darren Gergle, Rachel Kornfield, Jonah Meyerhoff, David Mohr, Jina Suh, Annie Wescott, Casey Williams, Jessica Schleider
Title: When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines
Abstract:
Red-teaming is a core part of the infrastructure that ensures that AI models do not produce harmful content. Unlike past technologies, the black box nature of generative AI systems necessitates a uniquely interactional mode of testing, one in which individuals on red teams actively interact with the system, leveraging natural language to simulate malicious actors and solicit harmful outputs. This interactional labor done by red teams can result in mental health harms that are uniquely tied to the adversarial engagement strategies necessary to effectively red team. The importance of ensuring that generative AI models do not propagate societal or individual harm is widely recognized -- one less visible foundation of end-to-end AI safety is also the protection of the mental health and wellbeing of those who work to keep model outputs safe. In this paper, we argue that the unmet mental health needs of AI red-teamers is a critical workplace safety concern. Through analyzing the unique mental health impacts associated with the labor done by red teams, we propose potential individual and organizational strategies that could be used to meet these needs, and safeguard the mental health of red-teamers. We develop our proposed strategies through drawing parallels between common red-teaming practices and interactional labor common to other professions (including actors, mental health professionals, conflict photographers, and content moderators), describing how individuals and organizations within these professional spaces safeguard their mental health given similar psychological demands. Drawing on these protective practices, we describe how safeguards could be adapted for the distinct mental health challenges experienced by red teaming organizations as they mitigate emerging technological risks on the new digital frontlines.

Authors:Angelika Kothe, Volker Hohmann, Giso Grimm
Title: Effect of Avatar Head Movement on Communication Behaviour, Experience of Presence and Conversation Success in Triadic Conversations
Abstract:
Interactive communication in virtual reality can be used in experimental paradigms to increase the ecological validity of hearing device evaluations. This requires the virtual environment to elicit natural communication behaviour in listeners. This study evaluates the effect of virtual animated characters' head movements on participants' communication behaviour and experience. Triadic conversations were conducted between a test participant and two confederates. To facilitate the manipulation of head movements, the conversation was conducted in telepresence using a system that transmitted audio, head movement data and video with low delay. The confederates were represented by virtual animated characters (avatars) with different levels of animation: Static heads, automated head movement animations based on speech level onsets, and animated head movements based on the transmitted head movements of the interlocutors. A condition was also included in which the videos of the interlocutors' heads were embedded in the visual scene. The results show significant effects of animation level on the participants' speech and head movement behaviour as recorded by physical sensors, as well as on the subjective sense of presence and the success of the conversation. The largest effects were found for the range of head orientation during speech and the perceived realism of avatars. Participants reported that they were spoken to in a more helpful way when the avatars showed head movements transmitted from the interlocutors than when the avatars' heads were static. We therefore conclude that the representation of interlocutors must include sufficiently realistic head movements in order to elicit natural communication behaviour.

Authors:Daniel Gaspar-Figueiredo, Marta Fernández-Diego, Silvia Abrahão, Emilio Insfran
Title: Integrating Human Feedback into a Reinforcement Learning-Based Framework for Adaptive User Interfaces
Abstract:
Adaptive User Interfaces (AUI) play a crucial role in modern software applications by dynamically adjusting interface elements to accommodate users' diverse and evolving needs. However, existing adaptation strategies often lack real-time responsiveness. Reinforcement Learning (RL) has emerged as a promising approach for addressing complex, sequential adaptation challenges, enabling adaptive systems to learn optimal policies based on previous adaptation experiences. Although RL has been applied to AUIs, integrating RL agents effectively within user interactions remains a challenge. In this paper, we enhance an RL-based Adaptive User Interface adaptation framework by incorporating personalized human feedback directly into the learning process. Unlike prior approaches that rely on a single pre-trained RL model, our approach trains a unique RL agent for each user, allowing individuals to actively shape their personal RL agent's policy, potentially leading to more personalized and responsive UI adaptations. To evaluate this approach, we conducted an empirical study to assess the impact of integrating human feedback into the RL-based Adaptive User Interface adaptation framework and its effect on User Experience (UX). The study involved 33 participants interacting with AUIs incorporating human feedback and non-adaptive user interfaces in two domains: an e-learning platform and a trip-planning application. The results suggest that incorporating human feedback into RL-driven adaptations significantly enhances UX, offering promising directions for advancing adaptive capabilities and user-centered design in AUIs.

Authors:Ilya Zakharov, Ekaterina Koshchenko, Agnia Sergeyuk
Title: AI in Software Engineering: Perceived Roles and Their Impact on Adoption
Abstract:
This paper investigates how developers conceptualize AI-powered Development Tools and how these role attributions influence technology acceptance. Through qualitative analysis of 38 interviews and a quantitative survey with 102 participants, we identify two primary Mental Models: AI as an inanimate tool and AI as a human-like teammate. Factor analysis further groups AI roles into Support Roles (e.g., assistant, reference guide) and Expert Roles (e.g., advisor, problem solver). We find that assigning multiple roles to AI correlates positively with Perceived Usefulness and Perceived Ease of Use, indicating that diverse conceptualizations enhance AI adoption. These insights suggest that AI4SE tools should accommodate varying user expectations through adaptive design strategies that align with different Mental Models.

Authors:Daye Nam, Ahmed Omran, Ambar Murillo, Saksham Thakur, Abner Araujo, Marcel Blistein, Alexander Frömmgen, Vincent Hellendoorn, Satish Chandra
Title: Prompting LLMs for Code Editing: Struggles and Remedies
Abstract:
Large Language Models (LLMs) are rapidly transforming software engineering, with coding assistants embedded in an IDE becoming increasingly prevalent. While research has focused on improving the tools and understanding developer perceptions, a critical gap exists in understanding how developers actually use these tools in their daily workflows, and, crucially, where they struggle. This paper addresses part of this gap through a multi-phased investigation of developer interactions with an LLM-powered code editing and transformation feature, Transform Code, in an IDE widely used at Google. First, we analyze telemetry logs of the feature usage, revealing that frequent re-prompting can be an indicator of developer struggles with using Transform Code. Second, we conduct a qualitative analysis of unsatisfactory requests, identifying five key categories of information often missing from developer prompts. Finally, based on these findings, we propose and evaluate a tool, AutoPrompter, for automatically improving prompts by inferring missing information from the surrounding code context, leading to a 27% improvement in edit correctness on our test set.

Authors:Indrajeet Ghosh, Kasthuri Jayarajah, Nicholas Waytowich, Nirmalya Roy
Title: Memento: Augmenting Personalized Memory via Practical Multimodal Wearable Sensing in Visual Search and Wayfinding Navigation
Abstract:
Working memory involves the temporary retention of information over short periods. It is a critical cognitive function that enables humans to perform various online processing tasks, such as dialing a phone number, recalling misplaced items' locations, or navigating through a store. However, inherent limitations in an individual's capacity to retain information often result in forgetting important details during such tasks. Although previous research has successfully utilized wearable and assistive technologies to enhance long-term memory functions (e.g., episodic memory), their application to supporting short-term recall in daily activities remains underexplored. To address this gap, we present Memento, a framework that uses multimodal wearable sensor data to detect significant changes in cognitive state and provide intelligent in situ cues to enhance recall. Through two user studies involving 15 and 25 participants in visual search navigation tasks, we demonstrate that participants receiving visual cues from Memento achieved significantly better route recall, improving approximately 20-23% compared to free recall. Furthermore, Memento reduced cognitive load and review time by 46% while also substantially reducing computation time (3.86 seconds vs. 15.35 seconds), achieving, on average, 75% of the effectiveness of computer vision-based cue selection approaches.

Authors:Alexander Htet Kyaw, Se Hwan Jeon, Miana Smith, Neil Gershenfeld
Title: Making Physical Objects with Generative AI and Robotic Assembly: Considering Fabrication Constraints, Sustainability, Time, Functionality, and Accessibility
Abstract:
3D generative AI enables rapid and accessible creation of 3D models from text or image inputs. However, translating these outputs into physical objects remains a challenge due to the constraints in the physical world. Recent studies have focused on improving the capabilities of 3D generative AI to produce fabricable outputs, with 3D printing as the main fabrication method. However, this workshop paper calls for a broader perspective by considering how fabrication methods align with the capabilities of 3D generative AI. As a case study, we present a novel system using discrete robotic assembly and 3D generative AI to make physical objects. Through this work, we identified five key aspects to consider in a physical making process based on the capabilities of 3D generative AI. 1) Fabrication Constraints: Current text-to-3D models can generate a wide range of 3D designs, requiring fabrication methods that can adapt to the variability of generative AI outputs. 2) Time: While generative AI can generate 3D models in seconds, fabricating physical objects can take hours or even days. Faster production could enable a closer iterative design loop between humans and AI in the making process. 3) Sustainability: Although text-to-3D models can generate thousands of models in the digital world, extending this capability to the real world would be resource-intensive, unsustainable and irresponsible. 4) Functionality: Unlike digital outputs from 3D generative AI models, the fabrication method plays a crucial role in the usability of physical objects. 5) Accessibility: While generative AI simplifies 3D model creation, the need for fabrication equipment can limit participation, making AI-assisted creation less inclusive. These five key aspects provide a framework for assessing how well a physical making process aligns with the capabilities of 3D generative AI and values in the world.

Authors:Will Epperson, Arpit Mathur, Adam Perer, Dominik Moritz
Title: Texture: Structured Exploration of Text Datasets
Abstract:
Exploratory analysis of a text corpus is essential for assessing data quality and developing meaningful hypotheses. Text analysis relies on understanding documents through structured attributes spanning various granularities of the documents such as words, phrases, sentences, topics, or clusters. However, current text visualization tools typically adopt a fixed representation tailored to specific tasks or domains, requiring users to switch tools as their analytical goals change. To address this limitation, we present Texture, a general-purpose interactive text exploration tool. Texture introduces a configurable data schema for representing text documents enriched with descriptive attributes. These attributes can appear at arbitrary levels of granularity in the text and possibly have multiple values, including document-level attributes, multi-valued attributes (e.g., topics), fine-grained span-level attributes (e.g., words), and vector embeddings. The system then combines existing interactive methods for text exploration into a single interface that provides attribute overview visualizations, supports cross-filtering attribute charts to explore subsets, uses embeddings for a dataset overview and similar instance search, and contextualizes filters in the actual documents. We evaluated Texture through a two-part user study with 10 participants from varied domains who each analyzed their own dataset in a baseline session and then with Texture. Texture was able to represent all of the previously derived dataset attributes, enabled participants to more quickly iterate during their exploratory analysis, and discover new insights about their data. Our findings contribute to the design of scalable, interactive, and flexible exploration systems that improve users' ability to make sense of text data.

Authors:Tao Long, Kendra Wannamaker, Jo Vermeulen, George Fitzmaurice, Justin Matejka
Title: FeedQUAC: Quick Unobtrusive AI-Generated Commentary
Abstract:
Design thrives on feedback. However, gathering constant feedback throughout the design process can be labor-intensive and disruptive. We explore how AI can bridge this gap by providing effortless, ambient feedback. We introduce FeedQUAC, a design companion that delivers real-time AI-generated commentary from a variety of perspectives through different personas. A design probe study with eight participants highlights how designers can leverage quick yet ambient AI feedback to enhance their creative workflows. Participants highlight benefits such as convenience, playfulness, confidence boost, and inspiration from this lightweight feedback agent, while suggesting additional features, like chat interaction and context curation. We discuss the role of AI feedback, its strengths and limitations, and how to integrate it into existing design workflows while balancing user involvement. Our findings also suggest that ambient interaction is a valuable consideration for both the design and evaluation of future creativity support systems.

Authors:Samuel J. Levulis, Kevin W. Rio, Pablo Ramon Soria, James Wilmott, Charlie S. Burlingham, Phillip Guan
Title: Subthreshold Jitter in VR Can Induce Visual Discomfort
Abstract:
Visual-vestibular conflicts (VVCs) are a primary contributor to visually induced motion sickness (VIMS) in head-mounted displays (HMDs). However, virtual reality (VR) comfort studies often rely on exposing seated or standing users to experiences with high intensity visual motion (such as roller coasters). These drastic VVCs tend to induce pronounced VIMS symptoms that can be reliably detected across individuals using common survey measures. The conclusions from studies using these extreme motion-based conflicts may not accurately generalize to naturalistic use cases in VR where efforts are made to minimize, rather than maximize, VIMS symptoms. In this work, we show that a subthreshold visual-vestibular conflict can induce measurable discomfort during naturalistic, long duration use. We first present a psychophysical study, conducted outside of an HMD, to rigorously identify the perceptual thresholds for sinusoidal noise in render pose (i.e., jitter) resulting in erroneous 3D motion of rendered content. We next introduce subthreshold levels of jitter to a Meta Quest 3 VR HMD and demonstrate that this can induce visual discomfort in participants playing the commercially-available game Cubism across a three-session, repeated-measures study. Importantly, we did not identify statistically significant comfort differences between control and jitter conditions with traditional pre- and post-test comparison of Simulator Sickness Questionnaire (SSQ) scores. Significant differences were only identified using the Motion Illness Symptoms Classification (MISC) survey administered every 10 minutes across each 90 minute session. This highlights the benefits of incorporating time-resolved data points and suggests that lightweight, more frequent surveys may be important tools for measuring visual discomfort in more ecologically-valid scenarios.

Authors:Lukas Gehrke, Aleksandrs Koselevs, Marius Klug, Klaus Gramann
Title: Neuroadaptive Haptics: Comparing Reinforcement Learning from Explicit Ratings and Neural Signals for Adaptive XR Systems
Abstract:
Neuroadaptive haptics offers a path to more immersive extended reality (XR) experiences by dynamically tuning multisensory feedback to user preferences. We present a neuroadaptive haptics system that adapts XR feedback through reinforcement learning (RL) from explicit user ratings and brain-decoded neural signals. In a user study, participants interacted with virtual objects in VR while Electroencephalography (EEG) data were recorded. An RL agent adjusted haptic feedback based either on explicit ratings or on outputs from a neural decoder. Results show that the RL agent's performance was comparable across feedback sources, suggesting that implicit neural feedback can effectively guide personalization without requiring active user input. The EEG-based neural decoder achieved a mean F1 score of 0.8, supporting reliable classification of user experience. These findings demonstrate the feasibility of combining brain-computer interfaces (BCI) and RL to autonomously adapt XR interactions, reducing cognitive load and enhancing immersion.
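The adaptation loop the abstract describes (an RL agent tuning haptic feedback from explicit ratings or a neural decoder's output) can be illustrated as a simple multi-armed bandit. The intensity levels, noisy preference model, and epsilon-greedy policy below are illustrative assumptions for the sketch, not the authors' implementation:

```python
import random

class EpsilonGreedyHapticAgent:
    """Illustrative bandit: picks a haptic setting, learns from user feedback."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = arms                      # candidate haptic settings
        self.epsilon = epsilon                # exploration rate
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}  # running mean reward per arm

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)   # explore
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        # reward: an explicit user rating, or an EEG decoder's output in [0, 1]
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Simulated session: the (hypothetical) user prefers medium intensity.
random.seed(0)
agent = EpsilonGreedyHapticAgent(arms=["low", "medium", "high"])
preferred = {"low": 0.3, "medium": 0.9, "high": 0.5}
for _ in range(500):
    arm = agent.select()
    rating = preferred[arm] + random.uniform(-0.1, 0.1)  # noisy feedback
    agent.update(arm, rating)
print(max(agent.arms, key=lambda a: agent.values[a]))
```

The key point the study makes is that the `reward` signal can come from either source (explicit ratings or brain-decoded preference) without changing the learning loop itself.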

Authors:Xinyi Zhang, Muskan Gupta, Emily Altland, Sang Won Lee
Title: Understanding the Perceptions of Trigger Warning and Content Warning on Social Media Platforms in the U.S.
Abstract:
The prevalence of distressing content on social media raises concerns about users' mental well-being, prompting the use of trigger warnings (TW) and content warnings (CW). However, inconsistent implementation of TW/CW across platforms and the lack of standardized practices confuse users regarding these warnings. To better understand how users experienced and utilized these warnings, we conducted a semi-structured interview study with 15 general social media users. Our findings reveal challenges across three key stakeholders: viewers, who need to decide whether to engage with warning-labeled content; posters, who struggle with whether and how to apply TW/CW to the content; and platforms, whose design features shape the visibility and usability of warnings. While users generally expressed positive attitudes toward warnings, their understanding of TW/CW usage was limited. Based on these insights, we proposed a conceptual framework of the TW/CW mechanisms from multiple stakeholders' perspectives. Lastly, we further reflected on our findings and discussed the opportunities for social media platforms to enhance users' TW/CW experiences, fostering a more trauma-informed social media environment.

Authors:Dong Zhang, Yanjun Zhou, Jingyi Yu
Title: The Ephemeral Shadow: Hyperreal Beings in Stimulative Performance
Abstract:
The Ephemeral Shadow is an interactive art installation centered on the concept of "simulacrum," focusing on the reconstruction of subjectivity at the intersection of reality and virtuality. Drawing inspiration from the aesthetic imagery of traditional shadow puppetry, the installation combines robotic performance and digital projection to create a multi-layered visual space, presenting a progressively dematerialized hyperreal experience. By blurring the audience's perception of the boundaries between entity and image, the work employs the replacement of physical presence with imagery as its core technique, critically reflecting on issues of technological subjectivity, affective computing, and ethics. Situated within the context of posthumanism and digital media, the installation prompts viewers to contemplate: as digital technologies increasingly approach and simulate "humanity," how can we reshape identity and perception while safeguarding the core values and ethical principles of human subjectivity?

Authors:K M Sajjadul Islam, Ravi Teja Karri, Srujan Vegesna, Jiawei Wu, Praveen Madiraju
Title: Contextual Embedding-based Clustering to Identify Topics for Healthcare Service Improvement
Abstract:
Understanding patient feedback is crucial for improving healthcare services, yet analyzing unlabeled short-text feedback presents significant challenges due to limited data and domain-specific nuances. Traditional supervised learning approaches require extensive labeled datasets, making unsupervised methods more viable for uncovering meaningful insights from patient feedback. This study explores unsupervised methods to extract meaningful topics from 439 survey responses collected from a healthcare system in Wisconsin, USA. A keyword-based filtering approach was applied to isolate complaint-related feedback using a domain-specific lexicon. To delve deeper and analyze dominant topics in feedback, we explored traditional topic modeling methods, including Latent Dirichlet Allocation (LDA) and Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM), alongside BERTopic, an advanced neural embedding-based clustering approach. To improve coherence and interpretability where data are scarce and consist of short texts, we propose kBERT, an integration of BERT embeddings with k-means clustering. Model performance was assessed using coherence scores (C_v) for topic interpretability and average Inverted Rank-Biased Overlap (IRBO_avg) for topic diversity. Results indicate that kBERT achieves the highest coherence (C_v = 0.53) and distinct topic separation (IRBO_avg = 1.00), outperforming all other models in short-text healthcare feedback analysis. Our findings emphasize the importance of embedding-based techniques for topic identification and highlight the need for context-aware models in healthcare analytics.

Authors:Shahan Ali Memon, Soham De, Sungha Kang, Riyan Mujtaba, Bedoor AlShebli, Katie Davis, Jaime Snyder, Jevin D. West
Title: From job titles to jawlines: Using context voids to study generative AI systems
Abstract:
In this paper, we introduce a speculative design methodology for studying the behavior of generative AI systems, framing design as a mode of inquiry. We propose bridging seemingly unrelated domains to generate intentional context voids, using these tasks as probes to elicit AI model behavior. We demonstrate this through a case study: probing the ChatGPT system (GPT-4 and DALL-E) to generate headshots from professional Curricula Vitae (CVs). In contrast to traditional evaluation approaches, ours assesses system behavior under conditions of radical uncertainty -- when forced to invent entire swaths of missing context -- revealing subtle stereotypes and value-laden assumptions. We qualitatively analyze how the system interprets identity and competence markers from CVs, translating them into visual portraits despite the missing context (i.e., physical descriptors). We show that within this context void, the AI system generates biased representations, potentially relying on stereotypical associations or blatant hallucinations.

Authors:Yuanda Hu, Hou Jiani, Zhang Junyu, Yate Ge, Xiaohua Sun, Weiwei Guo
Title: Task Matters: Investigating Human Questioning Behavior in Different Household Service for Learning by Asking Robots
Abstract:
Learning by Asking (LBA) enables robots to identify knowledge gaps during task execution and acquire the missing information by asking targeted questions. However, different tasks often require different types of questions, and how to adapt questioning strategies accordingly remains underexplored. This paper investigates human questioning behavior in two representative household service tasks: a Goal-Oriented task (refrigerator organization) and a Process-Oriented task (cocktail mixing). Through a human-human study involving 28 participants, we analyze the questions asked using a structured framework that encodes each question along three dimensions: acquired knowledge, cognitive process, and question form. Our results reveal that participants adapt both question types and their temporal ordering based on task structure. Goal-Oriented tasks elicited early inquiries about user preferences, while Process-Oriented tasks led to ongoing, parallel questioning of procedural steps and preferences. These findings offer actionable insights for developing task-sensitive questioning strategies in LBA-enabled robots for more effective and personalized human-robot collaboration.

Authors:Ilya Zakharov, Ekaterina Koshchenko, Agnia Sergeyuk
Title: From Teacher to Colleague: How Coding Experience Shapes Developer Perceptions of AI Tools
Abstract:
AI-assisted development tools promise productivity gains and improved code quality, yet their adoption among developers remains inconsistent. Prior research suggests that professional expertise influences technology adoption, but its role in shaping developers' perceptions of AI tools is unclear. We analyze survey data from 3380 developers to examine how coding experience relates to AI awareness, adoption, and the roles developers assign to AI in their workflow. Our findings reveal that coding experience does not predict AI adoption but significantly influences mental models of AI's role. Experienced developers are more likely to perceive AI as a junior colleague, a content generator, or assign it no role, whereas less experienced developers primarily view AI as a teacher. These insights suggest that AI tools must align with developers' expertise levels to drive meaningful adoption.

Authors:Yuanjun Feng, Vivek Chodhary, Yash Raj Shrestha
Title: Human aversion? Do AI Agents Judge Identity More Harshly Than Performance
Abstract:
This study examines the understudied role of algorithmic evaluation of human judgment in hybrid decision-making systems, a critical gap in management research. While extant literature focuses on human reluctance to follow algorithmic advice, we reverse the perspective by investigating how AI agents based on large language models (LLMs) assess and integrate human input. Our work addresses a pressing managerial constraint: firms barred from deploying LLMs directly due to privacy concerns can still leverage them as mediating tools (for instance, anonymized outputs or decision pipelines) to guide high-stakes choices like pricing or discounts without exposing proprietary data. Through a controlled prediction task, we analyze how an LLM-based AI agent weights human versus algorithmic predictions. We find that the AI system systematically discounts human advice, penalizing human errors more severely than algorithmic errors -- a bias exacerbated when the agent's identity (human vs AI) is disclosed and the human is positioned second. These results reveal a disconnect between AI-generated trust metrics and the actual influence of human judgment, challenging assumptions about equitable human-AI collaboration. Our findings offer three key contributions. First, we identify a reverse algorithm aversion phenomenon, where AI agents undervalue human input despite comparable error rates. Second, we demonstrate how disclosure and positional bias interact to amplify this effect, with implications for system design. Third, we provide a framework for indirect LLM deployment that balances predictive power with data privacy. For practitioners, this research emphasizes the need to audit AI weighting mechanisms, calibrate trust dynamics, and strategically design decision sequences in human-AI systems.

Authors:Laura Aymerich-Franch, Tarek Taha, Hiroshi Ishiguro, Takahiro Miyashita, Paolo Dario
Title: Stakeholder perspectives on designing socially acceptable social robots and robot avatars for Dubai and multicultural societies
Abstract:
Robot avatars for customer service are gaining traction in Japan. However, their acceptance in other societal contexts remains underexplored, complicating efforts to design robot avatars suitable for diverse cultural environments. To address this, we interviewed key stakeholders in Dubai's service sector to gain insights into their experiences deploying social robots for customer service, as well as their opinions on the most useful tasks and design features that could maximize customer acceptance of robot avatars in Dubai. Providing information and guiding individuals to specific locations were identified as the most valued functions. Regarding appearance, robotic-looking, highly anthropomorphic designs were the most preferred. Ultra-realistic androids and cartoonish-looking robots elicited mixed reactions, while hybrid androids, low-anthropomorphic robotic designs, and animal-looking robots were considered less suitable or discouraged. Additionally, a psycho-sociological analysis revealed that interactions with robot avatars are influenced by their symbolic meaning, context, and affordances. These findings offer pioneering insights into culturally adaptive robot avatar design, addressing a significant research gap and providing actionable guidelines for deploying socially acceptable robots and avatars in multicultural contexts worldwide.

Authors:Tiancheng Liu, Anqi Wang, Xinda Chen, Jing Yan, Yin Li, Pan Hui, Kang Zhang
Title: PoEmotion: Can AI Utilize Chinese Calligraphy to Express Emotion from Poems?
Abstract:
This paper presents PoEmotion, an approach to visualizing emotions in poetry with Chinese calligraphy strokes. Traditional textual emotion analysis often lacks emotional resonance due to its mechanical nature. PoEmotion combines natural language processing with deep learning generative algorithms to create Chinese calligraphy that effectively conveys the emotions in poetry. The created calligraphy represents four fundamental emotions: excitement, anger, sadness, and relaxation, making the visual representation of emotions intuitive and concise. Furthermore, the approach delves into the relationship between time, emotion, and cultural communication. Its goal is to provide a more natural means of communicating emotions through non-verbal mediums to enhance human emotional expression.

Authors:Lisa Egede, Ebtesam Al Haque, Gabriella Thompson, Alicia Boyd, Angela D. R. Smith, Brittany Johnson
Title: Exploring Culturally Informed AI Assistants: A Comparative Study of ChatBlackGPT and ChatGPT
Abstract:
In recent years, we have seen an influx in reliance on AI assistants for information seeking. Given this widespread use and the known challenges AI poses for Black users, recent efforts have emerged to identify key considerations needed to provide meaningful support. One notable effort is the development of ChatBlackGPT, a culturally informed AI assistant designed to provide culturally relevant responses. Despite the existence of ChatBlackGPT, there is no research on when and how Black communities might engage with culturally informed AI assistants and the distinctions between engagement with general purpose tools like ChatGPT. To fill this gap, we propose a research agenda grounded in results from a preliminary comparative analysis of outputs provided by ChatGPT and ChatBlackGPT for travel-related inquiries. Our efforts thus far emphasize the need to consider Black communities' values, perceptions, and experiences when designing AI assistants that acknowledge the Black lived experience.

Authors:Xiao Jin, Bo Xiao, Huijiang Wang, Wendong Wang, Zhenhua Yu
Title: Multi-Sensor Fusion-Based Mobile Manipulator Remote Control for Intelligent Smart Home Assistance
Abstract:
This paper proposes a wearable-controlled mobile manipulator system for intelligent smart home assistance, integrating MEMS capacitive microphones, IMU sensors, vibration motors, and pressure feedback to enhance human-robot interaction. The wearable device captures forearm muscle activity and converts it into real-time control signals for mobile manipulation. It achieves an offline classification accuracy of 88.33% across six distinct movement-force classes for hand gestures by using a CNN-LSTM model, while real-world experiments involving five participants yield a practical accuracy of 83.33% with an average system response time of 1.2 seconds. In human-robot synergistic navigation and grasping tasks, the robot achieved a 98% task success rate with an average trajectory deviation of only 3.6 cm. Finally, the wearable-controlled mobile manipulator system achieved a 93.3% gripping success rate, a transfer success rate of 95.6%, and a full-task success rate of 91.1% during object grasping and transfer tests, in which a total of 9 object-texture combinations were evaluated. These three experiments' results validate the effectiveness of MEMS-based wearable sensing combined with multi-sensor fusion for reliable and intuitive control of assistive robots in smart home scenarios.

Authors:Jiaying "Lizzy" Liu, Shuer Zhuo, Xingyu Li, Andrew Dillon, Noura Howell, Angela D. R. Smith, Yan Zhang
Title: From Regulation to Support: Centering Humans in Technology-Mediated Emotion Intervention in Care Contexts
Abstract:
Enhancing emotional well-being has become a significant focus in HCI and CSCW, with technologies increasingly designed to track, visualize, and manage emotions. However, these approaches have faced criticism for potentially suppressing certain emotional experiences. Through a scoping review of 53 empirical studies from ACM proceedings implementing Technology-Mediated Emotion Intervention (TMEI), we critically examine current practices through lenses drawn from HCI critical theories. Our analysis reveals emotion intervention mechanisms that extend beyond traditional emotion regulation paradigms, identifying care-centered goals that prioritize non-judgmental emotional support and preserve users' identities. The findings demonstrate how researchers design technologies for generating artificial care, intervening in power dynamics, and nudging behavioral changes. We contribute the concept of "emotion support" as an alternative approach to "emotion regulation," emphasizing human-centered approaches to emotional well-being. This work advances the understanding of diverse human emotional needs beyond individual and cognitive perspectives, offering design implications that critically reimagine how technologies can honor emotional complexity, preserve human agency, and transform power dynamics in care contexts.

Authors:Vasco Xu, Chenfeng Gao, Henry Hoffmann, Karan Ahuja
Title: MobilePoser: Real-Time Full-Body Pose Estimation and 3D Human Translation from IMUs in Mobile Consumer Devices
Abstract:
There has been a continued trend towards minimizing instrumentation for full-body motion capture, going from specialized rooms and equipment, to arrays of worn sensors and recently sparse inertial pose capture methods. However, as these techniques migrate towards lower-fidelity IMUs on ubiquitous commodity devices, like phones, watches, and earbuds, challenges arise including compromised online performance, temporal consistency, and loss of global translation due to sensor noise and drift. Addressing these challenges, we introduce MobilePoser, a real-time system for full-body pose and global translation estimation using any available subset of IMUs already present in these consumer devices. MobilePoser employs a multi-stage deep neural network for kinematic pose estimation followed by a physics-based motion optimizer, achieving state-of-the-art accuracy while remaining lightweight. We conclude with a series of demonstrative applications to illustrate the unique potential of MobilePoser across a variety of fields, such as health and wellness, gaming, and indoor navigation to name a few.

Authors:Mohi Reza, Jeb Thomas-Mitchell, Peter Dushniku, Nathan Laundry, Joseph Jay Williams, Anastasia Kuzminykh
Title: Co-Writing with AI, on Human Terms: Aligning Research with User Demands Across the Writing Process
Abstract:
As generative AI tools like ChatGPT become integral to everyday writing, critical questions arise about how to preserve writers' sense of agency and ownership when using these tools. Yet, a systematic understanding of how AI assistance affects different aspects of the writing process - and how this shapes writers' agency - remains underexplored. To address this gap, we conducted a systematic review of 109 HCI papers using the PRISMA approach. From this literature, we identify four overarching design strategies for AI writing support: structured guidance, guided exploration, active co-writing, and critical feedback - mapped across the four key cognitive processes in writing: planning, translating, reviewing, and monitoring. We complement this analysis with interviews of 15 writers across diverse domains. Our findings reveal that writers' desired levels of AI intervention vary across the writing process: content-focused writers (e.g., academics) prioritize ownership during planning, while form-focused writers (e.g., creatives) value control over translating and reviewing. Writers' preferences are also shaped by contextual goals, values, and notions of originality and authorship. By examining when ownership matters, what writers want to own, and how AI interactions shape agency, we surface both alignment and gaps between research and user needs. Our findings offer actionable design guidance for developing human-centered writing tools for co-writing with AI, on human terms.

Authors:Han Zhang, Yiyi Ren, Paula S. Nurius, Jennifer Mankoff, Anind K. Dey
Title: Towards Human-Centered Early Prediction Models for Academic Performance in Real-World Contexts
Abstract:
Supporting student success requires collaboration among multiple stakeholders. Researchers have explored machine learning models for academic performance prediction; yet key challenges remain in ensuring these models are interpretable, equitable, and actionable within real-world educational support systems. First, many models prioritize predictive accuracy but overlook human-centered machine learning principles, limiting trust among students and reducing their usefulness for educators and institutional decision-makers. Second, most models require at least a month of data before making reliable predictions, delaying opportunities for early intervention. Third, current models primarily rely on sporadically collected, classroom-derived data, missing broader behavioral patterns that could provide more continuous and actionable insights. To address these gaps, we present three modeling approaches (LR, 1D-CNN, and MTL-1D-CNN) to classify students as low or high academic performers. We evaluate them based on explainability, fairness, and generalizability to assess their alignment with key social values. Using behavioral and self-reported data collected within the first week of two Spring terms, we demonstrate that these models can identify at-risk students as early as week one. However, trade-offs across human-centered machine learning principles highlight the complexity of designing predictive models that effectively support multi-stakeholder decision-making and intervention strategies. We discuss these trade-offs and their implications for different stakeholders, outlining how predictive models can be integrated into student support systems. Finally, we examine broader socio-technical challenges in deploying these models and propose future directions for advancing human-centered, collaborative academic prediction systems.

Authors:Yujia Liu, Siyu Zha, Yuewen Zhang, Yanjin Wang, Yangming Zhang, Qi Xin, Lunyiu Nie, Chao Zhang, Yingqing Xu
Title: BrickSmart: Leveraging Generative AI to Support Children's Spatial Language Learning in Family Block Play
Abstract:
Block-building activities are crucial for developing children's spatial reasoning and mathematical skills, yet parents often lack the expertise to guide these activities effectively. BrickSmart, a pioneering system, addresses this gap by providing spatial language guidance through a structured three-step process: Discovery & Design, Build & Learn, and Explore & Expand. This system uniquely supports parents in 1) generating personalized block-building instructions, 2) guiding parents to teach spatial language during building and interactive play, and 3) tracking children's learning progress, altogether enhancing children's engagement and cognitive development. In a comparative study involving 12 parent-child pairs (children aged 6-8 years) across experimental and control groups, BrickSmart demonstrated improvements in supportiveness, efficiency, and innovation, with a significant increase in children's use of spatial vocabulary during block play, thereby offering an effective framework for fostering spatial language skills in children.

Authors:Naoto Nishida, Hirotaka Hiraki, Jun Rekimoto, Yoshio Ishiguro
Title: Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition
Abstract:
Rich-text captions are essential to help communication for Deaf and hard-of-hearing (DHH) people, second-language learners, and those with autism spectrum disorder (ASD). They also preserve nuances when converting speech to text, enhancing the realism of presentation scripts and conversation or speech logs. However, current real-time captioning systems lack the capability to alter text attributes (e.g., capitalization, size, and font) at the word level, hindering the accurate conveyance of speaker intent that is expressed in the tones or intonations of the speech. For example, ''YOU should do this'' tends to be read with ''You'' as the focus of the sentence, whereas ''You should do THIS'' places the focus on ''This''. This paper proposes a solution that changes the text decorations at the word level in real time. As a prototype, we developed an application that adjusts word size based on the loudness of each spoken word. Feedback from users implies that this system helped to convey the speaker's intent, offering a more engaging and accessible captioning experience.

Authors:Ralf Schmälzle, Sue Lim, Yuetong Du, Gary Bente
Title: The Art of Audience Engagement: LLM-Based Thin-Slicing of Scientific Talks
Abstract:
This paper examines the thin-slicing approach - the ability to make accurate judgments based on minimal information - in the context of scientific presentations. Drawing on research from nonverbal communication and personality psychology, we show that brief excerpts (thin slices) reliably predict overall presentation quality. Using a novel corpus of over one hundred real-life science talks, we employ Large Language Models (LLMs) to evaluate transcripts of full presentations and their thin slices. By correlating LLM-based evaluations of short excerpts with full-talk assessments, we determine how much information is needed for accurate predictions. Our results demonstrate that LLM-based evaluations align closely with human ratings, proving their validity, reliability, and efficiency. Critically, even very short excerpts (less than 10 percent of a talk) strongly predict overall evaluations. This suggests that the first moments of a presentation convey relevant information that is used in quality evaluations and can shape lasting impressions. The findings are robust across different LLMs and prompting strategies. This work extends thin-slicing research to public speaking and connects theories of impression formation to LLMs and current research on AI communication. We discuss implications for communication and social cognition research on message reception. Lastly, we suggest an LLM-based thin-slicing framework as a scalable feedback tool to enhance human communication.
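The core analysis, correlating LLM ratings of thin slices with ratings of full talks, reduces to a simple correlation over paired scores. A minimal sketch with hypothetical ratings (the numbers below are illustrative, not from the paper's corpus):

```python
import statistics

def pearson(xs, ys):
    # Pearson correlation between two equal-length lists of ratings.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical LLM quality ratings: short excerpts vs. the full talks.
slice_scores = [3.9, 4.2, 2.8, 3.5, 4.6, 3.1]
full_scores = [4.0, 4.4, 3.0, 3.3, 4.8, 3.2]
r = pearson(slice_scores, full_scores)  # high r means thin slices predict full-talk quality
```

A high correlation here is what the paper reports for excerpts as short as 10 percent of a talk.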

Authors:Ashwin Ram, Varsha Suresh, Artin Saberpour Abadian, Vera Demberg, Jürgen Steimle
Title: GestureCoach: Rehearsing for Engaging Talks with LLM-Driven Gesture Recommendations
Abstract:
This paper introduces GestureCoach, a system designed to help speakers deliver more engaging talks by guiding them to gesture effectively during rehearsal. GestureCoach combines an LLM-driven gesture recommendation model with a rehearsal interface that proactively cues speakers to gesture appropriately. Trained on experts' gesturing patterns from TED talks, the model consists of two modules: an emphasis proposal module, which predicts when to gesture by identifying gesture-worthy text segments in the presenter notes, and a gesture identification module, which determines what gesture to use by retrieving semantically appropriate gestures from a curated gesture database. Results of a model performance evaluation and user study (N=30) show that the emphasis proposal module outperforms off-the-shelf LLMs in identifying suitable gesture regions, and that participants rated the majority of these predicted regions and their corresponding gestures as highly appropriate. A subsequent user study (N=10) showed that rehearsing with GestureCoach encouraged speakers to gesture and significantly increased gesture diversity, resulting in more engaging talks. We conclude with design implications for future AI-driven rehearsal systems.

Authors:Mariana Fernandez-Espinosa, Mariana Gonzalez-Bejar, Jacobo Wiesner, Diego Gomez-Zara
Title: When Technologies Are Not Enough: Understanding How Domestic Workers Employ (and Avoid) Online Technologies in Their Work Practices
Abstract:
Although domestic work is often viewed as manual labor, it involves significant interaction with online technologies. However, the detailed exploration of how domestic workers use these technologies remains limited. This study examines the impact of online technologies on domestic workers' work practices, perceptions, and relationships with customers and employers. We interviewed 30 domestic workers residing in the United States, who provided examples that highlight the limited transformative role of current online technologies in their work. By conducting a thematic analysis, we characterize how they approach and avoid these digital tools at different stages of their work. Through these findings, we investigate the limitations of technology and identify challenges and opportunities that could inform the design of more suitable tools to improve the conditions of this marginalized group.

Authors:Diana Robinson, Neil Lawrence
Title: The Human Visual System Can Inspire New Interaction Paradigms for LLMs
Abstract:
The dominant metaphor of LLMs-as-minds leads to misleading conceptions of machine agency and is limited in its ability to help both users and developers build the right degree of trust and understanding for outputs from LLMs. It makes it harder to disentangle hallucinations from useful model interactions. This position paper argues that there are fundamental similarities between visual perception and the way LLMs process and present language. These similarities inspire a metaphor for LLMs which could open new avenues for research into interaction paradigms and shared representations. Our visual system metaphor introduces possibilities for addressing these challenges by understanding the information landscape assimilated by LLMs. In this paper we motivate our proposal, introduce the interrelating theories from the fields that inspired this view and discuss research directions that stem from this abstraction.

Authors:Yue Yang, Mengyao Guo, Yuxuan Wu, Wally Niu, Emmanuel A Corona, Bruce Daniel, Christoph Leuze, Fred Baik
Title: VR MRI Training for Adolescents: A Comparative Study of Gamified VR, Passive VR, 360° Video, and Traditional Educational Video
Abstract:
Meta Quest Store: https://www.meta.com/experiences/stanford-mri-simulator/8205539289482347/
Magnetic Resonance Imaging (MRI) can be a stressful experience for pediatric patients due to the loud acoustic environment, enclosed scanner bore, and a prolonged requirement to remain still. While sedation is commonly used to manage anxiety and motion, it carries clinical risks and logistical burdens. Traditional preparatory approaches, such as instructional videos and mock scans, often lack engagement for older children and adolescents. In this study, we present a comparative evaluation of four MRI preparation modalities: (1) a gamified virtual reality (VR) simulation that trains stillness through real-time feedback; (2) a passive VR experience replicating the MRI environment without interactivity; (3) a 360° first-person video of a real MRI procedure; and (4) a standard 2D educational video. Using a within-subjects design (N = 11, ages 10-16), we assess each method's impact on head motion data, anxiety reduction, procedural preparedness, usability, cognitive workload, and subjective preference. Results show that the gamified VR condition resulted in significantly lower head motion (p < 0.001) and yielded the highest preparedness scores (p < 0.05). Head motion data were significantly correlated with learning outcomes (p < 0.01), suggesting that behavioral performance in VR strongly indicates procedural readiness. While all modalities reduced anxiety and were rated usable, interactive VR was preferred by most participants and demonstrated unique advantages in promoting engagement and behavioral rehearsal. We conclude with recommendations for designing immersive simulations and integrating VR training into pediatric imaging workflows.

Authors:Anja Heim, Thomas Lang, Alexander Gall, Eduard Gröller, Christoph Heinzl
Title: Quantum Image Visualizer: Visual Debugging of Quantum Image Processing Circuits
Abstract:
Quantum computing is an emerging field that utilizes the unique principles of quantum mechanics to offer significant advantages in algorithm execution over classical approaches. This potential is particularly promising in the domain of quantum image processing, which aims to manipulate all pixels simultaneously. However, the process of designing and verifying these algorithms remains a complex and error-prone task. To address this challenge, new methods are needed to support effective debugging of quantum circuits. The Quantum Image Visualizer is an interactive visual analysis tool that allows for the examination of quantum images and their transformation throughout quantum circuits. The framework incorporates two overview visualizations that trace image evolution across a sequence of gates based on the most probable outcomes. Interactive exploration allows users to focus on relevant gates, and select pixels of interest. Upon selection, detailed visualizations enable in-depth inspection of individual pixels and their probability distributions, revealing how specific gates influence the likelihood of pixel color values and the magnitude of these changes. An evaluation of the Quantum Image Visualizer was conducted through in-depth interviews with eight domain experts. The findings demonstrate the effectiveness and practical value of our approach in supporting visual debugging of quantum image processing circuits.

Authors:Weiye Xu, Tony Li, Yuntao Wang, Xing-dong Yang, Te-yen Wu
Title: BIT: Battery-free, IC-less and Wireless Smart Textile Interface and Sensing System
Abstract:
The development of smart textile interfaces is hindered by the inclusion of rigid hardware components and batteries within the fabric, which pose challenges in terms of manufacturability, usability, and environmental concerns related to electronic waste. To mitigate these issues, we propose a smart textile interface and its wireless sensing system to eliminate the need for ICs, batteries, and connectors embedded into textiles. Our technique is established on the integration of multi-resonant circuits in smart textile interfaces, and utilizing near-field electromagnetic coupling between two coils to facilitate wireless power transfer and data acquisition from smart textile interface. A key aspect of our system is the development of a mathematical model that accurately represents the equivalent circuit of the sensing system. Using this model, we developed a novel algorithm to accurately estimate sensor signals based on changes in system impedance. Through simulation-based experiments and a user study, we demonstrate that our technique effectively supports multiple textile sensors of various types.
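The sensing principle rests on resonance shifts in the textile's multi-resonant circuits: a change at a fabric sensor alters the circuit's capacitance, moving its resonant frequency, which the external reader coil observes through near-field coupling as an impedance change. A toy sketch of that physics under assumed component values (the 10 µH / 100 pF figures are illustrative, not taken from the paper):

```python
import math

def resonant_freq(inductance, capacitance):
    # Ideal LC tank resonance: f0 = 1 / (2*pi*sqrt(L*C)).
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))

# Hypothetical textile circuit: a 10 uH coil with a 100 pF capacitor.
f_rest = resonant_freq(10e-6, 100e-12)
# Pressure on the textile sensor raises its capacitance (here to 120 pF),
# shifting the resonance downward -- the signal the reader coil picks up.
f_pressed = resonant_freq(10e-6, 120e-12)
```

The paper's actual contribution is the circuit model and estimation algorithm built on top of this coupling, not the bare LC relation sketched here.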

Authors:Shira Michel, Sufi Kaur, Sarah Elizabeth Gillespie, Jeffrey Gleason, Christo Wilson, Avijit Ghosh
Title: "It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services
Abstract:
Recent advances in artificial intelligence (AI) speech generation and voice cloning technologies have produced naturalistic speech and accurate voice replication, yet their influence on sociotechnical systems across diverse accents and linguistic traits is not fully understood. This study evaluates two synthetic AI voice services (Speechify and ElevenLabs) through a mixed methods approach using surveys and interviews to assess technical performance and uncover how users' lived experiences influence their perceptions of accent variations in these speech technologies. Our findings reveal technical performance disparities across five regional, English-language accents and demonstrate how current speech generation technologies may inadvertently reinforce linguistic privilege and accent-based discrimination, potentially creating new forms of digital exclusion. Overall, our study highlights the need for inclusive design and regulation by providing actionable insights for developers, policymakers, and organizations to ensure equitable and socially responsible AI speech technologies.

Authors:Hsuan Wei Liao, Christopher Klugmann, Daniel Kondermann, Rafid Mahmood
Title: Minority Reports: Balancing Cost and Quality in Ground Truth Data Annotation
Abstract:
High-quality data annotation is an essential but laborious and costly aspect of developing machine learning-based software. We explore the inherent tradeoff between annotation accuracy and cost by detecting and removing minority reports -- instances where annotators provide incorrect responses -- that indicate unnecessary redundancy in task assignments. We propose an approach to prune potentially redundant annotation task assignments before they are executed by estimating the likelihood of an annotator disagreeing with the majority vote for a given task. Our approach is informed by an empirical analysis over computer vision datasets annotated by a professional data annotation platform, which reveals that the likelihood of a minority report event is dependent primarily on image ambiguity, worker variability, and worker fatigue. Simulations over these datasets show that we can reduce the number of annotations required by over 60% with a small compromise in label quality, saving approximately 6.6 days-equivalent of labor. Our approach provides annotation service platforms with a method to balance cost and dataset quality. Machine learning practitioners can tailor annotation accuracy levels according to specific application needs, thereby optimizing budget allocation while maintaining the data quality necessary for critical settings like autonomous driving technology.
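The pruning idea can be sketched as a filter over queued task assignments: estimate each assignment's probability of producing a minority report from the three factors the analysis found predictive, and skip assignments above a threshold. The scoring formula and threshold below are illustrative stand-ins, not the paper's fitted model:

```python
def disagreement_prob(ambiguity, worker_error, fatigue):
    # Toy estimate of the chance this worker's label will be a minority
    # report, combining image ambiguity, worker variability, and worker
    # fatigue. The functional form is an illustrative guess.
    return min(1.0, ambiguity + worker_error + fatigue - ambiguity * worker_error)

def prune_assignments(assignments, threshold=0.5):
    # Execute only assignments unlikely to yield a minority report.
    return [a for a in assignments
            if disagreement_prob(a["ambiguity"], a["worker_error"], a["fatigue"]) < threshold]

# Hypothetical assignment queue (all numbers illustrative).
queue = [
    {"task": 1, "ambiguity": 0.10, "worker_error": 0.05, "fatigue": 0.00},
    {"task": 2, "ambiguity": 0.60, "worker_error": 0.20, "fatigue": 0.10},
]
kept = prune_assignments(queue)  # task 2 is pruned as likely-redundant
```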

Authors:Priyan Vaithilingam, Munyeong Kim, Frida-Cecilia Acosta-Parenteau, Daniel Lee, Amine Mhedhbi, Elena L. Glassman, Ian Arawjo
Title: Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
Abstract:
How do we update AI memory of user intent as intent changes? We consider how an AI interface may assist the integration of new information into a repository of natural language data. Inspired by software engineering concepts like impact analysis, we develop methods and a UI for managing semantic changes with non-local effects, which we call "semantic conflict resolution." The user commits new intent to a project -- makes a "semantic commit" -- and the AI helps the user detect and resolve semantic conflicts within a store of existing information representing their intent (an "intent specification"). We develop an interface, SemanticCommit, to better understand how users resolve conflicts when updating intent specifications such as Cursor Rules and game design documents. A knowledge graph-based RAG pipeline drives conflict detection, while LLMs assist in suggesting resolutions. We evaluate our technique on an initial benchmark. Then, we report a 12-user within-subjects study of SemanticCommit for two task domains -- game design documents, and AI agent memory in the style of ChatGPT memories -- where users integrated new information into an existing list. Half of our participants adopted a workflow of impact analysis, where they would first flag conflicts without AI revisions and then resolve conflicts locally, despite having access to a global revision feature. We argue that AI agent interfaces, such as software IDEs like Cursor and Windsurf, should provide affordances for impact analysis and help users validate AI retrieval independently from generation. Our work speaks to how AI agent designers should think about updating memory as a process that involves human feedback and decision-making.

Authors:Jiho Kim, Philippe Laban, Xiang 'Anthony' Chen, Kenneth C. Arnold
Title: Voice Interaction With Conversational AI Could Facilitate Thoughtful Reflection and Substantive Revision in Writing
Abstract:
Writing well requires not only expressing ideas but also refining them through revision, a process facilitated by reflection. Prior research suggests that feedback delivered through dialogues, such as those in writing center tutoring sessions, can help writers reflect more thoughtfully on their work compared to static feedback. Recent advancements in multi-modal large language models (LLMs) now offer new possibilities for supporting interactive and expressive voice-based reflection in writing. In particular, we propose that LLM-generated static feedback can be repurposed as conversation starters, allowing writers to seek clarification, request examples, and ask follow-up questions, thereby fostering deeper reflection on their writing. We argue that voice-based interaction can naturally facilitate this conversational exchange, encouraging writers' engagement with higher-order concerns, facilitating iterative refinement of their reflections, and reducing cognitive load compared to text-based interactions. To investigate these effects, we propose a formative study exploring how text vs. voice input influences writers' reflection and subsequent revisions. Findings from this study will inform the design of intelligent and interactive writing tools, offering insights into how voice-based interactions with LLM-powered conversational agents can support reflection and revision.

Authors:Adithya Krishna, Sohan Debnath, Madhuvanthi Srivatsav, André van Schaik, Mahesh Mehendale, Chetan Singh Thakur
Title: Neural Signal Compression using RAMAN tinyML Accelerator for BCI Applications
Abstract:
High-quality, multi-channel neural recording is indispensable for neuroscience research and clinical applications. Large-scale brain recordings often produce vast amounts of data that must be wirelessly transmitted for subsequent offline analysis and decoding, especially in brain-computer interfaces (BCIs) utilizing high-density intracortical recordings with hundreds or thousands of electrodes. However, transmitting raw neural data presents significant challenges due to limited communication bandwidth and resultant excessive heating. To address this challenge, we propose a neural signal compression scheme utilizing Convolutional Autoencoders (CAEs), which achieves a compression ratio of up to 150 for compressing local field potentials (LFPs). The CAE encoder section is implemented on RAMAN, an energy-efficient tinyML accelerator designed for edge computing. RAMAN leverages sparsity in activation and weights through zero skipping, gating, and weight compression techniques. Additionally, we employ hardware-software co-optimization by pruning the CAE encoder model parameters using a hardware-aware balanced stochastic pruning strategy, resolving workload imbalance issues and eliminating indexing overhead to reduce parameter storage requirements by up to 32.4%. Post-layout simulation shows that the RAMAN encoder can be implemented in a TSMC 65-nm CMOS process, occupying a core area of 0.0187 mm² per channel. Operating at a clock frequency of 2 MHz and a supply voltage of 1.2 V, the estimated power consumption is 15.1 µW per channel for the proposed DS-CAE1 model. For functional validation, the RAMAN encoder was also deployed on an Efinix Ti60 FPGA, utilizing 37.3k LUTs and 8.6k flip-flops. The compressed neural data from RAMAN is reconstructed offline with SNDR of 22.6 dB and 27.4 dB, along with R² scores of 0.81 and 0.94, respectively, evaluated on two monkey neural recordings.
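The headline compression ratio is simply the raw LFP window size divided by the encoded latent size. A sketch with hypothetical sizing (the window length, bit widths, and latent dimension below are assumptions chosen to reproduce a ratio of 150, not the paper's actual configuration):

```python
def compression_ratio(n_samples, sample_bits, latent_dim, latent_bits):
    # Raw window size in bits divided by encoded latent size in bits.
    return (n_samples * sample_bits) / (latent_dim * latent_bits)

# Hypothetical sizing: a 1500-sample, 16-bit LFP window encoded by the
# CAE into a 20-dimensional, 8-bit latent vector.
ratio = compression_ratio(1500, 16, 20, 8)  # -> 150.0
```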

Authors:Sarah Gillet, Katie Winkle, Giulia Belgiovine, Iolanda Leite
Title: Ice-Breakers, Turn-Takers and Fun-Makers: Exploring Robots for Groups with Teenagers
Abstract:
Successful, enjoyable group interactions are important in public and personal contexts, especially for teenagers whose peer groups are important for self-identity and self-esteem. Social robots seemingly have the potential to positively shape group interactions, but it seems difficult to effect such impact by designing robot behaviors solely based on related (human interaction) literature. In this article, we take a user-centered approach to explore how teenagers envisage a social robot "group assistant". We engaged 16 teenagers in focus groups, interviews, and robot testing to capture their views and reflections about robots for groups. Over the course of a two-week summer school, participants co-designed the action space for such a robot and experienced working with/wizarding it for 10+ hours. This experience further altered and deepened their insights into using robots as group assistants. We report results regarding teenagers' views on the applicability and use of a robot group assistant, how these expectations evolved throughout the study, and their repeat interactions with the robot. Our results indicate that each group moves on a spectrum of need for the robot, reflected in use of the robot more (or less) for ice-breaking, turn-taking, and fun-making as the situation demanded.

Authors:Tri Tung Nguyen Nguyen, Quang Tien Dam, Dinh Tuan Tran, Joo-Ho Lee
Title: When Less Is More: A Sparse Facial Motion Structure For Listening Motion Learning
Abstract:
Effective human behavior modeling is critical for successful human-robot interaction. Current state-of-the-art approaches for predicting listening head behavior during dyadic conversations employ continuous-to-discrete representations, where a continuous facial motion sequence is converted into discrete latent tokens. However, non-verbal facial motion presents unique challenges owing to its temporal variance and multi-modal nature. State-of-the-art discrete motion token representations struggle to capture underlying non-verbal facial patterns, making listening-head training inefficient and yielding low-fidelity generated motion. This study proposes a novel method for representing and predicting non-verbal facial motion by encoding long sequences into a sparse sequence of keyframes and transition frames. By identifying crucial motion steps and interpolating intermediate frames, our method preserves the temporal structure of motion while enhancing instance-wise diversity during the learning process. Additionally, we apply this novel sparse representation to the task of listening head prediction, demonstrating its contribution to improving the explanation of facial motion patterns.

Authors:Lilit Avetisyan, Emmanuel Abolarin, Vanik Zakarian, X. Jessie Yang, Feng Zhou
Title: The Mediating Effects of Emotions on Trust through Risk Perception and System Performance in Automated Driving
Abstract:
Trust in automated vehicles (AVs) has traditionally been explored through a cognitive lens, but growing evidence highlights the significant role emotions play in shaping trust. This study investigates how risk perception and AV performance (error vs. no error) influence emotional responses and trust in AVs, using mediation analysis to examine the indirect effects of emotions. In this study, 70 participants (42 male, 28 female) watched real-life recorded videos of AVs operating with or without errors, coupled with varying levels of risk information (high, low, or none). They reported their anticipated emotional responses using 19 discrete emotion items, and trust was assessed through dispositional, learned, and situational trust measures. Factor analysis identified four key emotional components, namely hostility, confidence, anxiety, and loneliness, that were influenced by risk perception and AV performance. The linear mixed model showed that risk perception was not a significant predictor of trust, while performance and individual differences were. Mediation analysis revealed that confidence was a strong positive mediator, while hostile and anxious emotions negatively impacted trust. However, lonely emotions did not significantly mediate the relationship between AV performance and trust. The results show that real-time AV behavior is more influential on trust than pre-existing risk perceptions, indicating trust in AVs might be more experience-based than shaped by prior beliefs. Our findings also underscore the importance of fostering positive emotional responses for trust calibration, which has important implications for user experience design in automated driving.

Authors:Sadra Sabouri, Sepand Haghighi, Elena Masrour
Title: Samila: A Generative Art Generator
Abstract:
Generative art merges creativity with computation, using algorithms to produce aesthetic works. This paper introduces Samila, a Python-based generative art library that employs mathematical functions and randomness to create visually compelling compositions. The system allows users to control the generation process through random seeds, function selections, and projection modes, enabling the exploration of randomness and artistic expression. By adjusting these parameters, artists can create diverse compositions that reflect intentionality and unpredictability. We demonstrate that Samila's outputs are uniquely determined by two random generation seeds, making regeneration nearly impossible without both. Additionally, altering the point generation functions while preserving the seed produces artworks with distinct graphical characteristics, forming a visual family. Samila serves as both a creative tool for artists and an educational resource for teaching mathematical and programming concepts. It also provides a platform for research in generative design and computational aesthetics. Future developments could include AI-driven generation and aesthetic evaluation metrics to enhance creative control and accessibility.

Authors:Xavier V. Caddle, Sarvech Qadir, Charles Hughes, Elizabeth A. Sweigart, Jinkyung Katie Park, Pamela J. Wisniewski
Title: Building a Village: A Multi-stakeholder Approach to Open Innovation and Shared Governance to Promote Youth Online Safety
Abstract:
The SIGCHI and Social Computing research communities have been at the forefront of online safety efforts for youth, ranging from understanding the serious risks youth face online to developing evidence-based interventions for risk protection. Yet, to bring these efforts to bear, we must partner with practitioners, such as industry stakeholders who know how to bring such technologies to market, and youth service providers who work directly with youth. Therefore, we interviewed 33 stakeholders in the space of youth online safety, including industry professionals (n=12), youth service providers (n=11), and researchers (n=10) to understand where their visions toward working together to protect youth online converged and where tensions surfaced, as well as how we might reconcile conflicting viewpoints to move forward as one community with synergistic expertise on how to change the current sociotechnical landscape for youth online safety. Overall, we found that non-partisan leadership is necessary to chart actionable, equitable goals to facilitate collaboration between stakeholders, combat feelings of isolation, and foster trust between the stakeholder groups. Based on these findings, we recommend the use of open-innovation methods with their inherent transparency, federated governance models, and clear but inclusive leadership structures to promote collaboration between youth online safety stakeholders. We propose the creation of an open-innovation organization that unifies the diverse voices in youth online safety to develop open-standards and evidence-based design patterns that centralize otherwise fragmented efforts that have fallen short of the goal of effective technological solutions that keep youth safe online.

Authors:Guandong Pan, Zhaobang Wu, Yaqian Yang, Xin Wang, Longzhao Liu, Zhiming Zheng, Shaoting Tang
Title: Potential Indicator for Continuous Emotion Arousal by Dynamic Neural Synchrony
Abstract:
The need for automatic and high-quality emotion annotation is paramount in applications such as continuous emotion recognition and video highlight detection, yet achieving this through manual human annotations is challenging. Inspired by inter-subject correlation (ISC) utilized in neuroscience, this study introduces a novel Electroencephalography (EEG) based ISC methodology that leverages a single-electrode and feature-based dynamic approach. Our contributions are threefold. Firstly, we re-identify two potent emotion features suitable for classifying emotions: first-order difference (FD) and differential entropy (DE). Secondly, through the use of overall correlation analysis, we demonstrate the heterogeneous synchronized performance of electrodes. This performance aligns with neural emotion patterns established in prior studies, thus validating the effectiveness of our approach. Thirdly, by employing a sliding window correlation technique, we showcase the significant consistency of dynamic ISCs across various features or key electrodes in each analyzed film clip. Our findings indicate the method's reliability in capturing consistent, dynamic shared neural synchrony among individuals, triggered by evocative film stimuli. This underscores the potential of our approach to serve as an indicator of continuous human emotion arousal. The implications of this research are significant for advancements in affective computing and the broader neuroscience field, suggesting a streamlined and effective tool for emotion analysis in real-world applications.
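The sliding-window correlation the abstract describes can be illustrated with a minimal sketch: Pearson correlation between two subjects' feature time series, recomputed over overlapping windows. The window and step sizes below are arbitrary placeholders, not the study's settings:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def sliding_isc(feat_a, feat_b, win, step):
    """Dynamic inter-subject correlation: Pearson r of two subjects'
    feature time series, recomputed over overlapping windows."""
    return [pearson(feat_a[i:i + win], feat_b[i:i + win])
            for i in range(0, len(feat_a) - win + 1, step)]
```

Applied to per-electrode feature traces (e.g., FD or DE over time), the resulting r(t) series is the dynamic ISC whose consistency across viewers the study examines.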

Authors:Iván Sevillano-García, Julián Luengo, Francisco Herrera
Title: STOOD-X methodology: using statistical nonparametric test for OOD Detection Large-Scale datasets enhanced with explainability
Abstract:
Out-of-Distribution (OOD) detection is a critical task in machine learning, particularly in safety-sensitive applications where model failures can have serious consequences. However, current OOD detection methods often suffer from restrictive distributional assumptions, limited scalability, and a lack of interpretability. To address these challenges, we propose STOOD-X, a two-stage methodology that combines a Statistical nonparametric Test for OOD Detection with eXplainability enhancements. In the first stage, STOOD-X uses feature-space distances and a Wilcoxon-Mann-Whitney test to identify OOD samples without assuming a specific feature distribution. In the second stage, it generates user-friendly, concept-based visual explanations that reveal the features driving each decision, aligning with the BLUE XAI paradigm. Through extensive experiments on benchmark datasets and multiple architectures, STOOD-X achieves competitive performance against state-of-the-art post hoc OOD detectors, particularly in high-dimensional and complex settings. In addition, its explainability framework enables human oversight, bias detection, and model debugging, fostering trust and collaboration between humans and AI systems. The STOOD-X methodology therefore offers a robust, explainable, and scalable solution for real-world OOD detection tasks.
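The core statistical tool in STOOD-X's first stage, the Wilcoxon-Mann-Whitney test, can be sketched in plain Python. This is the generic two-sample rank-sum test with mid-ranks for ties and a normal-approximation p-value, not the paper's implementation, and omits the tie correction of the variance:

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Wilcoxon-Mann-Whitney test. Returns (U statistic for
    sample `a`, approximate two-sided p-value via the normal approximation)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid = (i + j + 1) / 2.0          # average of ranks i+1 .. j (mid-rank for ties)
        for k in range(i, j):
            ranks[k] = mid
        i = j
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0        # U statistic for the first sample
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return u1, p
```

In an OOD setting one might, for example, compare a test sample's feature-space distances against a reference set of in-distribution distances and flag the sample when the p-value falls below a threshold; that decision rule is an assumption here, not a quote of the STOOD-X procedure.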

Authors:Caroline Berger, David Weintrop, Niklas Elmqvist
Title: "I Feel Like I'm Teaching in a Gladiator Ring": Barriers and Benefits of Live Coding in Classroom Settings
Abstract:
Live coding for teaching, i.e., synchronously writing software in front of students, can be an effective method for engaging students and instilling practical programming skills. However, not all settings are conducive to live coding and not all instructors are successful in this challenging task. We present results from a study involving university instructors, teaching assistants, and students identifying both barriers and benefits of live coding. Physical infrastructure, a positive classroom community with psychological safety, and opportunities for teacher development are practical considerations for live coding. In order for live coding to be an active learning experience, we recommend that tools support multiple mechanisms for engaging students, directing audience attention, and encouraging student-led live coding.

Authors:Jeba Rezwana, Corey Ford
Title: Improving User Experience with FAICO: Towards a Framework for AI Communication in Human-AI Co-Creativity
Abstract:
How AI communicates with humans is crucial for effective human-AI co-creation. However, many existing co-creative AI tools cannot communicate effectively, limiting their potential as collaborators. This paper introduces our initial design of a Framework for designing AI Communication (FAICO) for co-creative AI based on a systematic review of 107 full-length papers. FAICO presents key aspects of AI communication and their impacts on user experience to guide the design of effective AI communication. We then show actionable ways to translate our framework into two practical tools: design cards for designers and a configuration tool for users. The design cards enable designers to consider AI communication strategies that cater to a diverse range of users in co-creative contexts, while the configuration tool empowers users to customize AI communication based on their needs and creative workflows. This paper contributes new insights within the literature on human-AI co-creativity and Human-Computer Interaction, focusing on designing AI communication to enhance user experience.

Authors:Naimul Hoque, Zinat Ara, Safwat Ali Khan, Fanny Chevalier, Niklas Elmqvist
Title: Characterizing Creativity in Visualization Design
Abstract:
Understanding the role of creativity in visualization design becomes increasingly important as the field matures, particularly with the emergence of various visualization authoring and recommendation systems. In this paper, we examine how creativity manifests in visualization design processes and how academic research has conceptualized it over time. Through a systematic review of 58 visualization papers that use the terms "creativity" or "creative," we analyze the evolution of creative practices in visualization design. Our findings show that prior literature predominantly used atypical designs through free-form drawings, infographics, pictorials, and data comics to define creative representations. However, creativity in visualization design extends beyond visual representations to encompass early needfinding design activities such as sketching, storyboarding, discussion, and card sorting. Data visualization can also support a wide variety of creative tasks (e.g., fiction writing). We discuss the implications of these findings for fostering innovation within established design paradigms and for developing more sophisticated visualization authoring systems. The full list of coded papers is available here: https://vizcreativity.notion.site/coded-papers.

Authors:Abdulmalik Alluhidan, Jinkyung Katie Park, Mamtaj Akter, Rachel Rodgers, Afsaneh Razi, Pamela J. Wisniewski
Title: Unfiltered: How Teens Engage in Body Image and Shaming Discussions via Instagram Direct Messages (DMs)
Abstract:
We analyzed 1,596 sub-conversations within 451 direct message (DM) conversations from 67 teens (ages 13-17) who engaged in private discussions about body image on Instagram. Our findings show that teens often receive support when sharing struggles with negative body image, participate in criticism when engaging in body-shaming, and are met with appreciation when promoting positive body image. Additionally, these types of disclosures and responses varied based on whether the conversations were one-on-one or group-based. We found that sharing struggles and receiving support most often occurred in one-on-one conversations, while body shaming and negative interactions often occurred in group settings. A key insight of the study is that private social media settings can significantly influence how teens discuss and respond to body image. Based on these findings, we suggest design guidelines for social media platforms that could promote positive interactions around body image, ultimately creating a healthier and more supportive online environment for teens dealing with body image concerns.

Authors:Yinggan Xu, Hana Kimlee, Yijia Xiao, Di Luo
Title: Advancing AI-Scientist Understanding: Multi-Agent LLMs with Interpretable Physics Reasoning
Abstract:
Large Language Models (LLMs) are playing an increasingly important role in physics research by assisting with symbolic manipulation, numerical computation, and scientific reasoning. However, ensuring the reliability, transparency, and interpretability of their outputs remains a major challenge. In this work, we introduce a novel multi-agent LLM physicist framework that fosters collaboration between AI and human scientists through three key modules: a reasoning module, an interpretation module, and an AI-scientist interaction module. Recognizing that effective physics reasoning demands logical rigor, quantitative accuracy, and alignment with established theoretical models, we propose an interpretation module that employs a team of specialized LLM agents (including summarizers, model builders, visualization tools, and testers) to systematically structure LLM outputs into transparent, physically grounded science models. A case study demonstrates that our approach significantly improves interpretability, enables systematic validation, and enhances human-AI collaboration in physics problem-solving and discovery. Our work bridges free-form LLM reasoning with interpretable, executable models for scientific analysis, enabling more transparent and verifiable AI-augmented research.

Authors:Angela Lopez-Cardona, Parvin Emami, Sebastian Idesis, Saravanakumar Duraisamy, Luis A. Leiva, Ioannis Arapakis
Title: A Comparative Study of Scanpath Models in Graph-Based Visualization
Abstract:
Information Visualization (InfoVis) systems utilize visual representations to enhance data interpretation. Understanding how visual attention is allocated is essential for optimizing interface design. However, collecting Eye-tracking (ET) data presents challenges related to cost, privacy, and scalability. Computational models provide alternatives for predicting gaze patterns, thereby advancing InfoVis research. In our study, we conducted an ET experiment with 40 participants who analyzed graphs while responding to questions of varying complexity within the context of digital forensics. We compared human scanpaths with synthetic ones generated by models such as DeepGaze, UMSS, and Gazeformer. Our research evaluates the accuracy of these models and examines how question complexity and number of nodes influence performance. This work contributes to the development of predictive modeling in visual analytics, offering insights that can enhance the design and effectiveness of InfoVis systems.

Authors:Mamtaj Akter, Jinkyung Katie Park, Pamela J. Wisniewski
Title: Moving Beyond Parental Control toward Community-based Approaches to Adolescent Online Safety
Abstract:
In this position paper, we discuss the paradigm shift that moves away from parental mediation approaches toward collaborative approaches to promote adolescents' online safety. We present empirical studies that highlight the limitations of traditional parental control models and advocate for collaborative, community-driven solutions that prioritize teen empowerment. Specifically, we explore how extending oversight beyond the immediate family to include trusted community members can provide crucial support for teens in managing their online lives. We discuss the potential benefits and challenges of this expanded approach, emphasizing the importance of granular privacy controls and reciprocal support within these networks. Finally, we pose open questions for the research community to consider during the workshop, focusing on the design of "teen-centered" online safety solutions that foster autonomy, awareness, and self-regulation.

Authors:Mamtaj Akter, Jinkyung Katie Park, Campbell Headrick, Xinru Page, Pamela J. Wisniewski
Title: Calculating Connection vs. Risk: Understanding How Youth Negotiate Digital Privacy and Security with Peers Online
Abstract:
Youth, while tech-savvy and highly active on social media, are still vulnerable to online privacy and security risks. Therefore, it is critical to understand how they negotiate and manage social connections versus protecting themselves in online contexts. In this work, we conducted a thematic analysis of 1,318 private conversations on Instagram from 149 youth aged 13-21 to understand the digital privacy and security topics they discussed, if and how they engaged in risky privacy behaviors, and how they balanced the benefits and risks (i.e., privacy calculus) of making these decisions. Overall, youth were forthcoming when broaching a wide range of topics on digital privacy and security, ranging from password management and account access challenges to shared experiences of being victims of privacy risks. However, they also openly engaged in risky behaviors, such as sharing personal account information with peers and even perpetrating privacy and security risks against others. Nonetheless, we found many of these behaviors could be explained by the unique "privacy calculus" of youth, where they often prioritized social benefits over potential risks; for instance, youth often shared account credentials with peers to foster social connection and affirmation. As such, we provide a nuanced understanding of youth decision-making regarding digital security and privacy, highlighting positive behaviors, tensions, and points of concern. We encourage future research to continue to challenge the potentially untrue narratives regarding youth and their digital privacy and security to unpack the nuance of their privacy calculus that may differ from that of adults.

Authors:Yu Fu, Dennis Bromley, Vidya Setlur
Title: DATAWEAVER: Authoring Data-Driven Narratives through the Integrated Composition of Visualization and Text
Abstract:
Data-driven storytelling has gained prominence in journalism and other data reporting fields. However, the process of creating these stories remains challenging, often requiring the integration of effective visualizations with compelling narratives to form a cohesive, interactive presentation. To help streamline this process, we present an integrated authoring framework and system, DataWeaver, that supports both visualization-to-text and text-to-visualization composition. DataWeaver enables users to create data narratives anchored to data facts derived from "call-out" interactions, i.e., user-initiated highlights of visualization elements that prompt relevant narrative content. In addition to this "vis-to-text" composition, DataWeaver also supports a "text-initiated" approach, generating relevant interactive visualizations from existing narratives. Key findings from an evaluation with 13 participants highlighted the utility and usability of DataWeaver and the effectiveness of its integrated authoring framework. The evaluation also revealed opportunities to enhance the framework by refining filtering mechanisms and visualization recommendations and better support authoring creativity by introducing advanced customization options.

Authors:Anna Bodonhelyi, Christian Stegemann-Philipps, Alessandra Sonanini, Lea Herschbach, Marton Szep, Anne Herrmann-Werner, Teresa Festl-Wietek, Enkelejda Kasneci, Friederike Holderried
Title: Modeling Challenging Patient Interactions: LLMs for Medical Communication Training
Abstract:
Effective patient communication is pivotal in healthcare, yet traditional medical training often lacks exposure to diverse, challenging interpersonal dynamics. To bridge this gap, this study proposes the use of Large Language Models (LLMs) to simulate authentic patient communication styles, specifically the "accuser" and "rationalizer" personas derived from the Satir model, while also ensuring multilingual applicability to accommodate diverse cultural contexts and enhance accessibility for medical professionals. Leveraging advanced prompt engineering, including behavioral prompts, author's notes, and stubbornness mechanisms, we developed virtual patients (VPs) that embody nuanced emotional and conversational traits. Medical professionals evaluated these VPs, rating their authenticity (accuser: $3.8 \pm 1.0$; rationalizer: $3.7 \pm 0.8$ on a 5-point Likert scale) and correctly identifying their styles. Emotion analysis revealed distinct profiles: the accuser exhibited pain, anger, and distress, while the rationalizer displayed contemplation and calmness, aligning with the predefined, detailed patient descriptions, including medical history. Sentiment scores (on a scale from zero to nine) further validated these differences in communication style, with the accuser adopting a negative ($3.1 \pm 0.6$) and the rationalizer a more neutral ($4.0 \pm 0.4$) tone. These results underscore LLMs' capability to replicate complex communication styles, offering transformative potential for medical education. This approach equips trainees to navigate challenging clinical scenarios by providing realistic, adaptable patient interactions, enhancing empathy and diagnostic acumen. Our findings advocate for AI-driven tools as scalable, cost-effective solutions to cultivate nuanced communication skills, setting a foundation for future innovations in healthcare training.

Authors:Mohammad Shadman Hashem, Ahsan Raza, Sama E Shan, Seokhee Jeon
Title: Pneumatic Multi-mode Silicone Actuator with Pressure, Vibration, and Cold Thermal Feedback
Abstract:
A wide range of haptic feedback is crucial for achieving high realism and immersion in virtual environments. Therefore, a multi-modal haptic interface that provides various haptic signals simultaneously is highly beneficial. This paper introduces a novel silicone fingertip actuator that is pneumatically actuated, delivering a realistic and effective haptic experience by simultaneously providing pressure, vibrotactile, and cold thermal feedback. The actuator features a design with multiple air chambers, each with controllable volume achieved through pneumatic valves connected to compressed air tanks. The lower air chamber generates pressure feedback, while the upper chamber produces vibrotactile feedback. In addition, two integrated lateral air nozzles create a cold thermal sensation. To showcase the system's capabilities, we designed two unique 3D surfaces in the virtual environment: a frozen meat surface and an abrasive icy surface. These surfaces simulate tactile perceptions of coldness, pressure, and texture. Comprehensive performance assessments and user studies were conducted to validate the actuator's effectiveness, highlighting its diverse feedback capabilities compared to traditional actuators that offer only single feedback modalities.

Authors:Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Alireza Darvishy
Title: Towards More Accessible Scientific PDFs for People with Visual Impairments: Step-by-Step PDF Remediation to Improve Tag Accuracy
Abstract:
PDF inaccessibility is an ongoing challenge that hinders individuals with visual impairments from reading and navigating PDFs using screen readers. This paper presents a step-by-step process for both novice and experienced users to create accessible PDF documents, including an approach for creating alternative text for mathematical formulas without expert knowledge. In a study involving nineteen participants, we evaluated our prototype PAVE 2.0 by comparing it against Adobe Acrobat Pro, the existing standard for remediating PDFs. Our study shows that experienced users improved their tagging scores from 42.0% to 80.1%, and novice users from 39.2% to 75.2% with PAVE 2.0. Overall, fifteen participants stated that they would prefer to use PAVE 2.0 in the future, and all participants would recommend it for novice users. Our work demonstrates PAVE 2.0's potential for increasing PDF accessibility for people with visual impairments and highlights remaining challenges.

Authors:Rie Kamikubo, Seita Kayukawa, Yuka Kaniwa, Allan Wang, Hernisa Kacorri, Hironobu Takagi, Chieko Asakawa
Title: Beyond Omakase: Designing Shared Control for Navigation Robots with Blind People
Abstract:
Autonomous navigation robots can increase the independence of blind people but often limit user control, following what is called in Japanese an "omakase" approach where decisions are left to the robot. This research investigates ways to enhance user control in social robot navigation, based on two studies conducted with blind participants. The first study, involving structured interviews (N=14), identified crowded spaces as key areas with significant social challenges. The second study (N=13) explored navigation tasks with an autonomous robot in these environments and identified design strategies across different modes of autonomy. Participants preferred an active role, termed the "boss" mode, where they managed crowd interactions, while the "monitor" mode helped them assess the environment, negotiate movements, and interact with the robot. These findings highlight the importance of shared control and user involvement for blind users, offering valuable insights for designing future social navigation robots.

Authors:Tawfiq Ammari, Anna Gutowska, Jacob Ziff, Casey Randazzo, Harihan Subramonyam
Title: From the CDC to emerging infectious disease publics: The long-now of polarizing and complex health crises
Abstract:
This study examines how public discourse around COVID-19 unfolded on Twitter through the lens of crisis communication and digital publics. Analyzing over 275,000 tweets involving the CDC, we identify 16 distinct discourse clusters shaped by framing, sentiment, credibility, and network dynamics. We find that CDC messaging became a flashpoint for affective and ideological polarization, with users aligning along competing frames of science vs. freedom, and public health vs. political overreach. Most clusters formed echo chambers, while a few enabled cross-cutting dialogue. Publics emerged not only around ideology but also around topical and emotional stakes, reflecting shifting concerns across different stages of the pandemic. While marginalized communities raised consistent equity concerns, these narratives struggled to reshape broader discourse. Our findings highlight the importance of long-term, adaptive engagement with diverse publics and propose design interventions, such as multi-agent AI assistants, to support more inclusive communication throughout extended public health crises.

Authors:Yifei Duan, Liuqingqing Yang, Tong Zhang, Zhijun Song, Fenghua Shao
Title: Automated UI Interface Generation via Diffusion Models: Enhancing Personalization and Efficiency
Abstract:
This study proposes a UI interface generation method based on a diffusion model, aiming to achieve high-quality, diversified, and personalized interface design through generative artificial intelligence technology. The diffusion model generates interfaces through a step-by-step denoising process. By combining the conditional generation mechanism, design optimization module, and user feedback mechanism, the model can generate a UI interface that meets the requirements based on multimodal inputs such as text descriptions and sketches provided by users. In the study, a complete experimental evaluation framework was designed, and mainstream generative models (GAN, VAE, DALL-E, etc.) were selected for comparative experiments. The generation results were quantitatively analyzed using metrics such as PSNR, SSIM, and FID. The results show that the model proposed in this study is superior to other models in terms of generation quality and user satisfaction, especially in terms of logical clarity of information transmission and visual aesthetics. The ablation experiment further verifies the key role of conditional generation and design optimization modules in improving interface quality. This study provides a new technical path for UI design automation and lays the foundation for the intelligent and personalized development of human-computer interaction interfaces. In the future, the application potential of the model in virtual reality, game design, and other fields will be further explored.
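Of the evaluation metrics the abstract names, PSNR has a simple closed form; a minimal sketch is below (SSIM and FID require full reference implementations and pretrained networks, so they are omitted). This is the standard definition, not the paper's evaluation code:

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images,
    given here as flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher PSNR means the generated interface is closer, pixel-wise, to the reference; it says nothing about perceptual or layout quality, which is why SSIM and FID are reported alongside it.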

Authors:Kai Nylund, Jennifer Mankoff, Venkatesh Potluri
Title: MatplotAlt: A Python Library for Adding Alt Text to Matplotlib Figures in Computational Notebooks
Abstract:
We present MatplotAlt, an open-source Python package for easily adding alternative text to Matplotlib figures. MatplotAlt equips Jupyter notebook authors to automatically generate and surface chart descriptions with a single line of code or command, and supports a range of options that allow users to customize the generation and display of captions based on their preferences and accessibility needs. Our evaluation indicates that MatplotAlt's heuristic and LLM-based methods to generate alt text can create accurate long-form descriptions of both simple univariate and complex Matplotlib figures. We find that state-of-the-art LLMs still struggle with factual errors when describing charts, and improve the accuracy of our descriptions by prompting GPT4-turbo with heuristic-based alt text or data tables parsed from the Matplotlib figure.
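The abstract does not show MatplotAlt's API, but the heuristic route it describes can be illustrated: compose a long-form description from chart metadata and simple data statistics. The function name and signature below are hypothetical, for illustration only, and are not MatplotAlt's interface:

```python
def heuristic_alt_text(title, xlabel, ylabel, xs, ys):
    """Compose a long-form chart description from metadata and data extremes.
    Hypothetical sketch of the heuristic approach; not the MatplotAlt API."""
    hi = max(range(len(ys)), key=ys.__getitem__)  # index of the maximum y value
    lo = min(range(len(ys)), key=ys.__getitem__)  # index of the minimum y value
    return (
        f"Line chart titled '{title}' showing {ylabel} versus {xlabel}. "
        f"{ylabel} peaks at {ys[hi]} when {xlabel} is {xs[hi]} "
        f"and reaches its minimum of {ys[lo]} when {xlabel} is {xs[lo]}."
    )
```

A real tool would additionally pull the title, axis labels, and plotted data directly from the figure object and, as the paper does, could pass such heuristic text or parsed data tables to an LLM to reduce factual errors in the generated description.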

Authors:Mengyao Guo, Yu Nie, Jinda Han, Zongxing Li, Ze Gao
Title: CyanKitten: AI-Driven Markerless Motion Capture for Improved Elderly Well-Being
Abstract:
This paper introduces CyanKitten, an interactive virtual companion system tailored for elderly users, integrating advanced posture recognition, behavior recognition, and multimodal interaction capabilities. The system utilizes a three-tier architecture to process and interpret user movements and gestures, leveraging a dual-camera setup and a convolutional neural network trained explicitly on elderly movement patterns. The behavior recognition module identifies and responds to three key interactive gestures: greeting waves, petting motions, and heart-making gestures. A multimodal integration layer also combines visual and audio inputs to facilitate natural and intuitive interactions. This paper outlines the technical implementation of each component, addressing challenges such as elderly-specific movement characteristics, real-time processing demands, and environmental adaptability. The result is an engaging and accessible virtual interaction experience designed to enhance the quality of life for elderly users.

Authors:Huichen Will Wang, Kylie Lin, Andrew Cohen, Ryan Kennedy, Zach Zwald, Carolina Nobre, Cindy Xiong Bearfield
Title: Do You "Trust" This Visualization? An Inventory to Measure Trust in Visualizations
Abstract:
Trust plays a critical role in visual data communication and decision-making, yet existing visualization research employs varied trust measures, making it challenging to compare and synthesize findings across studies. In this work, we first took a bottom-up, data-driven approach to understand what visualization readers mean when they say they "trust" a visualization. We compiled and adapted a broad set of trust-related statements from existing inventories and collected responses on visualizations with varying degrees of trustworthiness. Through exploratory factor analysis, we derived an operational definition of trust in visualizations. Our findings indicate that people perceive a trustworthy visualization as one that presents credible information and is comprehensible and usable. Additionally, we found that general trust disposition influences how individuals assess visualization trustworthiness. Building on these insights, we developed a compact inventory consisting of statements that not only effectively represent each trust factor but also exhibit high item discrimination. We further validated our inventory through two trust games with real-world stakes, demonstrating that our measures reliably predict behavioral trust. Finally, we illustrate how this standardized inventory can be applied across diverse visualization research contexts. Utilizing our inventory, future research can examine how design choices, tasks, and domains influence trust, and how to foster appropriate trusting behavior in human-data interactions.

Authors:Casey Randazzo, Tawfiq Ammari
Title: Kintsugi-Inspired Design: Communicatively Reconstructing Identities Online After Trauma
Abstract:
Trauma can disrupt one's sense of self and mental well-being, leading survivors to reconstruct their identities in online communities. Drawing from 30 in-depth interviews, we present a sociotechnical process model that illustrates the mechanisms of online identity reconstruction and the pathways to integration. We introduce the concept of fractured identities, reflecting the enduring impact of trauma on one's self-concept.

Authors:Humza Nusrat, Bing Luo, Ryan Hall, Joshua Kim, Hassan Bagher-Ebadian, Anthony Doemer, Benjamin Movsas, Kundan Thind
Title: Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent
Abstract:
Radiotherapy treatment planning is a complex and time-intensive process, often impacted by inter-planner variability and subjective decision-making. To address these challenges, we introduce Dose Optimization Language Agent (DOLA), an autonomous large language model (LLM)-based agent designed for optimizing radiotherapy treatment plans while rigorously protecting patient privacy. DOLA integrates the LLaMa3.1 LLM directly with a commercial treatment planning system, utilizing chain-of-thought prompting, retrieval-augmented generation (RAG), and reinforcement learning (RL). Operating entirely within secure local infrastructure, this agent eliminates external data sharing. We evaluated DOLA using a retrospective cohort of 18 prostate cancer patients prescribed 60 Gy in 20 fractions, comparing model sizes (8 billion vs. 70 billion parameters) and optimization strategies (No-RAG, RAG, and RAG+RL) over 10 planning iterations. The 70B model demonstrated significantly improved performance, achieving approximately 16.4% higher final scores than the 8B model. The RAG approach outperformed the No-RAG baseline by 19.8%, and incorporating RL accelerated convergence, highlighting the synergy of retrieval-based memory and reinforcement learning. Optimal temperature hyperparameter analysis identified 0.4 as providing the best balance between exploration and exploitation. This proof of concept study represents the first successful deployment of locally hosted LLM agents for autonomous optimization of treatment plans within a commercial radiotherapy planning system. By extending human-machine interaction through interpretable natural language reasoning, DOLA offers a scalable and privacy-conscious framework, with significant potential for clinical implementation and workflow improvement.

Authors:Meisam Jamshidi Seikavandi, Jostein Fimland, Maria Jung Barrett, Paolo Burelli
Title: Exploring the Temporal Dynamics of Facial Mimicry in Emotion Processing Using Action Units
Abstract:
Facial mimicry - the automatic, unconscious imitation of others' expressions - is vital for emotional understanding. This study investigates how mimicry differs across emotions using Face Action Units from videos and participants' responses. Dynamic Time Warping quantified the temporal alignment between participants' and stimuli's facial expressions, revealing significant emotional variations. Post-hoc tests indicated greater mimicry for 'Fear' than 'Happy' and reduced mimicry for 'Anger' compared to 'Fear'. The mimicry correlations with personality traits like Extraversion and Agreeableness were significant, showcasing subtle yet meaningful connections. These findings suggest specific emotions evoke stronger mimicry, with personality traits playing a secondary role in emotional alignment. Notably, our results highlight how personality-linked mimicry mechanisms extend beyond interpersonal communication to affective computing applications, such as remote human-human interactions and human-virtual-agent scenarios. Insights from temporal facial mimicry - e.g., designing digital agents that adaptively mirror user expressions - enable developers to create empathetic, personalized systems, enhancing emotional resonance and user engagement.
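Dynamic Time Warping, used above to align participants' and stimuli's facial expression time series, can be sketched in a few lines of plain Python; the Action Unit intensity series below are invented toy data:

```python
def dtw_distance(series_a, series_b):
    """Classic dynamic-time-warping distance between two 1-D sequences,
    e.g. per-frame intensities of one facial Action Unit for the
    stimulus and for the participant."""
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    # cost[i][j] = DTW distance between the first i and first j frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A delayed copy of a signal aligns almost perfectly under DTW,
# which is exactly why it suits mimicry (imitation with a lag).
stimulus    = [0.0, 0.2, 0.8, 1.0, 0.6, 0.1]
participant = [0.0, 0.0, 0.2, 0.8, 1.0, 0.6]  # same shape, one frame late
print(dtw_distance(stimulus, participant))
```

A pointwise comparison would heavily penalize the one-frame lag; DTW warps the time axis so that only genuinely unmatched frames contribute to the distance.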

Authors:Isura Nirmal, Wen Hu, Mahbub Hassan, Elias Aboutanios, Abdelwahed Khamis
Title: Improving mmWave based Hand Hygiene Monitoring through Beam Steering and Combining Techniques
Abstract:
We introduce BeaMsteerX (BMX), a novel mmWave hand hygiene gesture recognition technique that improves accuracy at longer ranges (1.5m). BMX steers a mmWave beam towards multiple directions around the subject, generating multiple views of the gesture that are then intelligently combined using deep learning to enhance gesture classification. We evaluated BMX using off-the-shelf mmWave radars and collected a total of 7,200 hand hygiene gesture samples from 10 subjects performing a six-step hand-rubbing procedure, as recommended by the World Health Organization, using sanitizer, at 1.5m -- over five times longer than in prior works. BMX outperforms state-of-the-art approaches by 31--43% and achieves 91% accuracy at boresight by combining only two beams, demonstrating superior gesture classification in low SNR scenarios. BMX maintained its effectiveness even when the subject was positioned 30 degrees away from the boresight, exhibiting a modest 5% drop in accuracy.

Authors:Meisam Jamshidi Seikavandi, Jostein Fimland, Maria Barrett, Paolo Burelli
Title: Modelling Emotions in Face-to-Face Setting: The Interplay of Eye-Tracking, Personality, and Temporal Dynamics
Abstract:
Accurate emotion recognition is pivotal for nuanced and engaging human-computer interactions, yet remains difficult to achieve, especially in dynamic, conversation-like settings. In this study, we showcase how integrating eye-tracking data, temporal dynamics, and personality traits can substantially enhance the detection of both perceived and felt emotions. Seventy-three participants viewed short, speech-containing videos from the CREMA-D dataset, while being recorded for eye-tracking signals (pupil size, fixation patterns), Big Five personality assessments, and self-reported emotional states. Our neural network models combined these diverse inputs including stimulus emotion labels for contextual cues and yielded marked performance gains compared to the state-of-the-art. Specifically, perceived valence predictions reached a macro F1-score of 0.76, and models incorporating personality traits and stimulus information demonstrated significant improvements in felt emotion accuracy. These results highlight the benefit of unifying physiological, individual and contextual factors to address the subjectivity and complexity of emotional expression. Beyond validating the role of user-specific data in capturing subtle internal states, our findings inform the design of future affective computing and human-agent systems, paving the way for more adaptive and cross-individual emotional intelligence in real-world interactions.

Authors:Yifan Wang, Cheng Jiang, Chenzhong Li
Title: A Review of Brain-Computer Interface Technologies: Signal Acquisition Methods and Interaction Paradigms
Abstract:
Brain-Computer Interface (BCI) technology facilitates direct communication between the human brain and external devices, representing a substantial advancement in human-machine interaction. This review provides an in-depth analysis of various BCI paradigms, including classic paradigms, current classifications, and hybrid paradigms, each with distinct characteristics and applications. Additionally, we explore a range of signal acquisition methods, classified into non-implantation, intervention, and implantation techniques, elaborating on their principles and recent advancements. By examining the interdependence between paradigms and signal acquisition technologies, this review offers a comprehensive perspective on how innovations in one domain propel progress in the other. The goal is to present insights into the future development of more efficient, user-friendly, and versatile BCI systems, emphasizing the synergy between paradigm design and signal acquisition techniques and their potential to transform the field.

Authors:Elizabeth Anne Watkins, Emanuel Moss, Ramesh Manuvinakurike, Meng Shi, Richard Beckwith, Giuseppe Raffa
Title: ACE, Action and Control via Explanations: A Proposal for LLMs to Provide Human-Centered Explainability for Multimodal AI Assistants
Abstract:
In this short paper we address issues related to building multimodal AI systems for human performance support in manufacturing domains. We make two contributions: we first identify challenges of participatory design and training of such systems, and secondly, to address such challenges, we propose the ACE paradigm: "Action and Control via Explanations". Specifically, we suggest that LLMs can be used to produce explanations in the form of human-interpretable "semantic frames", which in turn enable end users to provide data the AI system needs to align its multimodal models and representations, including computer vision, automatic speech recognition, and document inputs. ACE, by using LLMs to "explain" using semantic frames, will help the human and the AI system to collaborate, together building a more accurate model of human activities and behaviors, and ultimately more accurate predictive outputs for better task support, and better outcomes for human users performing manual tasks.

Authors:Adit Gupta, Jennifer Reddig, Tommaso Calo, Daniel Weitekamp, Christopher J. MacLellan
Title: Beyond Final Answers: Evaluating Large Language Models for Math Tutoring
Abstract:
Researchers have made notable progress in applying Large Language Models (LLMs) to solve math problems, as demonstrated through efforts like GSM8k, ProofNet, AlphaGeometry, and MathOdyssey. This progress has sparked interest in their potential use for tutoring students in mathematics. However, the reliability of LLMs in tutoring contexts -- where correctness and instructional quality are crucial -- remains underexplored. Moreover, LLM problem-solving capabilities may not necessarily translate into effective tutoring support for students. In this work, we present two novel approaches to evaluate the correctness and quality of LLMs in math tutoring contexts. The first approach uses an intelligent tutoring system for college algebra as a testbed to assess LLM problem-solving capabilities. We generate benchmark problems using the tutor, prompt a diverse set of LLMs to solve them, and compare the solutions to those generated by the tutor. The second approach evaluates LLMs as tutors rather than problem solvers. We employ human evaluators, who act as students seeking tutoring support from each LLM. We then assess the quality and correctness of the support provided by the LLMs via a qualitative coding process. We applied these methods to evaluate several ChatGPT models, including 3.5 Turbo, 4, 4o, o1-mini, and o1-preview. Our findings show that when used as problem solvers, LLMs generate correct final answers for 85.5% of the college algebra problems tested. When employed interactively as tutors, 90% of LLM dialogues show high-quality instructional support; however, many contain errors -- only 56.6% are entirely correct. We conclude that, despite their potential, LLMs are not yet suitable as intelligent tutors for math without human oversight or additional mechanisms to ensure correctness and quality.

Authors:Anna Kleinau, Bernhard Preim, Monique Meuschke
Title: FINCH: Locally Visualizing Higher-Order Feature Interactions in Black Box Models
Abstract:
In an era where black-box AI models are integral to decision-making across industries, robust methods for explaining these models are more critical than ever. While these models leverage complex feature interplay for accurate predictions, most explanation methods only assign relevance to individual features. There is a research gap in methods that effectively illustrate interactions between features, especially in visualizing higher-order interactions involving multiple features, which challenge conventional representation methods. To address this challenge in local explanations focused on individual instances, we employ a visual, subset-based approach to reveal relevant feature interactions. Our visual analytics tool FINCH uses coloring and highlighting techniques to create intuitive, human-centered visualizations, and provides additional views that enable users to calibrate their trust in the model and explanations. We demonstrate FINCH in multiple case studies that show its generalizability, and we conducted an extensive human study with machine learning experts to highlight its helpfulness and usability. With this approach, FINCH allows users to visualize feature interactions involving any number of features locally.

Authors:Zixuan Guo, Yuekai Shi, Tiantian Ye, Tingjie Wan, Hai-Ning Liang
Title: No More Head-Turning: Exploring Passthrough Techniques for Addressing Rear Interruptions from the Front in VR
Abstract:
Virtual reality (VR) users often encounter interruptions, posing challenges to maintaining real-world awareness during immersive experiences. The Passthrough feature in VR headsets allows users to view their physical surroundings without removing the headset. However, when interruptions come from the rear, users need to turn their heads to see the real world, which can lead to negative experiences in VR. Study 1, conducted through semi-structured interviews involving 13 participants, found that users are less likely to use Passthrough for rear interruptions due to large head-turning movements, which cause inconvenience, increase the risk of motion sickness, and reduce the experience. Building on these findings, we introduced three Passthrough techniques in Study 2 for displaying the rear view in front of the user: Full Rear Passthrough + Pause (FRPP), Rear Passthrough Window (RPW), and Rear Passthrough AR (RPAR). Compared to the Baseline method that requires head-turning, all three systems reduced physical and temporal demands, alleviated disorientation caused by motion sickness, and provided a better user experience for managing rear interruptions. Among these, FRPP and RPAR were the most preferred. These findings provide valuable insights for future VR design, emphasizing the need for solutions that effectively manage rear interruptions while maintaining user comfort and experience.

Authors:Andy Gray, Alma Rahat, Stephen Lindsay, Jen Pearson, Tom Crick
Title: Rendering Transparency to Ranking in Educational Assessment via Bayesian Comparative Judgement
Abstract:
Ensuring transparency in educational assessment is increasingly critical, particularly post-pandemic, as demand grows for fairer and more reliable evaluation methods. Comparative Judgement (CJ) offers a promising alternative to traditional assessments, yet concerns remain about its perceived opacity. This paper examines how Bayesian Comparative Judgement (BCJ) enhances transparency by integrating prior information into the judgement process, providing a structured, data-driven approach that improves interpretability and accountability. BCJ assigns probabilities to judgement outcomes, offering quantifiable measures of uncertainty and deeper insights into decision confidence. By systematically tracking how prior data and successive judgements inform final rankings, BCJ clarifies the assessment process and helps identify assessor disagreements. Multi-criteria BCJ extends this by evaluating multiple learning outcomes (LOs) independently, preserving the richness of CJ while producing transparent, granular rankings aligned with specific assessment goals. It also enables a holistic ranking derived from individual LOs, ensuring comprehensive evaluations without compromising detailed feedback. Using a real higher education dataset with professional markers in the UK, we demonstrate BCJ's quantitative rigour and ability to clarify ranking rationales. Through qualitative analysis and discussions with experienced CJ practitioners, we explore its effectiveness in contexts where transparency is crucial, such as high-stakes national assessments. We highlight the benefits and limitations of BCJ, offering insights into its real-world application across various educational settings.
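The core of Comparative Judgement is turning many pairwise "A is better than B" decisions into a ranking with quantified confidence. The sketch below is a minimal Bradley-Terry-style stand-in for the paper's full Bayesian treatment: the script names, the flat prior, and the simple gradient update are all illustrative assumptions, not BCJ's actual machinery:

```python
import math

def win_probability(quality_a, quality_b):
    """Bradley-Terry-style probability that script A is judged better
    than script B, given latent quality estimates."""
    return 1.0 / (1.0 + math.exp(-(quality_a - quality_b)))

def update(qualities, winner, loser, lr=0.1):
    """One gradient step on the log-likelihood of a single pairwise
    judgement -- a crude stand-in for a full Bayesian posterior update."""
    p = win_probability(qualities[winner], qualities[loser])
    qualities[winner] += lr * (1.0 - p)
    qualities[loser] -= lr * (1.0 - p)
    return p

# Start from a flat prior (all scripts equal), then feed in judgements.
qualities = {"script_a": 0.0, "script_b": 0.0, "script_c": 0.0}
judgements = [("script_a", "script_b"), ("script_a", "script_c"),
              ("script_b", "script_c"), ("script_a", "script_b")]
for winner, loser in judgements:
    update(qualities, winner, loser)
ranking = sorted(qualities, key=qualities.get, reverse=True)
print(ranking)
```

The win probabilities are what make the process transparent: each judgement's outcome can be checked against what the current quality estimates predicted, exposing assessor disagreements as surprising (low-probability) results.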

Authors:Matthew Wilchek, Linhan Wang, Sally Dickinson, Erica Feuerbacher, Kurt Luther, Feras A. Batarseh
Title: KHAIT: K-9 Handler Artificial Intelligence Teaming for Collaborative Sensemaking
Abstract:
In urban search and rescue (USAR) operations, communication between handlers and specially trained canines is crucial but often complicated by challenging environments and the specific behaviors canines are trained to exhibit when detecting a person. Since a USAR canine often works out of sight of the handler, the handler lacks awareness of the canine's location and situation, known as the 'sensemaking gap.' In this paper, we propose KHAIT, a novel approach to close the sensemaking gap and enhance USAR effectiveness by integrating object detection-based Artificial Intelligence (AI) and Augmented Reality (AR). Equipped with AI-powered cameras, edge computing, and AR headsets, KHAIT enables precise and rapid object detection from a canine's perspective, improving survivor localization. We evaluate this approach in a real-world USAR environment, demonstrating an average survival allocation time decrease of 22%, enhancing the speed and accuracy of operations.

Authors:Cheng Tang, Chao Tang, Steven Gong, Thomas M. Kwok, Yue Hu
Title: Robot Character Generation and Adaptive Human-Robot Interaction with Personality Shaping
Abstract:
We present a novel framework for designing emotionally agile robots with dynamic personalities and memory-based learning, with the aim of performing adaptive and non-deterministic interactions with humans while conforming to shared social understanding. While existing work has largely focused on emotion recognition and static response systems, many approaches rely on sentiment analysis and action mapping frameworks that are pre-defined with limited dimensionality and fixed configurations, lacking the flexibility of dynamic personality traits and memory-enabled adaptation. Other systems are often restricted to limited modes of expression and fail to develop a causal relationship between human behavior and the robot's proactive physical actions, resulting in constrained adaptability and reduced responsiveness in complex, dynamic interactions. Our methodology integrates the Big Five Personality Traits, Appraisal Theory, and abstracted memory layers through Large Language Models (LLMs). The LLM generates a parameterized robot personality based on the Big Five, processes human language and sentiments, evaluates human behavior using Appraisal Theory, and generates emotions and selects appropriate actions informed by historical context over time. We validated the framework by testing three robots with distinct personalities in identical background contexts and found that personality, appraisal, and memory influence the adaptability of human-robot interactions. The impact of the individual components was further validated through ablation tests. We conclude that this system enables robots to engage in meaningful and personalized interactions with users, and holds significant potential for applications in domains such as pet robots, assistive robots, educational robots, and collaborative functional robots, where cultivating tailored relationships and enriching user experiences are essential.

Authors:Nai Yang, Yijie Wang, Fan Wu, Zhiwei Wei
Title: MapColorAI: Designing Contextually Relevant Choropleth Map Color Schemes Using a Large Language Model
Abstract:
Choropleth maps, which utilize color schemes to visualize spatial patterns and trends, are simple yet effective tools for geographic data analysis. As such, color scheme design is a critical aspect of choropleth map creation. The traditional coloring methods offered by GIS tools such as ArcGIS and QGIS are not user-friendly for non-professionals. On the one hand, these tools provide numerous color schemes, making it hard to decide which one best matches the theme. On the other hand, it is difficult to fulfill some ambiguous and personalized coloring needs of users, such as requests for 'summer-like' map colors. To address these shortcomings, we develop a novel system that leverages a large language model and map color design principles to generate contextually relevant and user-aligned choropleth map color schemes. The system follows a three-stage process: Data Processing, which provides an overview of the data and classifies the data into meaningful classes; Color Concept Design, where the color theme and color mode are conceptualized based on data characteristics and user intentions; and Color Scheme Design, where specific colors are assigned to classes based on the generated color theme, color mode, and user requirements. Our system incorporates an interactive interface, providing necessary visualization for choropleth map color design and allowing users to customize and refine color choices flexibly. Through user studies and evaluations, the system demonstrates acceptable usability, accuracy, and flexibility, with users highlighting the tool's efficiency and ease of use.

Authors:Korbinian Kuhn, Verena Kersken, Gottfried Zimmermann
Title: Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces
Abstract:
Despite advances in Automatic Speech Recognition (ASR), transcription errors persist and require manual correction. Confidence scores, which indicate the certainty of ASR results, could assist users in identifying and correcting errors. This study evaluates the reliability of confidence scores for error detection through a comprehensive analysis of end-to-end ASR models and a user study with 36 participants. The results show that while confidence scores correlate with transcription accuracy, their error detection performance is limited. Classifiers frequently miss errors or generate many false positives, undermining their practical utility. Confidence-based error detection neither improved correction efficiency nor was perceived as helpful by participants. These findings highlight the limitations of confidence scores and the need for more sophisticated approaches to improve user interaction and explainability of ASR results.
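The limitation described above, that thresholding confidence scores both misses errors and raises false positives, is easy to see on toy data. The sketch below is an illustrative baseline classifier only; the confidences and error labels are invented, not from the study:

```python
def detect_errors(confidences, threshold):
    """Flag a token as a likely ASR error when its confidence score
    falls below the threshold."""
    return [c < threshold for c in confidences]

def precision_recall(flags, actual_errors):
    """Precision and recall of the flagged tokens against ground truth."""
    tp = sum(f and e for f, e in zip(flags, actual_errors))
    fp = sum(f and not e for f, e in zip(flags, actual_errors))
    fn = sum((not f) and e for f, e in zip(flags, actual_errors))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy transcript: per-token confidence, and whether the token was
# actually misrecognized. Note the confident-but-wrong token (0.76).
confidences   = [0.98, 0.41, 0.87, 0.55, 0.93, 0.35, 0.76]
actual_errors = [False, True, False, False, False, True, True]
flags = detect_errors(confidences, threshold=0.6)
print(precision_recall(flags, actual_errors))
```

Even on this tiny example the threshold detector flags a correct token (0.55) and misses a confidently wrong one (0.76), mirroring the study's finding that confidence correlates with accuracy yet makes a weak error detector.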

Authors:Korbinian Kuhn, Verena Kersken, Gottfried Zimmermann
Title: Communication Access Real-Time Translation Through Collaborative Correction of Automatic Speech Recognition
Abstract:
Communication access real-time translation (CART) is an essential accessibility service for d/Deaf and hard of hearing (DHH) individuals, but the cost and scarcity of trained personnel limit its availability. While Automatic Speech Recognition (ASR) offers a cheap and scalable alternative, transcription errors can lead to serious accessibility issues. Real-time correction of ASR by non-professionals presents an under-explored CART workflow that addresses these limitations. We conducted a user study with 75 participants to evaluate the feasibility and efficiency of this workflow. Complementary, we held focus groups with 25 DHH individuals to identify acceptable accuracy levels and factors affecting the accessibility of real-time captioning. Results suggest that collaborative editing can improve transcription accuracy to the extent that DHH users rate it positively regarding understandability. Focus groups also showed that human effort to improve captioning is highly valued, supporting a semi-automated approach as an alternative to stand-alone ASR and traditional CART services.

Authors:Krzysztof Zielinski, Slawomir Tadeja, Bruce Blumberg, Mikkel Baun Kjærgaard
Title: Using Mobile AR for Rapid Feasibility Analysis for Deployment of Robots: A Usability Study with Non-Expert Users
Abstract:
Automating a production line with robotic arms is a complex, demanding task that requires not only substantial resources but also a deep understanding of the automated processes and available technologies and tools. Expert integrators must consider factors such as placement, payload, and robot reach requirements to determine the feasibility of automation. Ideally, such considerations are based on a detailed digital simulation developed before any hardware is deployed. However, this process is often time-consuming and challenging. To simplify these processes, we introduce a much simpler method for the feasibility analysis of robotic arms' reachability, designed for non-experts. We implement this method through a mobile, sensing-based prototype tool. The two-step experimental evaluation included the expert user study results, which helped us identify the difficulty levels of various deployment scenarios and refine the initial prototype. The results of the subsequent quantitative study with 22 non-expert participants utilizing both scenarios indicate that users could complete both simple and complex feasibility analyses in under ten minutes, exhibiting similar cognitive loads and high engagement. Overall, the results suggest that the tool was well-received and rated as highly usable, thereby showing a new path for changing the ease of feasibility analysis for automation.

Authors:Jennifer Mankoff, Janice Light, James Coughlan, Christian Vogler, Abraham Glasser, Gregg Vanderheiden, Laura Rice
Title: Accessibility Considerations in the Development of an AI Action Plan
Abstract:
We argue that there is a need for Accessibility to be represented in several important domains:
- Capitalize on the new capabilities AI provides.
- Support for open source development of AI, which can allow disabled and disability-focused professionals to contribute, including:
  - Development of Accessibility Apps which help realise the promise of AI in accessibility domains.
  - Open Source Model Development and Validation to ensure that accessibility concerns are addressed in these algorithms.
  - Data Augmentation to include accessibility in data sets used to train models.
  - Accessible Interfaces that allow disabled people to use any AI app, and to validate its outputs.
  - Dedicated Functionality and Libraries that can make it easy to integrate AI support into a variety of settings and apps.
- Data security and privacy risks, including data collected by AI-based accessibility technologies and the possibility of disability disclosure.
- Disability-specific AI risks and biases, including both direct bias (during AI use by the disabled person) and indirect bias (when AI is used by someone else on data relating to a disabled person).

Authors:Edward Hong Wang, Cynthia Xin Wen
Title: When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neuralbehavioral research
Abstract:
We propose a novel dual-loop system that synergistically combines responsive neurostimulation (RNS) implants with artificial intelligence-driven wearable devices for treating post-traumatic stress disorder (PTSD) and enabling naturalistic brain research. In PTSD Therapy Mode, an implanted closed-loop neural device monitors amygdala activity and provides on-demand stimulation upon detecting pathological theta oscillations, while an ensemble of wearables (smart glasses, smartwatches, smartphones) uses multimodal large language model (LLM) analysis of sensory data to detect environmental or physiological PTSD triggers and deliver timely audiovisual interventions. Logged events from both the neural and wearable loops are analyzed to personalize trigger detection and progressively transition patients to non-invasive interventions. In Neuroscience Research Mode, the same platform is adapted for real-world brain activity capture. Wearable-LLM systems recognize naturalistic events (social interactions, emotional situations, compulsive behaviors, decision making) and signal implanted RNS devices (via wireless triggers) to record synchronized intracranial data during these moments. This approach builds on recent advances in mobile intracranial EEG recording and closed-loop neuromodulation in humans (BRAIN Initiative, 2023) (Mobbs et al., 2021). We discuss how our interdisciplinary system could revolutionize PTSD therapy and cognitive neuroscience by enabling 24/7 monitoring, context-aware intervention, and rich data collection outside traditional labs. The vision is a future where AI-enhanced devices continuously collaborate with the human brain, offering therapeutic support and deep insights into neural function, with the resulting real-world, context-rich neural data, in turn, accelerating the development of more biologically-grounded and human-centric AI.

Authors:Ruanqianqian Huang, Savitha Ravi, Michael He, Boyu Tian, Sorin Lerner, Michael Coblenz
Title: How Scientists Use Jupyter Notebooks: Goals, Quality Attributes, and Opportunities
Abstract:
Computational notebooks are intended to prioritize the needs of scientists, but little is known about how scientists interact with notebooks, what requirements drive scientists' software development processes, or what tactics scientists use to meet their requirements. We conducted an observational study of 20 scientists using Jupyter notebooks for their day-to-day tasks, finding that scientists prioritize different quality attributes depending on their goals. A qualitative analysis of their usage shows (1) a collection of goals scientists pursue with Jupyter notebooks, (2) a set of quality attributes that scientists value when they write software, and (3) tactics that scientists leverage to promote quality. In addition, we identify ways scientists incorporated AI tools into their notebook work. From our observations, we derive design recommendations for improving computational notebooks and future programming systems for scientists. Key opportunities pertain to helping scientists create and manage state, dependencies, and abstractions in their software, enabling more effective reuse of clearly-defined components.

Authors:Viktorija Paneva, Verena Winterhalter, Naga Sai Surya Vamsy Malladi, Marvin Strauss, Stefan Schneegass, Florian Alt
Title: Usable Privacy in Virtual Worlds: Design Implications for Data Collection Awareness and Control Interfaces in Virtual Reality
Abstract:
Extended reality (XR) devices have become ubiquitous. They are equipped with arrays of sensors, collecting extensive user and environmental data, allowing inferences about sensitive user information users may not realize they are sharing. Current VR privacy notices largely replicate mechanisms from 2D interfaces, failing to leverage the unique affordances of virtual 3D environments. To address this, we conducted brainstorming and sketching sessions with novice game developers and designers, followed by privacy expert evaluations, to explore and refine privacy interfaces tailored for VR. Key challenges include balancing user engagement with privacy awareness, managing complex privacy information with user comprehension, and maintaining compliance and trust. We identify design implications such as thoughtful gamification, explicit and purpose-tied consent mechanisms, and granular, modifiable privacy control options. Our findings provide actionable guidance to researchers and practitioners for developing privacy-aware and user-friendly VR experiences.

Authors:Jordan Taylor, Joel Mire, Franchesca Spektor, Alicia DeVrio, Maarten Sap, Haiyi Zhu, Sarah Fox
Title: Un-Straightening Generative AI: How Queer Artists Surface and Challenge the Normativity of Generative AI Models
Abstract:
Queer people are often discussed as targets of bias, harm, or discrimination in research on generative AI. However, the specific ways that queer people engage with generative AI, and thus possible uses that support queer people, have yet to be explored. We conducted a workshop study with 13 queer artists, during which we gave participants access to GPT-4 and DALL-E 3 and facilitated group sensemaking activities. We found our participants struggled to use these models due to various normative values embedded in their designs, such as hyper-positivity and anti-sexuality. We describe various strategies our participants developed to overcome these models' limitations and how, nevertheless, our participants found value in these highly-normative technologies. Drawing on queer feminist theory, we discuss implications for the conceptualization of "state-of-the-art" models and consider how FAccT researchers might support queer alternatives.

Authors:Mariana Fernandez-Espinosa, Diego Gomez-Zara
Title: Augmenting Teamwork through AI Agents as Spatial Collaborators
Abstract:
As Augmented Reality (AR) and Artificial Intelligence (AI) continue to converge, new opportunities emerge for AI agents to actively support human collaboration in immersive environments. While prior research has primarily focused on dyadic human-AI interactions, less attention has been given to Human-AI Teams (HATs) in AR, where AI acts as an adaptive teammate rather than a static tool. This position paper takes the perspective of team dynamics and work organization to propose that AI agents in AR should not only interact with individuals but also recognize and respond to team-level needs in real time. We argue that spatially aware AI agents should dynamically generate the resources necessary for effective collaboration, such as virtual blackboards for brainstorming, mental map models for shared understanding, and memory recall of spatial configurations to enhance knowledge retention and task coordination. This approach moves beyond predefined AI assistance toward context-driven AI interventions that optimize team performance and decision-making.

Authors:Ruofei Ma, Yu Zhao, Yuheng Shao, Yunjie Yao, Quan Li
Title: StratIncon Detector: Analyzing Strategy Inconsistencies Between Real-Time Strategy and Preferred Professional Strategy in MOBA Esports
Abstract:
MOBA (Multiplayer Online Battle Arena) games require a delicate interplay of strategic planning and real-time decision-making, particularly in professional esports, where players exhibit varying levels of skill and strategic insight. While team strategies have been widely studied, analyzing inconsistencies in professional matches remains a significant challenge. The complexity lies in defining and quantifying the difference between real-time and preferred professional strategies, as well as understanding the disparities between them. Establishing direct causal links between specific strategic decisions and game outcomes also demands a comprehensive analysis of the entire match progression. To tackle these challenges, we present the StratIncon Detector, a visual analytics system designed to assist professional players and coaches in efficiently identifying strategic inconsistencies. The system detects real-time strategies, predicts preferred professional strategies, extracts relevant human factors, and uncovers their impact on subsequent game phases. Findings from a case study, a user study with 24 participants, and expert interviews suggest that, compared to traditional methods, the StratIncon Detector enables users to more comprehensively and efficiently identify inconsistencies, infer their causes, evaluate their effects on subsequent game outcomes, and gain deeper insights into team collaboration, ultimately enhancing future teamwork.

Authors:Rune M. Jacobsen, Samuel Rhys Cox, Carla F. Griggio, Niels van Berkel
Title: Chatbots for Data Collection in Surveys: A Comparison of Four Theory-Based Interview Probes
Abstract:
Surveys are a widespread method for collecting data at scale, but their rigid structure often limits the depth of qualitative insights obtained. While interviews naturally yield richer responses, they are challenging to conduct across diverse locations and large participant pools. To partially bridge this gap, we investigate the potential of using LLM-based chatbots to support qualitative data collection through interview probes embedded in surveys. We assess four theory-based interview probes: descriptive, idiographic, clarifying, and explanatory. Through a split-plot study design (N=64), we compare the probes' impact on response quality and user experience across three key stages of HCI research: exploration, requirements gathering, and evaluation. Our results show that probes facilitate the collection of high-quality survey data, with specific probes proving effective at different research stages. We contribute practical and methodological implications for using chatbots as research tools to enrich qualitative data collection.

Authors:Zhaohui Liang, Yonglin Chen, Naser Al Madi, Can Liu
Title: Desirable Unfamiliarity: Insights from Eye Movements on Engagement and Readability of Dictation Interfaces
Abstract:
Dictation interfaces support efficient text input, but the transcribed text can be hard to read. To understand how users read and review dictated text, we conducted a controlled eye-tracking experiment with 20 participants to compare five dictation interfaces: PLAIN (real-time transcription), AOC (periodic corrections), RAKE (keyword highlights), GP-TSM (grammar-preserving highlights), and SUMMARY (LLM-generated abstraction summary). The study analyzed participants' gaze patterns during their speech composition and reviewing processes. The findings show that during composition, participants spent only 7--11% of their time actively reading, and they favored real-time feedback and avoided distracting interface changes. During reviewing, although SUMMARY introduced unfamiliar words (requiring longer and more frequent fixations), its texts were easier to read (requiring fewer regressions). Participants preferred SUMMARY for the polished text that preserved fidelity to original meanings. RAKE guided the reading of self-produced text better than GP-TSM. These surprising findings suggest that dictation interfaces could consider showing summaries or key information to support recall instead of raw transcripts.

Authors:Casey Randazzo, Minkyung Kim, Melanie Kwestel, Marya L Doerfel, Tawfiq Ammari
Title: "We're losing our neighborhoods. We're losing our community": A comparative analysis of community discourse in online and offline public spheres
Abstract:
Recovering from crises, such as hurricanes or wildfires, is a complex process that can take weeks, months, or even decades to overcome. Crises have both acute (immediate) and chronic (long-term) effects on communities. Crisis informatics research often focuses on the immediate response phase of disasters, thereby overlooking the long-term recovery phase, which is critical for understanding the information needs of users undergoing challenges like climate gentrification and housing inequity. We fill this gap by investigating community discourse over eight months following Hurricane Ida in an online neighborhood Facebook group and Town Hall Meetings of a borough in the New York Metropolitan region. Using a mixed methods approach, we examined the use of social media to manage long-term disaster recovery. The findings revealed a significant overlap in topics, underscoring the interconnected nature of online and offline community discourse, and illuminated themes related to the long-term consequences of disasters. We conclude with recommendations aimed at helping designers and government leaders enhance participation across community forums and support recovery in the aftermath of disasters.

Authors:Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, Oliver Powell, Benjamin Menzies, Gabriel Homewood, Kemi Jacobs, Paolo Baesso, Taru Muhonen, Richard Vigars, Louis Berridge
Title: Helios 2.0: A Robust, Ultra-Low Power Gesture Recognition System Optimised for Event-Sensor based Wearables
Abstract:
We present an advance in wearable technology: a mobile-optimized, real-time, ultra-low-power event camera system that enables natural hand gesture control for smart glasses, dramatically improving user experience. While hand gesture recognition in computer vision has advanced significantly, critical challenges remain in creating systems that are intuitive, adaptable across diverse users and environments, and energy-efficient enough for practical wearable applications. Our approach tackles these challenges through carefully selected microgestures: lateral thumb swipes across the index finger (in both directions) and a double pinch between thumb and index fingertips. These human-centered interactions leverage natural hand movements, ensuring intuitive usability without requiring users to learn complex command sequences. To overcome variability in users and environments, we developed a novel simulation methodology that enables comprehensive domain sampling without extensive real-world data collection. Our power-optimised architecture maintains exceptional performance, achieving F1 scores above 80\% on benchmark datasets featuring diverse users and environments. The resulting models operate at just 6-8 mW when exploiting the Qualcomm Snapdragon Hexagon DSP, with our 2-channel implementation exceeding 70\% F1 accuracy and our 6-channel model surpassing 80\% F1 accuracy across all gesture classes in user studies. These results were achieved using only synthetic training data. This improves on the state-of-the-art for F1 accuracy by 20\% with a 25x power reduction when using the DSP. This advancement brings the deployment of ultra-low-power vision systems in wearable devices closer and opens new possibilities for seamless human-computer interaction.
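The F1 scores reported above are the harmonic mean of precision and recall, computed per gesture class. A minimal sketch, using made-up confusion counts (not the Helios 2.0 results) and hypothetical gesture labels:

```python
# Illustrative F1 computation per gesture class.
# The counts below are invented for demonstration only.

def f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical (true positive, false positive, false negative) counts
# for the three microgestures named in the abstract:
counts = {
    "swipe_left": (90, 10, 15),
    "swipe_right": (85, 12, 10),
    "double_pinch": (80, 5, 20),
}
for gesture, (tp, fp, fn) in counts.items():
    print(gesture, round(f1(tp, fp, fn), 3))
```

A per-class view like this is what "surpassing 80\% F1 accuracy across all gesture classes" refers to: every class, not just the average, must clear the bar.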

Authors:Bryan Min, Allen Chen, Yining Cao, Haijun Xia
Title: Malleable Overview-Detail Interfaces
Abstract:
The overview-detail design pattern, characterized by an overview of multiple items and a detailed view of a selected item, is ubiquitously implemented across software interfaces. Designers often try to account for all users, but ultimately these interfaces settle on a single form. For instance, an overview map may display hotel prices but omit other user-desired attributes. This research instead explores the malleable overview-detail interface, one that end-users can customize to address individual needs. Our content analysis of overview-detail interfaces uncovered three dimensions of variation: content, composition, and layout, enabling us to develop customization techniques along these dimensions. For content, we developed Fluid Attributes, a set of techniques enabling users to show and hide attributes between views and leverage AI to manipulate, reformat, and generate new attributes. For composition and layout, we provided solutions to compose multiple overviews and detail views and transform between various overview and overview-detail layouts. A user study on our techniques implemented in two design probes revealed that participants produced diverse customizations and unique usage patterns, highlighting the need and broad applicability for malleable overview-detail interfaces.

Authors:Ramtin Tabatabaei, Vassilis Kostakos, Wafa Johal
Title: Real-Time Detection of Robot Failures Using Gaze Dynamics in Collaborative Tasks
Abstract:
Detecting robot failures during collaborative tasks is crucial for maintaining trust in human-robot interactions. This study investigates user gaze behaviour as an indicator of robot failures, utilising machine learning models to distinguish between non-failure and two types of failures: executional and decisional. Eye-tracking data were collected from 26 participants collaborating with a robot on Tangram puzzle-solving tasks. Gaze metrics, such as average gaze shift rates and the probability of gazing at specific areas of interest, were used to train machine learning classifiers, including Random Forest, AdaBoost, XGBoost, SVM, and CatBoost. The results show that Random Forest achieved 90% accuracy for detecting executional failures and 80% for decisional failures using the first 5 seconds of failure data. Real-time failure detection was evaluated by segmenting gaze data into intervals of 3, 5, and 10 seconds. These findings highlight the potential of gaze dynamics for real-time error detection in human-robot collaboration.
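The gaze metrics named in the abstract are simple to derive from a labeled gaze log. A minimal sketch with a toy five-second window and a stand-in threshold rule (the study trains Random Forest and other classifiers instead; the AOI labels and thresholds here are hypothetical):

```python
# Derive the two gaze metrics from a toy gaze log:
# gaze shift rate and the probability of gazing at each area of interest (AOI).

def gaze_metrics(samples, window_s):
    """samples: list of (timestamp_s, aoi_label); returns (shift_rate, aoi_probs)."""
    shifts = sum(1 for a, b in zip(samples, samples[1:]) if a[1] != b[1])
    shift_rate = shifts / window_s  # AOI transitions per second
    counts = {}
    for _, aoi in samples:
        counts[aoi] = counts.get(aoi, 0) + 1
    aoi_probs = {aoi: c / len(samples) for aoi, c in counts.items()}
    return shift_rate, aoi_probs

# Toy window: gaze oscillating between robot and puzzle after a
# (hypothetical) executional failure.
log = [(0.0, "robot"), (0.8, "puzzle"), (1.5, "robot"), (2.1, "puzzle"),
       (2.9, "robot"), (3.6, "face"), (4.2, "robot"), (4.9, "puzzle")]
rate, probs = gaze_metrics(log, window_s=5.0)

# Stand-in decision rule; the paper uses trained classifiers here.
label = "failure-suspected" if rate > 1.0 and probs.get("robot", 0) > 0.4 else "nominal"
print(rate, probs["robot"], label)
```

In the actual pipeline, such feature vectors over 3-, 5-, or 10-second intervals would be fed to the classifiers rather than to a fixed threshold.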

Authors:Dünya Baradari, Nataliya Kosmyna, Oscar Petrov, Rebecah Kaplun, Pattie Maes
Title: NeuroChat: A Neuroadaptive AI Chatbot for Customizing Learning Experiences
Abstract:
Generative AI is transforming education by enabling personalized, on-demand learning experiences. However, AI tutors lack the ability to assess a learner's cognitive state in real time, limiting their adaptability. Meanwhile, electroencephalography (EEG)-based neuroadaptive systems have successfully enhanced engagement by dynamically adjusting learning content. This paper presents NeuroChat, a proof-of-concept neuroadaptive AI tutor that integrates real-time EEG-based engagement tracking with generative AI. NeuroChat continuously monitors a learner's cognitive engagement and dynamically adjusts content complexity, response style, and pacing using a closed-loop system. We evaluate this approach in a pilot study (n=24), comparing NeuroChat to a standard LLM-based chatbot. Results indicate that NeuroChat enhances cognitive and subjective engagement but does not show an immediate effect on learning outcomes. These findings demonstrate the feasibility of real-time cognitive feedback in LLMs, highlighting new directions for adaptive learning, AI tutoring, and human-AI interaction.

Authors:Omar Khan, JooYoung Seo
Title: "Sighted People Have Their Pick Of The Litter": Unpacking The Need For Digital Mental Health (DMH) Tracking Services With And For The Blind Community
Abstract:
The proliferation of digital mental health (DMH) tracking services promises personalized support, yet accessibility barriers limit equal access. This study investigates blind community experiences with DMH tracking services across the United States as a step toward inclusive health technology design. Working with blind advocacy organizations, we distributed a cross-sectional observational survey (n = 93) and analyzed open-ended responses using Norman and Skinner's eHealth Literacy framework. Our findings reveal significant challenges in navigation, content interpretation, and overall user experience, which impede the blind community's effective engagement with DMH tools. Results highlight the need for adaptive interfaces, accessible tracking strategies, and voice-guided interactions. These insights inform design recommendations for developers and policymakers, promoting more inclusive mental health technologies. By prioritizing accessibility, we make forward progress in ensuring that DMH tracking services fulfill their potential to support mental well-being across diverse user groups, fostering digital equality in mental health care.

Authors:Steven W. Su, Yaqi Li, Kairui Guo, Rob Duffield
Title: Human Machine Co-Adaptation Model and Its Convergence Analysis
Abstract:
The key to robot-assisted rehabilitation lies in the design of the human-machine interface, which must accommodate the needs of both patients and machines. Current interface designs primarily focus on machine control algorithms, often requiring patients to spend considerable time adapting. In this paper, we introduce a novel approach based on the Cooperative Adaptive Markov Decision Processes (CAMDPs) model to address the fundamental aspects of the interactive learning process, offering theoretical insights and practical guidance. We establish sufficient conditions for the convergence of CAMDPs and ensure the uniqueness of Nash equilibrium points. Leveraging these conditions, we guarantee the system's convergence to a unique Nash equilibrium point. Furthermore, we explore scenarios with multiple Nash equilibrium points, devising strategies to adjust both Value Evaluation and Policy Improvement algorithms to enhance the likelihood of converging to the global minimal Nash equilibrium point. Through numerical experiments, we illustrate the effectiveness of the proposed conditions and algorithms, demonstrating their applicability and robustness in practical settings. The proposed conditions for convergence and the identification of a unique optimal Nash equilibrium contribute to the development of more effective adaptive systems for human users in robot-assisted rehabilitation.
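The convergence notion at the heart of the abstract can be illustrated on a much smaller object than a CAMDP. The sketch below (hypothetical payoffs, not the paper's model or conditions) runs alternating best responses on a 2x2 human-machine coordination game until it reaches a fixed point, which is by definition a Nash equilibrium:

```python
# Alternating best-response iteration on a toy 2x2 coordination game.
# payoff[h][m] = (human_reward, machine_reward) for human action h, machine action m.
# This game has two pure Nash equilibria, (0, 0) and (1, 1); the iteration
# settles on one of them, echoing the paper's concern with steering
# convergence among multiple equilibrium points.
payoff = [[(3, 3), (0, 1)],
          [(1, 0), (2, 2)]]

def best_response(player, other_action):
    # player 0 (human) picks a row given the column; player 1 (machine) vice versa.
    if player == 0:
        return max(range(2), key=lambda h: payoff[h][other_action][0])
    return max(range(2), key=lambda m: payoff[other_action][m][1])

h, m = 1, 0  # arbitrary starting joint action
for _ in range(10):  # alternate best responses until a fixed point
    h = best_response(0, m)
    m = best_response(1, h)

# (h, m) is a Nash equilibrium iff both actions are mutual best responses.
is_nash = h == best_response(0, m) and m == best_response(1, h)
print(h, m, is_nash)
```

The paper's contribution is precisely the conditions under which this kind of iterative adaptation, lifted to the full CAMDP setting, is guaranteed to converge and to reach the desired equilibrium.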

Authors:Xin Wang, Stephanie Tulk Jesso, Sadamori Kojaku, David M Neyens, Min Sun Kim
Title: VizTrust: A Visual Analytics Tool for Capturing User Trust Dynamics in Human-AI Communication
Abstract:
Trust plays a fundamental role in shaping the willingness of users to engage and collaborate with artificial intelligence (AI) systems. Yet, measuring user trust remains challenging due to its complex and dynamic nature. While traditional survey methods provide trust levels for long conversations, they fail to capture its dynamic evolution during ongoing interactions. Here, we present VizTrust, which addresses this challenge by introducing a real-time visual analytics tool that leverages a multi-agent collaboration system to capture and analyze user trust dynamics in human-agent communication. Built on established human-computer trust scales (competence, integrity, benevolence, and predictability), VizTrust enables stakeholders to observe trust formation as it happens, identify patterns in trust development, and pinpoint specific interaction elements that influence trust. Our tool offers actionable insights into human-agent trust formation and evolution in real time through a dashboard, supporting the design of adaptive conversational agents that respond effectively to user trust signals.

Authors:Shangxuan Wu, Wendi Luan, Yong Wang, Dan Zeng, Qiaomu Shen, Bo Tang
Title: Data Insights as Data: Quick Overview and Exploration of Automated Data Insights
Abstract:
Automated data insight mining and visualization have been widely used in various business intelligence applications (e.g., market analysis and product promotion). However, automated insight mining techniques often output the same mining results to different analysts without considering their personal preferences, while interactive insight discovery requires significant manual effort. This paper fills the gap by integrating automated insight mining with interactive data visualization and striking a proper balance between them to facilitate insight discovery and exploration. Specifically, we regard data insights as a special type of data and further present InsightMap, a novel visualization approach that uses the map metaphor to provide a quick overview and in-depth exploration of different data insights, where a metric is proposed to measure the similarity between different insights. The effectiveness and usability of InsightMap are demonstrated through extensive case studies and in-depth user interviews.
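A map metaphor over insights presupposes some pairwise similarity metric between them. A minimal sketch of one plausible form such a metric could take, using a hypothetical feature encoding and cosine similarity (the paper proposes its own metric, which this does not reproduce):

```python
# Treat each mined insight as a feature vector and compare pairs
# with cosine similarity. Encodings below are invented for illustration.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical encodings: [is_trend, is_outlier, mentions_sales, mentions_region, strength]
trend_sales   = [1, 0, 1, 0, 0.9]
trend_region  = [1, 0, 0, 1, 0.7]
outlier_sales = [0, 1, 1, 0, 0.8]

print(round(cosine(trend_sales, trend_region), 3))   # same type, different subject
print(round(cosine(trend_sales, outlier_sales), 3))  # same subject, different type
```

Distances derived from such a metric are the kind of quantity a map layout can be built on, placing related insights near one another for overview and exploration.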

Authors:Devin Murphy, Yichen Li, Crystal Owens, Layla Stanton, Young Joong Lee, Paul Pu Liang, Yiyue Luo, Antonio Torralba, Wojciech Matusik
Title: Fits like a Flex-Glove: Automatic Design of Personalized FPCB-Based Tactile Sensing Gloves
Abstract:
Resistive tactile sensing gloves have captured the interest of researchers spanning diverse domains, such as robotics, healthcare, and human-computer interaction. However, existing fabrication methods often require labor-intensive assembly or costly equipment, limiting accessibility. Leveraging flexible printed circuit board (FPCB) technology, we present an automated pipeline for generating resistive tactile sensing glove design files solely from a simple hand photo on legal-size paper, which can be readily supplied to commercial board houses for manufacturing. Our method enables cost-effective, accessible production at under \$130 per glove with sensor assembly times under 15 minutes. Sensor performance was characterized under varying pressure loads, and a preliminary user evaluation showcases four unique automatically manufactured designs, evaluated for their reliability and comfort.

Authors:Xiyuan Wang, Ziang Li, Sizhe Chen, Xingxing Xing, Wei Wan, Quan Li
Title: Prefer2SD: A Human-in-the-Loop Approach to Balancing Similarity and Diversity in In-Game Friend Recommendations
Abstract:
In-game friend recommendations significantly impact player retention and sustained engagement in online games. Balancing similarity and diversity in recommendations is crucial for fostering stronger social bonds across diverse player groups. However, automated recommendation systems struggle to achieve this balance, especially as player preferences evolve over time. To tackle this challenge, we introduce Prefer2SD (derived from Preference to Similarity and Diversity), an iterative, human-in-the-loop approach designed to optimize the similarity-diversity (SD) ratio in friend recommendations. Developed in collaboration with a local game company, Prefer2SD leverages a visual analytics system to help experts explore, analyze, and adjust friend recommendations dynamically, incorporating players' shifting preferences. The system employs interactive visualizations that enable experts to fine-tune the balance between similarity and diversity for distinct player groups. We demonstrate the efficacy of Prefer2SD through a within-subjects study (N=12), a case study, and expert interviews, showcasing its ability to enhance in-game friend recommendations and offering insights for the broader field of personalized recommendation systems.

Authors:Xiyuan Wang, Yi-Fan Cao, Junjie Xiong, Sizhe Chen, Wenxuan Li, Junjie Zhang, Quan Li
Title: ClueCart: Supporting Game Story Interpretation and Narrative Inference from Fragmented Clues
Abstract:
Indexical storytelling is gaining popularity in video games, where the narrative unfolds through fragmented clues. This approach fosters player-generated content and discussion, as story interpreters piece together the overarching narrative from these scattered elements. However, the fragmented and non-linear nature of the clues makes systematic categorization and interpretation challenging, potentially hindering efficient story reconstruction and creative engagement. To address these challenges, we first proposed a hierarchical taxonomy to categorize narrative clues, informed by a formative study. Using this taxonomy, we designed ClueCart, a creativity support tool aimed at enhancing creators' ability to organize story clues and facilitate intricate story interpretation. We evaluated ClueCart through a between-subjects study (N=40), using Miro as a baseline. The results showed that ClueCart significantly improved creators' efficiency in organizing and retrieving clues, thereby better supporting their creative processes. Additionally, we offer design insights for future studies focused on player-centric narrative analysis.

Authors:Elizabeth Anne Watkins, Emanuel Moss, Giuseppe Raffa, Lama Nachman
Title: What's So Human about Human-AI Collaboration, Anyway? Generative AI and Human-Computer Interaction
Abstract:
While human-AI collaboration has been a longstanding goal and topic of study for computational research, the emergence of increasingly naturalistic generative AI language models has greatly inflected the trajectory of such research. In this paper we identify how, given the language capabilities of generative AI, common features of human-human collaboration derived from the social sciences can be applied to the study of human-computer interaction. We provide insights drawn from interviews with industry personnel working on building human-AI collaboration systems, as well as our collaborations with end-users to build a multimodal AI assistant for task support.

Authors:Omer Aydin, Enis Karaarslan, Fatih Safa Erenay, Nebojsa Bacanin
Title: Generative AI in Academic Writing: A Comparison of DeepSeek, Qwen, ChatGPT, Gemini, Llama, Mistral, and Gemma
Abstract:
DeepSeek v3, developed in China, was released in December 2024, followed by Alibaba's Qwen 2.5 Max in January 2025 and Qwen3 235B in April 2025. These free and open-source models offer significant potential for academic writing and content creation. This study evaluates their academic writing performance by comparing them with ChatGPT, Gemini, Llama, Mistral, and Gemma. There is a critical gap in the literature concerning how extensively these tools can be utilized and their potential to generate original content in terms of quality, readability, and effectiveness. Using 40 papers on Digital Twin and Healthcare, texts were generated through AI tools based on posed questions and paraphrased abstracts. The generated content was analyzed using plagiarism detection, AI detection, word count comparisons, semantic similarity, and readability assessments. Results indicate that paraphrased abstracts showed higher plagiarism rates, while question-based responses also exceeded acceptable levels. AI detection tools consistently identified all outputs as AI-generated. Word count analysis revealed that all chatbots produced a sufficient volume of content. Semantic similarity tests showed a strong overlap between generated and original texts. However, readability assessments indicated that the texts were insufficient in terms of clarity and accessibility. This study comparatively highlights the potential and limitations of popular and latest large language models for academic writing. While these models generate substantial and semantically accurate content, concerns regarding plagiarism, AI detection, and readability must be addressed for their effective use in scholarly work.
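Readability assessments of the kind used in the study are typically formula-based. A minimal sketch of one standard measure, the Flesch Reading Ease score, with a crude vowel-group syllable heuristic (the study does not specify its exact tooling; this is an illustrative stand-in):

```python
# Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
# Higher scores indicate easier text.
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels, minimum 1 per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

simple = "The cat sat. The dog ran."
dense = ("Consequently, multidimensional interoperability considerations "
         "necessitate comprehensive organizational restructuring.")
print(round(flesch_reading_ease(simple), 1))
print(round(flesch_reading_ease(dense), 1))
```

Scores like these, computed over generated versus original abstracts, are one way a study can quantify the clarity and accessibility findings reported above.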

Authors:Callie Y. Kim, Arissa J. Sato, Nathan Thomas White, Hui-Ru Ho, Christine P. Lee, Yuna Hwang, Bilge Mutlu
Title: Bridging Generations using AI-Supported Co-Creative Activities
Abstract:
Intergenerational co-creation using technology between grandparents and grandchildren can be challenging due to differences in technological familiarity. AI has emerged as a promising tool to support co-creative activities, offering flexibility and creative assistance, but its role in facilitating intergenerational connection remains underexplored. In this study, we conducted a user study with 29 grandparent-grandchild groups engaged in AI-supported story creation to examine how AI-assisted co-creation can foster meaningful intergenerational bonds. Our findings show that grandchildren managed the technical aspects, while grandparents contributed creative ideas and guided the storytelling. AI played a key role in structuring the activity, facilitating brainstorming, enhancing storytelling, and balancing the contributions of both generations. The process fostered mutual appreciation, with each generation recognizing the strengths of the other, leading to an engaging and cohesive co-creation process. We offer design implications for integrating AI into intergenerational co-creative activities, emphasizing how AI can enhance connection across skill levels and technological familiarity.

Authors:Minju Yoo, Hyoungwook Jin, Juho Kim
Title: How Do Teachers Create Pedagogical Chatbots?: Current Practices and Challenges
Abstract:
AI chatbots have emerged as promising educational tools for personalized learning experiences, with advances in large language models (LLMs) enabling teachers to create and customize these chatbots for their specific classroom needs. However, there is a limited understanding of how teachers create pedagogical chatbots and integrate them into their lessons. Through semi-structured interviews with seven K-12 teachers, we examined their practices and challenges when designing, implementing, and deploying chatbots. Our findings revealed that teachers prioritize developing task-specific chatbots aligned with their lessons. Teachers engaged in various creation practices and had different challenges; novices in chatbot creation struggled mainly with initial design and technical implementation, while experienced teachers faced challenges with technical aspects and analyzing conversational data. Based on these insights, we explore approaches to supporting teachers' chatbot development and opportunities for designing future chatbot creation systems. This work provides foundational insights from teachers that can empower teacher-created chatbots, facilitating AI-augmented teaching.

Authors:Mengzhu Katie Chen, Isabella Pedraza Pineros, Arvind Satyanarayan, Jonathan Zong
Title: Tactile Vega-Lite: Rapidly Prototyping Tactile Charts with Smart Defaults
Abstract:
Tactile charts are essential for conveying data to blind and low vision (BLV) readers but are difficult for designers to construct. Non-expert designers face barriers to entry due to complex guidelines, while experts struggle with fragmented and time-consuming workflows that involve extensive customization. Inspired by formative interviews with expert tactile graphics designers, we created Tactile Vega-Lite (TVL): an extension of Vega-Lite that offers tactile-specific abstractions and synthesizes existing guidelines into a series of smart defaults. Predefined stylistic choices enable non-experts to produce guideline-compliant tactile charts quickly. Expert users can override defaults to tailor customizations for their intended audience. In a user study with 12 tactile graphics creators, we show that Tactile Vega-Lite enhances flexibility and consistency by automating tasks like adjusting spacing and translating braille while accelerating iterations through pre-defined textures and line styles. Through expert critique, we also learn more about tactile chart design best practices and design decisions.

Authors:Alexander Scarlatos, Yusong Wu, Ian Simon, Adam Roberts, Tim Cooijmans, Natasha Jaques, Cassie Tarakajian, Cheng-Zhi Anna Huang
Title: ReaLJam: Real-Time Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers
Abstract:
Recent advances in generative artificial intelligence (AI) have created models capable of high-quality musical content generation. However, little consideration is given to how to use these models for real-time or cooperative jamming musical applications because of crucial required features: low latency, the ability to communicate planned actions, and the ability to adapt to user input in real-time. To support these needs, we introduce ReaLJam, an interface and protocol for live musical jamming sessions between a human and a Transformer-based AI agent trained with reinforcement learning. We enable real-time interactions using the concept of anticipation, where the agent continually predicts how the performance will unfold and visually conveys its plan to the user. We conduct a user study where experienced musicians jam in real-time with the agent through ReaLJam. Our results demonstrate that ReaLJam enables enjoyable and musically interesting sessions, and we uncover important takeaways for future work.

Authors:Jiaying "Lizzy" Liu, Yunlong Wang, Allen Jue, Yao Lyu, Yiheng Su, Shuo Niu, Yan Zhang
Title: Displaying Fear, Sadness, and Joy in Public: Schizophrenia Vloggers' Video Narration of Emotion and Online Care-Seeking
Abstract:
Individuals with severe mental illnesses (SMI), particularly schizophrenia, experience complex and intense emotions frequently. They increasingly turn to vlogging as an authentic medium for emotional disclosure and online support-seeking. While previous research has primarily focused on text-based disclosure, little is known about how people construct narratives around emotions and emotional experiences through video blogs. Our study analyzed 401 YouTube videos created by schizophrenia vloggers, revealing that vloggers disclosed their fear, sadness, and joy through verbal narration by explicit expressions or storytelling. Visually, they employed various framing styles, including Anonymous, Talk-to-Camera, and In-the-Moment approaches, along with diverse visual narration techniques. Notably, we uncovered a concerning 'visual appeal disparity' in audience engagement, with visually appealing videos receiving significantly more views, likes, and comments. This study discusses the role of video-sharing platforms in emotional expression and offers design implications for fostering online care-seeking for emotionally vulnerable populations.

Authors:Smit Desai, Mateusz Dubiel, Nima Zargham, Thomas Mildner, Laura Spillner
Title: Personas Evolved: Designing Ethical LLM-Based Conversational Agent Personalities
Abstract:
The emergence of Large Language Models (LLMs) has revolutionized Conversational User Interfaces (CUIs), enabling more dynamic, context-aware, and human-like interactions across diverse domains, from social sciences to healthcare. However, the rapid adoption of LLM-based personas raises critical ethical and practical concerns, including bias, manipulation, and unforeseen social consequences. Unlike traditional CUIs, where personas are carefully designed with clear intent, LLM-based personas generate responses dynamically from vast datasets, making their behavior less predictable and harder to govern. This workshop aims to bridge the gap between CUI and broader AI communities by fostering a cross-disciplinary dialogue on the responsible design and evaluation of LLM-based personas. Bringing together researchers, designers, and practitioners, we will explore best practices, develop ethical guidelines, and promote frameworks that ensure transparency, inclusivity, and user-centered interactions. By addressing these challenges collaboratively, we seek to shape the future of LLM-driven CUIs in ways that align with societal values and expectations.

Authors:Chirag Bhuvaneshwara, Lara Chehayeb, Alexander Haberl, Julius Siedentopf, Patrick Gebhard, Dimitra Tsovaltzi
Title: InCoRe -- An Interactive Co-Regulation Model: Training Teacher Communication Skills in Demanding Classroom Situations
Abstract:
Socioemotional and regulation processes are important in learning. We build on previous work on co-regulation processes in the learning sciences, extending the caregiver-child paradigm to the teacher-student relationship by presenting an interactive co-regulation model and a methodology for developing empirically grounded systems for training teachers. We focus on the combination of classroom management and affect models, and detail the use of a psychological model to operationalise and automate interaction with a virtual student. We also present an annotation scheme developed to capture teachers' subjective psychological experiences during training and how these shape their co-regulation behavior with students, contributing to an understanding of teachers' emotional experiences and their consequences for co-regulation processes in classroom management. This research is also a contribution to developing hybrid AI systems.

Authors:Weiyan Shi, Viet Hai Le, Kenny Tsu Wei Choo
Title: Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention
Abstract:
Joint attention is a critical component of early speech-language development and a key indicator of effective parent-child interaction. However, research on detecting and analysing joint attention remains limited, particularly for Multimodal Large Language Models (MLLMs). This study evaluates MLLMs' ability to comprehend joint attention by analysing 26 parent-child interaction videos annotated by two speech-language pathologists. These annotations identify strong and poor joint attention segments, serving as benchmarks for evaluating the models' interpretive capabilities. Our findings reveal that current MLLMs struggle to accurately interpret joint attention due to a lack of nuanced understanding of child-initiated eye contact, a crucial component of joint attention dynamics. This study highlights the importance of incorporating detailed eye contact to enhance MLLMs' multimodal reasoning. Addressing these gaps is essential for future research to advance the use of MLLMs in analysing and supporting parent-child interactions.

Authors:Xiyao Mei, Yu Zhang, Chaofan Yang, Rui Shi, Xiaoru Yuan
Title: ZuantuSet: A Collection of Historical Chinese Visualizations and Illustrations
Abstract:
Historical visualizations are a valuable resource for studying the history of visualization and inspecting the cultural context where they were created. When investigating historical visualizations, it is essential to consider contributions from different cultural frameworks to gain a comprehensive understanding. While there is extensive research on historical visualizations within the European cultural framework, this work shifts the focus to ancient China, a cultural context that remains underexplored by visualization researchers. To this aim, we propose a semi-automatic pipeline to collect, extract, and label historical Chinese visualizations. Through the pipeline, we curate ZuantuSet, a dataset with over 71K visualizations and 108K illustrations. We analyze distinctive design patterns of historical Chinese visualizations and their potential causes within the context of Chinese history and culture. We illustrate potential usage scenarios for this dataset, summarize the unique challenges and solutions associated with collecting historical Chinese visualizations, and outline future research directions.

Authors:Angelique Taylor, Tauhid Tanjim, Michael Joseph Sack, Maia Hirsch, Kexin Cheng, Kevin Ching, Jonathan St. George, Thijs Roumen, Malte F. Jung, Hee Rin Lee
Title: Rapidly Built Medical Crash Cart! Lessons Learned and Impacts on High-Stakes Team Collaboration in the Emergency Room
Abstract:
Designing robots to support high-stakes teamwork in emergency settings presents unique challenges, including seamless integration into fast-paced environments, facilitating effective communication among team members, and adapting to rapidly changing situations. While teleoperated robots have been successfully used in high-stakes domains such as firefighting and space exploration, autonomous robots that aid high-stakes teamwork remain underexplored. To address this gap, we conducted a rapid prototyping process to develop a series of seemingly autonomous robots designed to assist clinical teams in the Emergency Room. We transformed a standard crash cart -- which stores medical equipment and emergency supplies -- into a medical robotic crash cart (MCCR). The MCCR was evaluated through field deployments to assess its impact on team workload and usability; we identified taxonomies of failure and refined the MCCR in collaboration with healthcare professionals. Our work advances the understanding of robot design for high-stakes, time-sensitive settings, providing insights into useful MCCR capabilities and considerations for effective human-robot collaboration. By publicly disseminating our MCCR tutorial, we hope to encourage HRI researchers to explore the design of robots for high-stakes teamwork.

Authors:Devansh Saxena, Ji-Youn Jung, Jodi Forlizzi, Kenneth Holstein, John Zimmerman
Title: AI Mismatches: Identifying Potential Algorithmic Harms Before AI Development
Abstract:
AI systems are often introduced with high expectations, yet many fail to deliver, resulting in unintended harm and missed opportunities for benefit. We frequently observe significant "AI Mismatches", where the system's actual performance falls short of what is needed to ensure safety and co-create value. These mismatches are particularly difficult to address once development is underway, highlighting the need for early-stage intervention. Navigating complex, multi-dimensional risk factors that contribute to AI Mismatches is a persistent challenge. To address it, we propose an AI Mismatch approach to anticipate and mitigate risks early on, focusing on the gap between realistic model performance and required task performance. Through an analysis of 774 AI cases, we extracted a set of critical factors, which informed the development of seven matrices that map the relationships between these factors and highlight high-risk areas. Through case studies, we demonstrate how our approach can help reduce risks in AI development.

Authors:Ian Steenstra, Farnaz Nouraei, Timothy W. Bickmore
Title: Scaffolding Empathy: Training Counselors with Simulated Patients and Utterance-level Performance Visualizations
Abstract:
Learning therapeutic counseling involves significant role-play experience with mock patients, with current manual training methods providing only intermittent granular feedback. We seek to accelerate and optimize counselor training by providing frequent, detailed feedback to trainees as they interact with a simulated patient. Our first application domain involves training motivational interviewing skills for counselors. Motivational interviewing is a collaborative counseling style in which patients are guided to talk about changing their behavior, with empathetic counseling an essential ingredient. We developed and evaluated an LLM-powered training system that features a simulated patient and visualizations of turn-by-turn performance feedback tailored to the needs of counselors learning motivational interviewing. We conducted an evaluation study with professional and student counselors, demonstrating high usability and satisfaction with the system. We present design implications for the development of automated systems that train users in counseling skills and their generalizability to other types of social skills training.

Authors:Frank Elavsky, Marita Vindedal, Ted Gies, Patrick Carrington, Dominik Moritz, Øystein Moseng
Title: Towards softerware: Enabling personalization of interactive data representations for users with disabilities
Abstract:
Accessible design for some may still produce barriers for others. This tension, called access friction, creates challenges for both designers and end-users with disabilities. To address this, we present the concept of softerware, a system design approach that provides end users with agency to meaningfully customize and adapt interfaces to their needs. To apply softerware to visualization, we assembled 195 data visualization customization options centered on the barriers we expect users with disabilities will experience. We built a prototype that applies a subset of these options and interviewed practitioners for feedback. Lastly, we conducted a design probe study with blind and low vision accessibility professionals to learn more about their challenges and visions for softerware. We observed access frictions between our participants' designs, and they expressed that for softerware's success, current and future systems must be designed with accessible defaults, interoperability, persistence, and respect for a user's perceived effort-to-outcome ratio.

Authors:Nengyue Su, Liang Luo, Yu Gu, Fuji Ren
Title: Exploring the Effects of Traditional Chinese Medicine Scents on Mitigating Driving Fatigue
Abstract:
The rise of autonomous driving technology has led to concerns about inactivity-induced fatigue. This paper explores Traditional Chinese Medicine (TCM) scents for mitigating such fatigue. Two human-involved studies were conducted in a high-fidelity driving simulator. Study 1 maps six prevalent TCM scents onto the arousal/valence circumplex to select proper candidates, i.e., argy wormwood (with the highest arousal) and tangerine peel (with the highest valence). Study 2 tests both scents in an auto-driving course. Statistics show both scents can improve driver alertness and reaction time, but they should be used in different ways: argy wormwood is suitable for short-term use due to its higher intensity but poor acceptance, while tangerine peel is ideal for long-term use due to its higher likability. These findings provide insights for in-car fatigue mitigation to enhance driver safety and well-being. However, issues such as scent longevity in aromatherapy and automatic fatigue prediction remain unresolved.

Authors:Hodaya Barr, Dror Levy, Ariel Rosenfeld, Oleg Maksimov, Sarit Kraus
Title: Advising Agent for Supporting Human-Multi-Drone Team Collaboration
Abstract:
Multi-drone systems have become transformative technologies across various industries, offering innovative applications. However, despite significant advancements, their autonomous capabilities remain inherently limited. As a result, human operators are often essential for supervising and controlling these systems, creating what is referred to as a human-multi-drone team. In realistic settings, human operators must make real-time decisions while addressing a variety of signals, such as drone statuses and sensor readings, and adapting to dynamic conditions and uncertainty. This complexity may lead to suboptimal operations, potentially compromising the overall effectiveness of the team. In critical contexts like Search And Rescue (SAR) missions, such inefficiencies can have costly consequences. This work introduces an advising agent designed to enhance collaboration in human-multi-drone teams, with a specific focus on SAR scenarios. The advising agent is designed to assist the human operator by suggesting contextual actions worth taking. To that end, the agent employs a novel computation technique that relies on a small set of human demonstrations to generate varying realistic human-like trajectories. These trajectories are then generalized using machine learning for fast and accurate predictions of the long-term effects of different advice. Through human evaluations, we demonstrate that our approach delivers high-quality assistance, resulting in significantly improved performance compared to baseline conditions.

Authors:Dakyeom Ahn, Seora Park, Seolhee Lee, Jieun Cho, Hajin Lim
Title: I Stan Alien Idols and Also the People Behind Them: Understanding How Seams Between Virtual and Real Identities Engage VTuber Fans -- A Case Study of PLAVE
Abstract:
Virtual YouTubers (VTubers) have recently gained popularity as streamers using computer-generated avatars and real-time motion capture to create distinct virtual identities. While prior research has explored how VTubers construct virtual personas and engage audiences, little attention has been given to viewers' reactions when virtual and real identities blur, which we refer to as "seams." To address this gap, we conducted a case study on PLAVE, a popular Korean VTuber K-pop idol group, interviewing 24 of their fans. Our findings identified two main sources of seams: technical glitches and identity collapses, where VTubers act inconsistently with their virtual personas, revealing aspects of their real selves. These seams played a pivotal role in shaping diverse fan engagements, with some fans valuing the authenticity linked to real identities, while others prioritized the coherence of virtual personas. Overall, our findings underscore the importance of seams in shaping viewer experiences.

Authors:Dakyeom Ahn, Hajin Lim
Title: Exploring K-12 Physical Education Teachers' Perspectives on Opportunities and Challenges of AI Integration Through Ideation Workshops
Abstract:
While AI's potential in education and professional sports is widely recognized, its application in K-12 physical education (PE) remains underexplored with significant opportunities for innovation. This study aims to address this gap by engaging 17 in-service secondary school PE teachers in group ideation workshops to explore potential AI applications and challenges in PE classes. Participants envisioned AI playing multidimensional roles, such as an operational assistant, personal trainer, group coach, and evaluator, as solutions to address unique instructional and operational challenges in K-12 PE classes. These roles reflected participants' perspectives on how AI could enhance class management, deliver personalized feedback, promote balanced team activities, and streamline performance assessments. Participants also highlighted critical considerations for AI integration, including the need to ensure robust student data security and privacy measures, minimize the risk of over-reliance on AI for instructional decisions, and accommodate the varying levels of technological proficiency among PE teachers. Our findings provide valuable insights and practical guidance for AI developers, educators, and policymakers, offering a foundation for the effective integration of AI into K-12 PE curricula to enhance teaching practices and student outcomes.

Authors:Gefei Zhang, Shenming Ji, Yicao Li, Jingwei Tang, Jihong Ding, Meng Xia, Guodao Sun, Ronghua Liang
Title: CPVis: Evidence-based Multimodal Learning Analytics for Evaluation in Collaborative Programming
Abstract:
As programming education becomes more widespread, many college students from non-computer science backgrounds begin learning programming. Collaborative programming emerges as an effective method for instructors to support novice students in developing coding and teamwork abilities. However, due to limited class time and attention, instructors face challenges in monitoring and evaluating the progress and performance of groups or individuals. To address this issue, we collect multimodal data from real-world settings and develop CPVis, an interactive visual analytics system designed to assess student collaboration dynamically. Specifically, CPVis enables instructors to evaluate both group and individual performance efficiently. CPVis employs a novel flower-based visual encoding to represent performance and provides time-based views to capture the evolution of collaborative behaviors. A within-subject experiment (N=22), comparing CPVis with two baseline systems, reveals that users gain more insights, find the visualization more intuitive, and report increased confidence in their assessments of collaboration.

Authors:Yoshee Jain, John Hollander, Amber He, Sunny Tang, Liang Zhang, John Sabatini
Title: Exploring the Potential of Large Language Models for Estimating the Reading Comprehension Question Difficulty
Abstract:
Reading comprehension is key to individual success, yet the assessment of question difficulty remains challenging due to the extensive human annotation and large-scale testing required by traditional methods such as linguistic analysis and Item Response Theory (IRT). While these robust approaches provide valuable insights, their scalability is limited. There is potential for Large Language Models (LLMs) to automate question difficulty estimation; however, this area remains underexplored. Our study investigates the effectiveness of LLMs, specifically OpenAI's GPT-4o and o1, in estimating the difficulty of reading comprehension questions using the Study Aid and Reading Assessment (SARA) dataset. We evaluated both the accuracy of the models in answering comprehension questions and their ability to classify difficulty levels as defined by IRT. The results indicate that, while the models yield difficulty estimates that align meaningfully with derived IRT parameters, there are notable differences in their sensitivity to extreme item characteristics. These findings suggest that LLMs can serve as a scalable method for automated difficulty assessment, particularly in dynamic interactions between learners and Adaptive Instructional Systems (AIS), bridging the gap between traditional psychometric techniques and modern AIS for reading comprehension and paving the way for more adaptive and personalized educational assessments.

Authors:Sameer Neupane, Poorvesh Dongre, Denis Gracanin, Santosh Kumar
Title: Wearable Meets LLM for Stress Management: A Duoethnographic Study Integrating Wearable-Triggered Stressors and LLM Chatbots for Personalized Interventions
Abstract:
We use a duoethnographic approach to study how wearable-integrated LLM chatbots can assist with personalized stress management, addressing the growing need for immediacy and tailored interventions. Two researchers interacted with custom chatbots over 22 days, responding to wearable-detected physiological prompts, recording stressor phrases, and using them to seek tailored interventions from their LLM-powered chatbots. They recorded their experiences in autoethnographic diaries and analyzed them during weekly discussions, focusing on the relevance, clarity, and impact of chatbot-generated interventions. Results showed that even though most events triggered by the wearable were meaningful, only one in five warranted an intervention. It also showed that interventions tailored with brief event descriptions were more effective than generic ones. By examining the intersection of wearables and LLMs, this research contributes to developing more effective, user-centric mental health tools for real-time stress relief and behavior change.

Authors:Sangwon Seo, Bing Han, Rayan E. Harari, Roger D. Dias, Marco A. Zenati, Eduardo Salas, Vaibhav Unhelkar
Title: Socratic: Enhancing Human Teamwork via AI-enabled Coaching
Abstract:
Coaches are vital for effective collaboration, but cost and resource constraints often limit their availability during real-world tasks. This limitation poses serious challenges in life-critical domains that rely on effective teamwork, such as healthcare and disaster response. To address this gap, we propose and realize an innovative application of AI: task-time team coaching. Specifically, we introduce Socratic, a novel AI system that complements human coaches by providing real-time guidance during task execution. Socratic monitors team behavior, detects misalignments in team members' shared understanding, and delivers automated interventions to improve team performance. We validated Socratic through two human subject experiments involving dyadic collaboration. The results demonstrate that the system significantly enhances team performance with minimal interventions. Participants also perceived Socratic as helpful and trustworthy, supporting its potential for adoption. Our findings also suggest promising directions both for AI research and its practical applications to enhance human teamwork.

Authors:Hui-Ru Ho, Nitigya Kargeti, Ziqi Liu, Bilge Mutlu
Title: SET-PAiREd: Designing for Parental Involvement in Learning with an AI-Assisted Educational Robot
Abstract:
AI-assisted learning companion robots are increasingly used in early education. Many parents express concerns about content appropriateness, while they also value how AI and robots could supplement their limited skill, time, and energy to support their children's learning. We designed a card-based kit, SET, to systematically capture scenarios that have different extents of parental involvement. We developed a prototype interface, PAiREd, with a learning companion robot to deliver LLM-generated educational content that can be reviewed and revised by parents. Parents can flexibly adjust their involvement in the activity by determining what they want the robot to help with. We conducted an in-home field study involving 20 families with children aged 3-5. Our work contributes to an empirical understanding of the level of support parents with different expectations may need from AI and robots and a prototype that demonstrates an innovative interaction paradigm for flexibly including parents in supporting their children.

Authors:Liuchuan Yu, Ching-I Huang, Hsueh-Cheng Wang, Lap-Fai Yu
Title: Enriching Physical-Virtual Interaction in AR Gaming by Tracking Identical Real Objects
Abstract:
Augmented reality (AR) games, particularly those designed for headsets, have become increasingly prevalent with advancements in both hardware and software. However, the majority of AR games still rely on pre-scanned or static scenes, and interaction mechanisms are often limited to controllers or hand-tracking. Additionally, the presence of identical objects in AR games poses challenges for conventional object tracking techniques, which often struggle to differentiate between identical objects or necessitate the installation of fixed cameras for global object movement tracking. In response to these limitations, we present a novel approach to address the tracking of identical objects in an AR scene to enrich physical-virtual interaction. Our method leverages partial scene observations captured by an AR headset, utilizing the perspective and spatial data provided by this technology. Object identities within the scene are determined through the solution of a label assignment problem using integer programming. To enhance computational efficiency, we incorporate a Voronoi diagram-based pruning method into our approach. Our implementation of this approach in a farm-to-table AR game demonstrates its satisfactory performance and robustness. Furthermore, we showcase the versatility and practicality of our method through applications in AR storytelling and a simulated gaming robot. Our video demo is available at: https://youtu.be/rPGkLYuKvCQ.
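The identity-resolution step the abstract describes can be sketched as a small assignment problem. The following is an illustrative simplification, not the authors' implementation (which formulates label assignment as an integer program with Voronoi-based pruning): it matches unordered new detections of identical objects to their last known positions by exhaustively choosing the labeling with minimum total displacement, which is feasible for small object counts.

```python
# Illustrative sketch: assign identities to identical objects by minimizing
# total displacement from their last known positions (exhaustive search
# stands in for the paper's integer-programming formulation).
from itertools import permutations
import math

prev = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]      # last known positions, by identity
obs = [(0.1, 0.95), (0.05, 0.02), (1.02, 0.01)]  # new detections, unordered

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# best[i] is the index of the detection assigned to object identity i
best = min(permutations(range(len(obs))),
           key=lambda p: sum(dist(prev[i], obs[p[i]]) for i in range(len(prev))))
print(best)  # (1, 2, 0)
```

Pruning (e.g. via a Voronoi diagram, as in the paper) matters because the exhaustive search above grows factorially with the number of identical objects.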

Authors:Angelique Taylor, Tauhid Tanjim, Huajie Cao, Jalynn Blu Nicoly, Jonathan I. Segal, Jonathan St. George, Soyon Kim, Kevin Ching, Francisco R. Ortega, Hee Rin Lee
Title: Co-Designing Augmented Reality Tools for High-Stakes Clinical Teamwork
Abstract:
How might healthcare workers (HCWs) leverage augmented reality head-mounted displays (AR-HMDs) to enhance teamwork? Although AR-HMDs have shown immense promise in supporting teamwork in healthcare settings, design for Emergency Department (ER) teams has received little attention. The ER presents unique challenges, including procedural recall, medical errors, and communication gaps. To address this gap, we engaged in a participatory design study with healthcare workers to gain a deep understanding of the potential for AR-HMDs to facilitate teamwork during ER procedures. Our results reveal that AR-HMDs can be used as information-sharing and information-retrieval systems to bridge knowledge gaps, as well as concerns about integrating AR-HMDs into ER workflows. We contribute design recommendations for seven role-based AR-HMD application scenarios involving HCWs with various expertise, working across multiple medical tasks. We hope our research inspires designers to embark on the development of new AR-HMD applications for high-stakes team environments.

Authors:Louisa Conwill, Megan K. Levis, Karla Badillo-Urquiola, Walter J. Scheirer
Title: The Challenges and Benefits of Bringing Religious Values Into Design
Abstract:
HCI is increasingly taking inspiration from religious traditions as a basis for ethical technology designs. Such ethically-inspired designs can be especially important for social communications technologies, which are associated with numerous societal concerns. If religious values are to be incorporated into real-world designs, there may be challenges when designers work with values unfamiliar to them. Therefore, we investigate the differences in interpretation that arise when values are translated into technology designs. To do so, we studied design patterns that embody Catholic Social Teaching (CST). We interviewed 24 technologists and 7 CST scholars to assess their understanding of how those values would manifest in social media designs. We found that for the most part the technologists responded similarly to the CST scholars. However, the CST scholars had a better understanding of the principle of subsidiarity, and they believed moderation upheld human dignity more than the technologists did. We discuss the implications of our findings for the design of social technologies and for design processes at large.

Authors:Ramtin Tabatabaei, Vassilis Kostakos, Wafa Johal
Title: Gazing at Failure: Investigating Human Gaze in Response to Robot Failure in Collaborative Tasks
Abstract:
Robots are prone to making errors, which can negatively impact their credibility as teammates during collaborative tasks with human users. Detecting and recovering from these failures is crucial for maintaining an effective level of trust from users. However, robots may fail without being aware of it. One way to detect such failures could be by analysing humans' non-verbal behaviours and reactions to failures. This study investigates how human gaze dynamics can signal a robot's failure and examines how different types of failures affect people's perception of the robot. We conducted a user study with 27 participants collaborating with a robotic mobile manipulator to solve tangram puzzles. The robot was programmed to experience two types of failures -- executional and decisional -- occurring either at the beginning or end of the task, with or without acknowledgement of the failure. Our findings reveal that the type and timing of the robot's failure significantly affect participants' gaze behaviour and perception of the robot. Specifically, executional failures led to more gaze shifts and increased focus on the robot, while decisional failures resulted in lower entropy in gaze transitions among areas of interest, particularly when the failure occurred at the end of the task. These results highlight that gaze can serve as a reliable indicator of robot failures and their types, and could also be used to predict appropriate recovery actions.
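The transition-entropy measure mentioned in the abstract can be sketched as Shannon entropy over the frequencies of gaze transitions between areas of interest (AOIs). This is a hypothetical illustration; the AOI labels and gaze sequence below are invented, not taken from the study.

```python
# Shannon entropy of gaze transitions between areas of interest (AOIs).
# Lower entropy means gaze shifts concentrate on fewer AOI pairs.
import math
from collections import Counter

gaze = ["robot", "puzzle", "robot", "puzzle", "robot", "table", "robot"]

# count consecutive AOI pairs, e.g. ("robot", "puzzle")
transitions = Counter(zip(gaze, gaze[1:]))
total = sum(transitions.values())

entropy = -sum((n / total) * math.log2(n / total)
               for n in transitions.values())
print(round(entropy, 3))
```

A sequence that alternates between only two AOIs would yield lower entropy than one distributed across many AOI pairs, which is the pattern the study associates with decisional failures.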

Authors:Longbin Lai, Changwei Luo, Yunkai Lou, Mingchen Ju, Zhengyi Yang
Title: Graphy'our Data: Towards End-to-End Modeling, Exploring and Generating Report from Raw Data
Abstract:
Large Language Models (LLMs) have recently demonstrated remarkable performance in tasks such as Retrieval-Augmented Generation (RAG) and autonomous AI agent workflows. Yet, when faced with large sets of unstructured documents requiring progressive exploration, analysis, and synthesis, such as conducting a literature survey, existing approaches often fall short. We address this challenge -- termed Progressive Document Investigation -- by introducing Graphy, an end-to-end platform that automates data modeling, exploration, and high-quality report generation in a user-friendly manner. Graphy comprises an offline Scrapper that transforms raw documents into a structured graph of Fact and Dimension nodes, and an online Surveyor that enables iterative exploration and LLM-driven report generation. We showcase a pre-scrapped graph of over 50,000 papers -- complete with their references -- demonstrating how Graphy facilitates the literature-survey scenario. The demonstration video can be found at https://youtu.be/uM4nzkAdGlM.
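Purely as an illustration of the Fact/Dimension graph structure the abstract describes, the sketch below models papers as Fact nodes with per-paper Dimension nodes and citation edges. The node/edge schema and field names here are assumptions for illustration, not Graphy's actual data model or API.

```python
# Hypothetical Fact/Dimension graph: papers are Fact nodes; extracted aspects
# (e.g. a paper's "challenge") are Dimension nodes; edges carry relation types.
graph = {
    "nodes": [
        {"id": "paper:1", "kind": "Fact", "title": "Example Survey Paper"},
        {"id": "paper:1/challenge", "kind": "Dimension", "text": "scaling RAG"},
        {"id": "paper:2", "kind": "Fact", "title": "Cited Work"},
    ],
    "edges": [
        {"src": "paper:1", "dst": "paper:1/challenge", "rel": "has_dimension"},
        {"src": "paper:1", "dst": "paper:2", "rel": "cites"},
    ],
}

# example traversal: which papers does paper:1 cite?
cited = [e["dst"] for e in graph["edges"]
         if e["src"] == "paper:1" and e["rel"] == "cites"]
print(cited)  # ['paper:2']
```

Separating Fact nodes from Dimension nodes lets iterative exploration follow citation edges for breadth while LLM-generated Dimension nodes supply the per-paper content a survey section needs.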

Authors:Zhuoyi Cheng, Pei Chen, Wenzheng Song, Hongbo Zhang, Zhuoshu Li, Lingyun Sun
Title: An Exploratory Study on How AI Awareness Impacts Human-AI Design Collaboration
Abstract:
The collaborative design process is intrinsically complicated and dynamic, and researchers have long been exploring how to enhance efficiency in this process. As Artificial Intelligence technology evolves, it has been widely used as a design tool and has exhibited potential as a design collaborator. Nevertheless, problems concerning how designers should communicate with AI in collaborative design remain unsolved. To address this research gap, we looked at how designers communicate fluently in human-human design collaboration, and found awareness to be an important ability for facilitating communication by understanding one's collaborators and the current situation. However, previous research has mainly studied and supported human awareness; the possible impact AI awareness would bring to the human-AI collaborative design process, and the way to realize AI awareness, remain unknown. In this study, we explored how AI awareness impacts human-AI collaboration through a Wizard-of-Oz experiment. Both quantitative and qualitative results supported that enabling AI to have awareness can enhance the communication fluidity between human and AI, thus enhancing collaboration efficiency. We further discuss the results and conclude with design implications for future human-AI collaborative design systems.

Authors:Sitong Pan, Robin Schmucker, Bernardo Garcia Bulle Bueno, Salome Aguilar Llanes, Fernanda Albo Alarcón, Hangxiao Zhu, Adam Teo, Meng Xia
Title: TutorUp: What If Your Students Were Simulated? Training Tutors to Address Engagement Challenges in Online Learning
Abstract:
With the rise of online learning, many novice tutors lack experience engaging students remotely. We introduce TutorUp, a Large Language Model (LLM)-based system that enables novice tutors to practice engagement strategies with simulated students through scenario-based training. Based on a formative study involving two surveys (N1=86, N2=102) on student engagement challenges, we summarize scenarios that mimic real teaching situations. To enhance immersion and realism, we employ a prompting strategy that simulates dynamic online learning dialogues. TutorUp provides immediate and asynchronous feedback by referencing tutor-student online session dialogues and evidence-based teaching strategies from learning science literature. In a within-subject evaluation (N=16), participants rated TutorUp significantly higher than a baseline system without simulation capabilities regarding effectiveness and usability. Our findings suggest that TutorUp provides novice tutors with more effective training to learn and apply teaching strategies to address online student engagement challenges.

Authors:Shoumik Saha, Soheil Feizi
Title: Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing
Abstract:
The growing use of large language models (LLMs) for text generation has led to widespread concerns about AI-generated content detection. However, an overlooked challenge is AI-polished text, where human-written content undergoes subtle refinements using AI tools. This raises a critical question: should minimally polished text be classified as AI-generated? Such classification can lead to false plagiarism accusations and misleading claims about AI prevalence in online content. In this study, we systematically evaluate twelve state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation (APT-Eval) dataset, which contains 14.7K samples refined at varying AI-involvement levels. Our findings reveal that detectors frequently flag even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models. These limitations highlight the urgent need for more nuanced detection methodologies.
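The per-level breakdown described above can be sketched as a simple false-positive analysis. This is a hypothetical illustration, not APT-Eval itself: the toy detector, sample texts, and polish-level labels below are invented for demonstration.

```python
# Sketch: false-positive rate of an AI-text detector on human-authored text,
# broken down by the level of AI polish applied (illustrative data only).
from collections import defaultdict

def fpr_by_level(samples, detector):
    """samples: (text, polish_level) pairs of human-authored text.
    Returns {level: fraction wrongly flagged as AI-generated}."""
    flagged, totals = defaultdict(int), defaultdict(int)
    for text, level in samples:
        totals[level] += 1
        flagged[level] += int(detector(text))
    return {lvl: flagged[lvl] / totals[lvl] for lvl in totals}

# Toy "detector" that flags any text containing the word "delve".
detector = lambda t: "delve" in t
samples = [("we delve into results", "minor"), ("plain prose", "minor"),
           ("we delve deeper", "major"), ("delve further still", "major")]
print(fpr_by_level(samples, detector))
```

A real evaluation would substitute actual detector APIs and the dataset's polish levels, but the aggregation logic is the same: a high rate at the "minor" level is exactly the over-flagging failure mode the paper reports.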

Authors:Maximilian Enderling, Jan Hombeck, Kai Lawonn
Title: Enabling Seamless Creation of Annotated Spaces: Enhancing Learning in VR Environments
Abstract:
We present an approach to evaluate the efficacy of annotations in augmenting learning environments in the context of Virtual Reality. Our study extends previous work highlighting the benefits of virtual reality-based learning and introduces a method to facilitate asynchronous collaboration between educators and students. These two distinct perspectives fulfill special roles: educators aim to convey information with which learners should become familiar. Educators are empowered to annotate static scenes on large touchscreens to supplement information. Subsequently, learners explore those annotated scenes in virtual reality. To assess the comparative ease and usability of creating text and pen annotations, we conducted a user study with 24 participants, who assumed both the learner and teacher roles. Educators annotated static courses using provided textbook excerpts, interfacing through an 86-inch touchscreen. Learners navigated pre-designed educational courses in virtual reality to evaluate the practicality of the annotations. The utility of annotations in virtual reality garnered high ratings. However, users encountered issues with the touch interface implementation and rated its intuitiveness low. Despite this, our study underscores the significant benefits of annotations, particularly for learners. This research offers valuable insights into annotation-enriched learning, emphasizing its potential to enhance students' information retention and comprehension.

Authors:Lara Chehayeb, Katarzyna Olszynska, Chirag Bhuvaneshwara, Dimitra Tsovaltzi
Title: Effects of a Co-Regulation Model for MR Teacher Training: HRV and Self-Compassion as Indicators of Emotion Regulation
Abstract:
Teachers play a pivotal role in fostering students' emotional and cognitive development. Teachers need to regulate their emotions in order to co-regulate students. Here, using a unique mixed-methods approach, we investigate the relationship between self-compassion (treating oneself with compassion) and physiological stress responses among pre-service teachers. Heart rate variability (HRV) was measured during a mixed reality (MR) teacher-training scenario designed to simulate socio-emotional conflict in class. Recorded interviews that followed the MR training were analyzed for observed self-compassion. Findings suggest that less emotional stress during the MR training correlates with higher levels of self-compassion during the interview. MR trainings and self-compassion may be valuable tools to train teacher emotion regulation and well-being.

Authors:Bongsu Kang, Jundong Kim, Tae-Rim Yun, Hyojin Bae, Chang-Eop Kim
Title: Identifying Features that Shape Perceived Consciousness in Large Language Model-based AI: A Quantitative Study of Human Responses
Abstract:
This study quantitatively examines which features of AI-generated text lead humans to perceive subjective consciousness in large language model (LLM)-based AI systems. Drawing on 99 passages from conversations with Claude 3 Opus and focusing on eight features -- metacognitive self-reflection, logical reasoning, empathy, emotionality, knowledge, fluency, unexpectedness, and subjective expressiveness -- we conducted a survey with 123 participants. Using regression and clustering analyses, we investigated how these features influence participants' perceptions of AI consciousness. The results reveal that metacognitive self-reflection and the AI's expression of its own emotions significantly increased perceived consciousness, while a heavy emphasis on knowledge reduced it. Participants clustered into seven subgroups, each showing distinct feature-weighting patterns. Additionally, higher prior knowledge of LLMs and more frequent usage of LLM-based chatbots were associated with greater overall likelihood assessments of AI consciousness. This study underscores the multidimensional and individualized nature of perceived AI consciousness and provides a foundation for better understanding the psychosocial implications of human-AI interaction.

Authors:Tongyu Nie, Courtney Hutton Pospick, Ville Cantory, Danhua Zhang, Jasmine Joyce DeGuzman, Victoria Interrante, Isayas Berhe Adhanom, Evan Suma Rosenberg
Title: Peripheral Teleportation: A Rest Frame Design to Mitigate Cybersickness During Virtual Locomotion
Abstract:
Mitigating cybersickness can improve the usability of virtual reality (VR) and increase its adoption. The most widely used technique, dynamic field-of-view (FOV) restriction, mitigates cybersickness by blacking out the peripheral region of the user's FOV. However, this approach reduces the visibility of the virtual environment. We propose peripheral teleportation, a novel technique that creates a rest frame (RF) in the user's peripheral vision using content rendered from the current virtual environment. Specifically, the peripheral region is rendered by a pair of RF cameras whose transforms are updated by the user's physical motion. We apply alternating teleportations during translations, or snap turns during rotations, to the RF cameras to keep them close to the current viewpoint transformation. Consequently, the optical flow generated by the RF cameras matches the user's physical motion, creating a stable peripheral view. In a between-subjects study (N = 90), we compared peripheral teleportation with a traditional black FOV restrictor and an unrestricted control condition. The results showed that peripheral teleportation significantly reduced discomfort and enabled participants to stay immersed in the virtual environment for a longer duration. Overall, these findings suggest that peripheral teleportation is a promising technique that VR practitioners may consider adding to their cybersickness mitigation toolset.
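The RF-camera update idea can be illustrated with a deliberately loose one-dimensional sketch. Everything here is an assumption for demonstration (the real system operates on full 3D camera transforms and distinguishes translations from rotations): the point is only that the rest-frame camera holds still during virtual motion and is repositioned in discrete teleports, so it never produces smooth optical flow that conflicts with the user's physical motion.

```python
# Minimal 1-D sketch of keeping a rest-frame (RF) camera near the viewpoint
# via discrete teleports rather than continuous motion. The threshold value
# and scalar positions are illustrative assumptions.

def update_rf_camera(rf_pos, viewpoint_pos, teleport_threshold=2.0):
    """Teleport the RF camera when it drifts too far from the viewpoint."""
    if abs(viewpoint_pos - rf_pos) >= teleport_threshold:
        return viewpoint_pos  # discrete jump: no smooth vection-inducing motion
    return rf_pos             # otherwise hold the stable rest frame

pos = 0.0
for viewpoint in [0.5, 1.0, 2.5, 3.0]:  # user translates through the scene
    pos = update_rf_camera(pos, viewpoint)
print(pos)
```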

Authors:Lucile Favero, Juan Antonio Pérez-Ortiz, Tanja Käser, Nuria Oliver
Title: Leveraging Small LLMs for Argument Mining in Education: Argument Component Identification, Classification, and Assessment
Abstract:
Argument mining algorithms analyze the argumentative structure of essays, making them a valuable tool for enhancing education by providing targeted feedback on students' argumentation skills. While current methods often use encoder or encoder-decoder deep learning architectures, decoder-only models remain largely unexplored, offering a promising research direction. This paper proposes leveraging open-source, small Large Language Models (LLMs) for argument mining through few-shot prompting and fine-tuning. These models' small size and open-source nature ensure accessibility, privacy, and computational efficiency, enabling schools and educators to adopt and deploy them locally. Specifically, we perform three tasks: segmentation of student essays into arguments, classification of the arguments by type, and assessment of their quality. We empirically evaluate the models on the Feedback Prize - Predicting Effective Arguments dataset of grade 6-12 student essays and demonstrate how fine-tuned small LLMs outperform baseline methods in segmenting the essays and determining the argument types, while few-shot prompting yields comparable performance to that of the baselines in assessing quality. This work highlights the educational potential of small, open-source LLMs to provide real-time, personalized feedback, enhancing independent learning and writing skills while ensuring low computational cost and privacy.
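As a concrete (hypothetical) illustration of the few-shot prompting setup for the classification task, a prompt for a small instruction-tuned LLM might be assembled as below. The component labels and example sentences are invented for demonstration and are not drawn from the Feedback Prize dataset.

```python
# Sketch: assembling a few-shot prompt for argument component classification.
# Labels and demonstration sentences are illustrative assumptions.

FEW_SHOT_EXAMPLES = [
    ("School uniforms should be mandatory.", "Claim"),
    ("A 2019 survey found 70% of teachers support uniforms.", "Evidence"),
    ("Therefore, districts ought to adopt uniform policies.", "Concluding Statement"),
]

def build_prompt(segment: str) -> str:
    """Concatenate an instruction, labeled demonstrations, and the query."""
    lines = ["Classify the argument component type of each sentence."]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Sentence: {text}\nType: {label}")
    lines.append(f"Sentence: {segment}\nType:")
    return "\n\n".join(lines)

prompt = build_prompt("Uniforms reduce peer pressure about clothing.")
print(prompt)
```

The prompt ends at `Type:` so the model's completion is read as the predicted label; fine-tuning, by contrast, would train the model on (segment, label) pairs directly.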

Authors:Yi Fei Cheng, Ari Carden, Hyunsung Cho, Catarina G. Fidalgo, Jonathan Wieland, David Lindlbauer
Title: Augmented Reality In-the-Wild: Usage Patterns and Experiences of Working with AR Laptops in Real-World Settings
Abstract:
Augmented Reality (AR) is increasingly positioned as a tool for knowledge work, providing beneficial affordances such as a virtually limitless display space that integrates digital information with the user's physical surroundings. However, for AR to supplant traditional screen-based devices in knowledge work, it must support prolonged usage across diverse contexts. Until now, few studies have explored the effects, opportunities, and challenges of working in AR outside a controlled laboratory setting and for an extended duration. This gap in research limits our understanding of how users may adapt its affordances to their daily workflows and what barriers hinder its adoption. In this paper, we present findings from a longitudinal diary study examining how participants incorporated an AR laptop -- Sightful's Spacetop EA -- into their daily work routines. 14 participants used the device for 40-minute daily sessions over two weeks, collectively completing 103 hours of AR-based work. Through survey responses, workspace photographs, and post-study interviews, we analyzed usage patterns, workspace configurations, and evolving user perceptions. Our findings reveal key factors influencing participants' usage of AR, including task demands, environmental constraints, social dynamics, and ergonomic considerations. We highlight how participants leveraged and configured AR's virtual display space, along with emergent hybrid workflows that involved physical screens and tasks. Based on our results, we discuss both overlaps with current literature and new considerations and challenges for the future design of AR systems for pervasive and productive use.

Authors:Bryan Min, Haijun Xia
Title: Feedforward in Generative AI: Opportunities for a Design Space
Abstract:
Generative AI (GenAI) models have become more capable than ever at augmenting productivity and cognition across diverse contexts. However, a fundamental challenge remains as users struggle to anticipate what AI will generate. As a result, they must engage in excessive turn-taking with the AI's feedback to clarify their intent, leading to significant cognitive load and time investment. Our goal is to advance the perspective that in order for users to seamlessly leverage the full potential of GenAI systems across various contexts, we must design GenAI systems that not only provide informative feedback but also informative feedforward -- designs that tell users what AI will generate before the user submits their prompt. To spark discussion on feedforward in GenAI, we designed diverse instantiations of feedforward across four GenAI applications: conversational UIs, document editors, malleable interfaces, and automation agents, and discussed how these designs can contribute to a more rigorous investigation of a design space and a set of guidelines for feedforward in all GenAI systems.

Authors:Jiaying "Lizzy" Liu, Yongjie Sha, Yan Zhang
Title: A review of theories and models utilized by empirical studies about mental health help-seeking and implications for future research
Abstract:
Purpose: With the rise of mental health risks globally, it is urgent to provide effective mental health support. However, a holistic understanding of how people seek help for mental health problems remains limited, impeding the development of evidence-based intervention programs to facilitate help-seeking behavior. This study reviews current theories that guide empirical research on young adults' help-seeking behavior using technologies, identifies limitations in existing frameworks, and proposes directions for future research. Methods: We searched databases that are most likely to contain mental health help-seeking practices in relation to information technology, including PubMed, ACM Digital Library, Web of Science, PsycInfo, ScienceDirect, EBSCO, and Cochrane Library. Results: Of 2443 abstracts reviewed, 43 studies met the criteria and were included in the analysis. We identified 16 theories and models. They represent seven perspectives to view mental health help-seeking and reveal factors such as accessibility, stigma, and social support as key factors influencing help-seeking. Limitations: We summarized the theories and models and categorized them based on their primary perspectives. Cross-perspective connections could be explored in future reviews. Conclusions: A holistic approach to creating culturally sensitive multi-level interventions that consider individual, interpersonal, and community factors is needed to advance effective mental health help-seeking support strategies.

Authors:Ramaravind Kommiya Mothilal, Faisal M. Lalani, Syed Ishtiaque Ahmed, Shion Guha, Sharifa Sultana
Title: Talking About the Assumption in the Room
Abstract:
The reference to assumptions in how practitioners use or interact with machine learning (ML) systems is ubiquitous in HCI and responsible ML discourse. However, what remains unclear from prior works is the conceptualization of assumptions and how practitioners identify and handle assumptions throughout their workflows. This leads to confusion about what assumptions are and what needs to be done with them. We use the concept of an argument from Informal Logic, a branch of Philosophy, to offer a new perspective to understand and explicate the confusions surrounding assumptions. Through semi-structured interviews with 22 ML practitioners, we find what contributes most to these confusions is how independently assumptions are constructed, how reactively and reflectively they are handled, and how nebulously they are recorded. Our study brings the peripheral discussion of assumptions in ML to the center and presents recommendations for practitioners to better think about and work with assumptions.

Authors:Yan Zhang, Tharaka Sachintha Ratnayake, Cherie Sew, Jarrod Knibbe, Jorge Goncalves, Wafa Johal
Title: Can you pass that tool?: Implications of Indirect Speech in Physical Human-Robot Collaboration
Abstract:
Indirect speech acts (ISAs) are a natural pragmatic feature of human communication, allowing requests to be conveyed implicitly while maintaining subtlety and flexibility. Although advancements in speech recognition have enabled natural language interactions with robots through direct, explicit commands -- providing clarity in communication -- the rise of large language models presents the potential for robots to interpret ISAs. However, empirical evidence on the effects of ISAs on human-robot collaboration (HRC) remains limited. To address this, we conducted a Wizard-of-Oz study (N=36), engaging a participant and a robot in collaborative physical tasks. Our findings indicate that robots capable of understanding ISAs significantly improve humans' perceived robot anthropomorphism, team performance, and trust. However, the effectiveness of ISAs is task- and context-dependent, thus requiring careful use. These results highlight the importance of appropriately integrating direct and indirect requests in HRC to enhance collaborative experiences and task performance.

Authors:Jiaying "Lizzy" Liu, Yan Zhang
Title: "When I lost it, they dragged me out": How Care Encounters Empower Marginalized Young Adults' Aspiration and Mental Health Care-Seeking
Abstract:
Mental health care-seeking among marginalized young adults has received limited attention in CSCW research. Through in-depth interviews and visual elicitation methods with 18 diverse U.S. participants, our study reveals how marginalized identities shape mental health care-seeking journeys, often characterized by low aspirations and passive care-seeking influenced by lived experiences of marginalization. However, we found the transformative function of "care encounters" - serendipitous interactions with mental health resources that occur when individuals are not actively seeking support. These encounters serve as critical turning points, catalyzing shifts in aspiration and enabling more proactive care-seeking behaviors. Our analysis identifies both the infrastructural conditions that enable transformative care encounters and the aspiration breakdowns that impede care-seeking processes. This work makes conceptual contributions by supplementing traditional motivation-based care-seeking models with a reconceptualization of "care encounters" that accounts for the infrastructural and serendipitous nature of mental health access. We advance understanding of how marginalized identity uniquely influences care-seeking behaviors while providing actionable design implications for embedding technology-mediated "care encounters" into socio-technical interventions that can better support mental health care access for vulnerable populations.

Authors:Yate Ge, Meiying Li, Xipeng Huang, Yuanda Hu, Qi Wang, Xiaohua Sun, Weiwei Guo
Title: GenComUI: Exploring Generative Visual Aids as Medium to Support Task-Oriented Human-Robot Communication
Abstract:
This work investigates the integration of generative visual aids in human-robot task communication. We developed GenComUI, a system powered by large language models that dynamically generates contextual visual aids (such as map annotations, path indicators, and animations) to support verbal task communication and facilitate the generation of customized task programs for the robot. This system was informed by a formative study that examined how humans use external visual tools to assist verbal communication in spatial tasks. To evaluate its effectiveness, we conducted a user experiment (n = 20) comparing GenComUI with a voice-only baseline. The results demonstrate that generative visual aids, through both qualitative and quantitative analysis, enhance verbal task communication by providing continuous visual feedback, thus promoting natural and effective human-robot communication. Additionally, the study offers a set of design implications, emphasizing how dynamically generated visual aids can serve as an effective communication medium in human-robot interaction. These findings underscore the potential of generative visual aids to inform the design of more intuitive and effective human-robot communication, particularly for complex communication scenarios in human-robot interaction and LLM-based end-user development.

Authors:Ruijia Chen, Junru Jiang, Pragati Maheshwary, Brianna R. Cochran, Yuhang Zhao
Title: VisiMark: Characterizing and Augmenting Landmarks for People with Low Vision in Augmented Reality to Support Indoor Navigation
Abstract:
Landmarks are critical in navigation, supporting self-orientation and mental model development. Similar to sighted people, people with low vision (PLV) frequently look for landmarks via visual cues but face difficulties identifying some important landmarks due to vision loss. We first conducted a formative study with six PLV to characterize their challenges and strategies in landmark selection, identifying their unique landmark categories (e.g., area silhouettes, accessibility-related objects) and preferred landmark augmentations. We then designed VisiMark, an AR interface that supports landmark perception for PLV by providing both overviews of space structures and in-situ landmark augmentations. We evaluated VisiMark with 16 PLV and found that VisiMark enabled PLV to perceive landmarks they preferred but could not easily perceive before, and changed PLV's landmark selection from only visually-salient objects to cognitive landmarks that are more important and meaningful. We further derive design considerations for AR-based landmark augmentation systems for PLV.

Authors:Joshua Strong, Pramit Saha, Yasin Ibrahim, Cheng Ouyang, Alison Noble
Title: Expert-Agnostic Learning to Defer
Abstract:
Learning to Defer (L2D) trains autonomous systems to handle straightforward cases while deferring uncertain ones to human experts. Recent advancements in this field have introduced methods that offer flexibility to unseen experts at test time. However, we find these approaches struggle to generalise to experts with behaviours not seen during training, require extensive human annotation, and lack mechanisms for incorporating prior knowledge of expert capabilities. To address these challenges, we introduce Expert-Agnostic Learning to Defer (EA-L2D), a novel L2D framework that employs a Bayesian approach to model expert behaviour in an \textit{expert-agnostic} fashion. Across benchmark medical imaging datasets (HAM10000, Blood Cells, Retinal OCT, and Liver Tumours), EA-L2D significantly outperforms prior methods on unseen experts, achieving up to a 28\% relative improvement, while also matching or exceeding state-of-the-art performance on seen experts.
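The general learning-to-defer pattern the paper builds on can be shown with a minimal confidence-threshold sketch. This is not EA-L2D's Bayesian expert model; the threshold, class names, and simulated expert are assumptions for illustration of the defer/predict decision itself.

```python
# Minimal L2D sketch: the system answers when confident, and otherwise
# defers the decision to a (simulated) human expert. Threshold and labels
# are illustrative assumptions, not the paper's learned deferral policy.

def l2d_predict(model_probs, expert_label, threshold=0.8):
    """Return (prediction, deferred) for one sample.
    model_probs: dict mapping class name -> predicted probability."""
    best_class = max(model_probs, key=model_probs.get)
    if model_probs[best_class] >= threshold:
        return best_class, False   # confident: the system predicts itself
    return expert_label, True      # uncertain: defer to the expert

# Confident case: the system answers on its own.
print(l2d_predict({"benign": 0.95, "malignant": 0.05}, expert_label="benign"))
# Uncertain case: the decision is handed to the expert.
print(l2d_predict({"benign": 0.55, "malignant": 0.45}, expert_label="malignant"))
```

A learned L2D system replaces the fixed threshold with a trained deferral policy; EA-L2D's contribution is modeling the expert's behaviour itself in a Bayesian, expert-agnostic way so the policy transfers to experts unseen during training.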

Authors:Kexin Zhang, Edward Glenn Scott Spencer, Abijith Manikandan, Andric Li, Ang Li, Yaxing Yao, Yuhang Zhao
Title: Inclusive Avatar Guidelines for People with Disabilities: Supporting Disability Representation in Social Virtual Reality
Abstract:
Avatar is a critical medium for identity representation in social virtual reality (VR). However, options for disability expression are highly limited on current avatar interfaces. Improperly designed disability features may even perpetuate misconceptions about people with disabilities (PWD). As more PWD use social VR, there is an emerging need for comprehensive design standards that guide developers and designers to create inclusive avatars. Our work aims to advance avatar design practices by delivering a set of centralized, comprehensive, and validated design guidelines that are easy to adopt, disseminate, and update. Through a systematic literature review and interviews with 60 participants with various disabilities, we derived 20 initial design guidelines that cover diverse disability expression methods through five aspects, including avatar appearance, body dynamics, assistive technology design, peripherals around avatars, and customization control. We further evaluated the guidelines via a heuristic evaluation study with 10 VR practitioners, validating the guideline coverage, applicability, and actionability. Our evaluation resulted in a final set of 17 design guidelines with recommendation levels.

Authors:Prerna Ravi, John Masla, Gisella Kakoti, Grace Lin, Emma Anderson, Matt Taylor, Anastasia Ostrowski, Cynthia Breazeal, Eric Klopfer, Hal Abelson
Title: Co-designing Large Language Model Tools for Project-Based Learning with K12 Educators
Abstract:
The emergence of generative AI, particularly large language models (LLMs), has opened the door for student-centered and active learning methods like project-based learning (PBL). However, PBL poses practical implementation challenges for educators around project design and management, assessment, and balancing student guidance with student autonomy. The following research documents a co-design process with interdisciplinary K-12 teachers to explore and address the current PBL challenges they face. Through teacher-driven interviews, collaborative workshops, and iterative design of wireframes, we gathered evidence for ways LLMs can support teachers in implementing high-quality PBL pedagogy by automating routine tasks and enhancing personalized learning. Teachers in the study advocated for supporting their professional growth and augmenting their current roles without replacing them. They also identified affordances and challenges around classroom integration, including resource requirements and constraints, ethical concerns, and potential immediate and long-term impacts. Drawing on these, we propose design guidelines for future deployment of LLM tools in PBL.

Authors:Swadhin Das, Raksha Sharma
Title: FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning
Abstract:
Remote sensing image captioning aims to generate descriptive text from remote sensing images, typically employing an encoder-decoder framework. In this setup, a convolutional neural network (CNN) extracts feature representations from the input image, which then guide the decoder in a sequence-to-sequence caption generation process. Although much research has focused on refining the decoder, the quality of image representations from the encoder remains crucial for accurate captioning. This paper introduces a novel approach that integrates features from two distinct CNN-based encoders, capturing complementary information to enhance caption generation. Additionally, we propose a weighted averaging technique to combine the outputs of all GRUs in the stacked decoder. Furthermore, a comparison-based beam search strategy is incorporated to refine caption selection. The results demonstrate that our fusion-based approach, along with the enhanced stacked decoder, significantly outperforms both the transformer-based state-of-the-art model and other LSTM-based baselines.
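The two core ideas, combining complementary encoder features and weighted averaging of stacked-decoder outputs, can be sketched minimally with plain Python lists. Fusion by concatenation and the specific weight values are assumptions for illustration; the paper's actual fusion and weighting schemes may differ.

```python
# Toy sketch of (1) fusing feature vectors from two encoders and
# (2) a weighted average over stacked GRU output vectors.

def fuse_features(feat_a, feat_b):
    """Concatenate complementary feature vectors from two CNN encoders."""
    return feat_a + feat_b  # list concatenation stands in for feature concat

def weighted_average(gru_outputs, weights):
    """Combine per-layer GRU output vectors with normalized weights."""
    total = sum(weights)
    dim = len(gru_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, gru_outputs)) / total
        for i in range(dim)
    ]

fused = fuse_features([0.1, 0.2], [0.3, 0.4])  # 2-dim + 2-dim -> 4-dim
avg = weighted_average([[1.0, 2.0], [3.0, 4.0]], weights=[0.25, 0.75])
print(fused, avg)
```

In a real pipeline the fused vector would condition the decoder, and the weights over GRU layers would typically be learned rather than fixed.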

Authors:Greta Warren, Irina Shklovski, Isabelle Augenstein
Title: Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
Abstract:
The pervasiveness of large language models and generative AI in online media has amplified the need for effective automated fact-checking to assist fact-checkers in tackling the increasing volume and sophistication of misinformation. The complex nature of fact-checking demands that automated fact-checking systems provide explanations that enable fact-checkers to scrutinise their outputs. However, it is unclear how these explanations should align with the decision-making and reasoning processes of fact-checkers to be effectively integrated into their workflows. Through semi-structured interviews with fact-checking professionals, we bridge this gap by: (i) providing an account of how fact-checkers assess evidence, make decisions, and explain their processes; (ii) examining how fact-checkers use automated tools in practice; and (iii) identifying fact-checker explanation requirements for automated fact-checking tools. The findings show unmet explanation needs and identify important criteria for replicable fact-checking explanations that trace the model's reasoning path, reference specific evidence, and highlight uncertainty and information gaps.

Authors:Xinkai Wang, Yue Yang, Kehong Zhou, Xue Xie, Lifeng Zhu, Aiguo Song, Bruce Daniel
Title: MRUCT: Mixed Reality Assistance for Acupuncture Guided by Ultrasonic Computed Tomography
Abstract:
Chinese acupuncture practitioners primarily depend on muscle memory and tactile feedback to insert needles and accurately target acupuncture points, as the current workflow lacks imaging modalities and visual aids. Consequently, new practitioners often learn through trial and error, requiring years of experience to become proficient and earn the trust of patients. Medical students face similar challenges in mastering this skill. To address these challenges, we developed an innovative system, MRUCT, that integrates ultrasonic computed tomography (UCT) with mixed reality (MR) technology to visualize acupuncture points in real-time. This system offers offline image registration and real-time guidance during needle insertion, enabling practitioners to accurately position needles based on anatomical structures such as bones, muscles, and auto-generated reference points, with the potential for clinical implementation. In this paper, we outline the non-rigid registration methods used to reconstruct anatomical structures from UCT data, as well as the key design considerations of the MR system. We evaluated two different 3D user interface (3DUI) designs and compared the performance of our system to traditional workflows for both new practitioners and medical students. The results highlight the potential of MR to enhance therapeutic medical practices and demonstrate the effectiveness of the system we developed.

Authors:Robin Connor Schramm, Markus Sasalovici, Jann Philipp Freiwald, Michael Otto, Melissa Reinelt, Ulrich Schwanecke
Title: Blending the Worlds: World-Fixed Visual Appearances in Automotive Augmented Reality
Abstract:
With the transition to fully autonomous vehicles, non-driving related tasks (NDRTs) become increasingly important, allowing passengers to use their driving time more efficiently. In-car Augmented Reality (AR) gives the possibility to engage in NDRTs while also allowing passengers to engage with their surroundings, for example, by displaying world-fixed points of interest (POIs). This can lead to new discoveries, provide information about the environment, and improve locational awareness. To explore the optimal visualization of POIs using in-car AR, we conducted a field study (N = 38) examining six parameters: positioning, scaling, rotation, render distance, information density, and appearance. We also asked for intention of use, preferred seat positions and preferred automation level for the AR function in a post-study questionnaire. Our findings reveal user preferences and general acceptance of the AR functionality. Based on these results, we derived UX-guidelines for the visual appearance and behavior of location-based POIs in in-car AR.

Authors:Robin Connor Schramm, Ginevra Fedrizzi, Markus Sasalovici, Jann Philipp Freiwald, Ulrich Schwanecke
Title: Augmented Journeys: Interactive Points of Interest for In-Car Augmented Reality
Abstract:
As passengers spend more time in vehicles, the demand for non-driving related tasks (NDRTs) increases. In-car Augmented Reality (AR) has the potential to enhance passenger experiences by enabling interaction with the environment through NDRTs using world-fixed Points of Interest (POIs). However, the effectiveness of existing interaction techniques and visualization methods for in-car AR remains unclear. Based on a survey (N=110) and a pre-study (N=10), we developed an interactive in-car AR system using a video see-through head-mounted display to engage with POIs via eye-gaze and pinch. Users could explore passed and upcoming POIs using three visualization techniques: List, Timeline, and Minimap. We evaluated the system's feasibility in a field study (N=21). Our findings indicate general acceptance of the system, with the List visualization being the preferred method for exploring POIs. Additionally, the study highlights limitations of current AR hardware, particularly the impact of vehicle movement on 3D interaction.

Authors:Amit Kumar, Arman Hosseini, Arghavan Azarbayjani, Arsalan Heydarian, Omidreza Shoghli
Title: Adoption of AI-Assisted E-Scooters: The Role of Perceived Trust, Safety, and Demographic Drivers
Abstract:
E-scooters have become an increasingly dominant mode of transport in recent years. However, the rise in their usage has been accompanied by an increase in injuries, affecting the trust and perceived safety of both users and non-users. Artificial intelligence (AI), as a cutting-edge and widely applied technology, has demonstrated potential to enhance transportation safety, particularly in driver assistance systems. The integration of AI into e-scooters presents a promising approach to addressing these safety concerns. This study aims to explore the factors influencing individuals' willingness to use AI-assisted e-scooters. Data were collected using a structured questionnaire, capturing responses from 405 participants. The questionnaire gathered information on demographic characteristics, micromobility usage frequency, road users' perception of safety around e-scooters, perceptions of safety in AI-enabled technology, trust in AI-enabled e-scooters, and involvement in e-scooter crash incidents. To examine the impact of demographic factors on participants' preferences between AI-assisted and regular e-scooters, decision tree analysis is employed, indicating that ethnicity, income, and age significantly influence preferences. To analyze the impact of other factors on the willingness to use AI-enabled e-scooters, a full-scale Structural Equation Model (SEM) is applied, revealing that the perception of safety in AI-enabled technology and the level of trust in AI-enabled e-scooters are the strongest predictors.

Authors:Ryan Yen, Jian Zhao, Daniel Vogel
Title: Code Shaping: Iterative Code Editing with Free-form AI-Interpreted Sketching
Abstract:
We introduce the concept of code shaping, an interaction paradigm for editing code using free-form sketch annotations directly on top of the code and console output. To evaluate this concept, we conducted a three-stage design study with 18 different programmers to investigate how sketches can communicate intended code edits to an AI model for interpretation and execution. The results show how different sketches are used, the strategies programmers employ during iterative interactions with AI interpretations, and interaction design principles that support the reconciliation between the code editor and sketches. Finally, we demonstrate the practical application of the code shaping concept with two use case scenarios, illustrating design implications from the study.

Authors:Tailia Malloy, Maria Jose Ferreira, Fei Fang, Cleotilde Gonzalez
Title: Training Users Against Human and GPT-4 Generated Social Engineering Attacks
Abstract:
In real-world decision making, outcomes are often delayed, meaning individuals must make multiple decisions before receiving any feedback. Moreover, feedback can be presented in different ways: it may summarize the overall results of multiple decisions (aggregated feedback) or report the outcome of individual decisions after some delay (clustered feedback). Despite their importance, the timing and presentation of delayed feedback have received little attention in cognitive modeling of decision-making, which typically focuses on immediate feedback. To address this, we conducted an experiment to compare the effect of delayed vs. immediate feedback and aggregated vs. clustered feedback. We also propose a Hierarchical Instance-Based Learning (HIBL) model that captures how people make decisions in delayed feedback settings. HIBL uses a super-model that chooses between sub-models to perform the decision-making task until an outcome is observed. Simulations show that HIBL best predicts human behavior and specific patterns, demonstrating the flexibility of IBL models.

Authors:Merle M. Reimann, Koen V. Hindriks, Florian A. Kunneman, Catharine Oertel, Gabriel Skantze, Iolanda Leite
Title: What Can You Say to a Robot? Capability Communication Leads to More Natural Conversations
Abstract:
When encountering a robot in the wild, it is not inherently clear to human users what the robot's capabilities are. When encountering misunderstandings or problems in spoken interaction, robots often just apologize and move on, without additional effort to make sure the user understands what happened. We set out to compare the effect of two speech-based capability communication strategies (proactive, reactive) to a robot without such a strategy, with regard to users' ratings of the interaction and their behavior during it. For this, we conducted an in-person user study with 120 participants who had three speech-based interactions with a social robot in a restaurant setting. Our results suggest that users preferred the robot communicating its capabilities proactively and adjusted their behavior in those interactions, using a more conversational interaction style while also enjoying the interaction more.

Authors:Logan Lane, Jerald Thomas, Alexander Giovannelli, Ibrahim Tahmid, Doug Bowman
Title: Exploring the Effects of Level of Control in the Initialization of Shared Whiteboarding Sessions in Collaborative Augmented Reality
Abstract:
Augmented Reality (AR) collaboration can benefit from a shared 2D surface, such as a whiteboard. However, many features of each collaborator's physical environment must be considered in order to determine the best placement and shape of the shared surface. We explored the effects of three methods for beginning a collaborative whiteboarding session with varying levels of user control: MANUAL, DISCRETE CHOICE, and AUTOMATIC by conducting a simulated AR study within Virtual Reality (VR). In the MANUAL method, users draw their own surfaces directly in the environment until they agree on the placement; in the DISCRETE CHOICE method, the system provides three options for whiteboard size and location; and in the AUTOMATIC method, the system automatically creates a whiteboard that fits within each collaborator's environment. We evaluate these three conditions in a study in which two collaborators used each method to begin collaboration sessions. After establishing a session, the users worked together to complete an affinity diagramming task using the shared whiteboard. We found that the majority of participants preferred to have direct control during the initialization of a new collaboration session, despite the additional workload induced by the MANUAL method.

Authors:Alexander Giovannelli, Fionn Murphy, Trey Davis, Chaerin Lee, Rehema Abulikemu, Matthew Gallagher, Sahil Sharma, Lee Lisle, Doug Bowman
Title: Planet Purifiers: A Collaborative Immersive Experience Proposing New Modifications to HOMER and Fishing Reel Interaction Techniques
Abstract:
This paper presents our solution to the 2025 3DUI Contest challenge. We aimed to develop a collaborative, immersive experience that raises awareness about trash pollution in natural landscapes while enhancing traditional interaction techniques in virtual environments. To achieve these objectives, we created an engaging multiplayer game where one user collects harmful pollutants while the other user provides medication to impacted wildlife using enhancements to traditional interaction techniques: HOMER and Fishing Reel. We enhanced HOMER to use a cone volume, reducing the precise aiming required by a selection raycast and providing a more efficient means to collect pollutants at large distances; we coined this technique FLOW-MATCH. To improve the distribution of animal feed to wildlife far from the user with Fishing Reel, we created RAWR-XD, an asymmetric bi-manual technique that lets the user adjust the reeling speed more conveniently using non-selecting wrist rotation.

Authors:Sheng Lyu, Ruiming Huang, Sijie Ji, Yasar Abbas Ur Rehman, Lan Ma, Chenshu Wu
Title: CardioLive: Empowering Video Streaming with Online Cardiac Monitoring
Abstract:
Online Cardiac Monitoring (OCM) emerges as a compelling enhancement for the next-generation video streaming platforms. It enables various applications including remote health, online affective computing, and deepfake detection. Yet the physiological information encapsulated in the video streams has been long neglected. In this paper, we present the design and implementation of CardioLive, the first online cardiac monitoring system in video streaming platforms. We leverage the naturally co-existing video and audio streams and devise CardioNet, the first audio-visual network to learn the cardiac series. It incorporates multiple unique designs to extract temporal and spectral features, ensuring robust performance under realistic video streaming conditions. To enable Service-On-Demand online cardiac monitoring, we implement CardioLive as a plug-and-play middleware service and develop systematic solutions to practical issues including changing FPS and unsynchronized streams. Extensive experiments have been done to demonstrate the effectiveness of our system. We achieve a Mean Absolute Error (MAE) of 1.79 BPM, outperforming the video-only and audio-only solutions by 69.2% and 81.2%, respectively. Our CardioLive service achieves average throughputs of 115.97 and 98.16 FPS when implemented in Zoom and YouTube. We believe our work opens up new applications for video stream systems. We will release the code soon.
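As an illustration of the stream-synchronization problem the abstract mentions (changing FPS, unsynchronized audio and video), the sketch below resamples two timestamped feature streams onto a shared fixed-rate timeline. The function name and signature are hypothetical; this is not CardioLive's API.

```python
import numpy as np

def align_streams(video_ts, video_feats, audio_ts, audio_feats, fps=30.0):
    """Resample two unsynchronized 1-D feature streams onto a common
    fixed-rate clock (an illustrative sketch of the kind of alignment
    an OCM middleware must perform)."""
    # restrict to the time span both streams actually cover
    t0 = max(video_ts[0], audio_ts[0])
    t1 = min(video_ts[-1], audio_ts[-1])
    grid = np.arange(t0, t1, 1.0 / fps)          # shared timeline
    v = np.interp(grid, video_ts, video_feats)   # linear resampling
    a = np.interp(grid, audio_ts, audio_feats)
    return grid, v, a
```

Linear interpolation is the simplest choice; a real system would likely combine this with buffering and drift correction.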

Authors:Keshav Bimbraw, Srikar Nekkanti, Daniel B. Tiller, Mihir Deshmukh, Berk Calli, Robert D. Howe, Haichong K. Zhang
Title: Simultaneous Estimation of Manipulation Skill and Hand Grasp Force from Forearm Ultrasound Images
Abstract:
Accurate estimation of human hand configuration and the forces they exert is critical for effective teleoperation and skill transfer in robotic manipulation. A deeper understanding of human interactions with objects can further enhance teleoperation performance. To address this need, researchers have explored methods to capture and translate human manipulation skills and applied forces to robotic systems. Among these, biosignal-based approaches, particularly those using forearm ultrasound data, have shown significant potential for estimating hand movements and finger forces. In this study, we present a method for simultaneously estimating manipulation skills and applied hand force using forearm ultrasound data. Data collected from seven participants were used to train deep learning models for classifying manipulation skills and estimating grasp force. Our models achieved an average classification accuracy of 94.87% ± 10.16% for manipulation skills and an average root mean square error (RMSE) of 0.51 ± 0.19 N for force estimation, as evaluated using five-fold cross-validation. These results highlight the effectiveness of forearm ultrasound in advancing human-machine interfacing and robotic teleoperation for complex manipulation tasks. This work enables new and effective possibilities for human-robot skill transfer and tele-manipulation, bridging the gap between human dexterity and robotic control.
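The five-fold cross-validation protocol used in the evaluation can be sketched as follows; `kfold_indices` and `rmse` are illustrative helpers, not the authors' code, and stand in for whatever splitting and metric routines the study actually used.

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Shuffled K-fold split: returns (train_idx, test_idx) pairs so
    every sample appears in exactly one test fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    return [(np.concatenate([f for j, f in enumerate(folds) if j != i]),
             folds[i]) for i in range(k)]

def rmse(y_true, y_pred):
    """Root mean square error, the force-estimation metric reported."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Per-fold accuracies and RMSEs would then be averaged to produce figures like the "94.87% ± 10.16%" reported above.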

Authors:Sean Memery, Kevin Denamganai, Jiaxin Zhang, Zehai Tu, Yiwen Guo, Kartic Subr
Title: CueTip: An Interactive and Explainable Physics-aware Pool Assistant
Abstract:
We present an interactive and explainable automated coaching assistant called CueTip for a variant of pool/billiards. CueTip's novelty lies in its combination of three features: a natural-language interface, an ability to perform contextual, physics-aware reasoning, and that its explanations are rooted in a set of predetermined guidelines developed by domain experts. We instrument a physics simulator so that it generates event traces in natural language alongside traditional state traces. Event traces lend themselves to interpretation by language models, which serve as the interface to our assistant. We design and train a neural adaptor that decouples tactical choices made by CueTip from its interactivity and explainability allowing it to be reconfigured to mimic any pool playing agent. Our experiments show that CueTip enables contextual query-based assistance and explanations while maintaining the strength of the agent in terms of win rate (improving it in some situations). The explanations generated by CueTip are physically-aware and grounded in the expert rules and are therefore more reliable.

Authors:Yuzhe Zhang, Chengxi Xie, Huan Liu, Yuhan Shi, Dalin Zhang
Title: MIND-EEG: Multi-granularity Integration Network with Discrete Codebook for EEG-based Emotion Recognition
Abstract:
Emotion recognition using electroencephalogram (EEG) signals has broad potential across various domains. EEG signals have the ability to capture rich spatial information related to brain activity, yet effectively modeling and utilizing these spatial relationships remains a challenge. Existing methods rely on simplistic spatial structure modeling that fails to capture complex node interactions, and lack generalizable spatial connection representations that balance the dynamic nature of brain networks with the need for discriminative and generalizable features. To address these challenges, we propose the Multi-granularity Integration Network with Discrete Codebook for EEG-based Emotion Recognition (MIND-EEG). The framework employs a multi-granularity approach, integrating global and regional spatial information through a Global State Encoder, an Intra-Regional Functionality Encoder, and an Inter-Regional Interaction Encoder to comprehensively model brain activity. Additionally, we introduce a discrete codebook mechanism for constructing network structures via vector quantization, ensuring compact and meaningful brain network representations while mitigating over-smoothing and enhancing model generalization. The proposed framework effectively captures the dynamic and diverse nature of EEG signals, enabling robust emotion recognition. Extensive comparisons and analyses demonstrate the effectiveness of MIND-EEG, and the source code is publicly available at https://anonymous.4open.science/r/MIND_EEG.
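The discrete codebook mechanism is a form of vector quantization: each continuous embedding is snapped to its nearest entry in a learned codebook. A minimal nearest-codeword sketch (not the MIND-EEG implementation) looks like this:

```python
import numpy as np

def quantize(z, codebook):
    """Map each row of z (embeddings) to the closest codeword.
    Returns the discrete codes and the quantized vectors."""
    # pairwise squared distances: (num_embeddings, num_codewords)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = d.argmin(axis=1)
    return codes, codebook[codes]
```

In training, the codebook itself would be learned (e.g., with a commitment loss and straight-through gradients); this sketch shows only the forward lookup.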

Authors:Shuang Xie, Yang Liu, Jeannie S. A. Lee, Haiwei Dong
Title: MetaDecorator: Generating Immersive Virtual Tours through Multimodality
Abstract:
MetaDecorator is a framework that empowers users to personalize virtual spaces. By leveraging text-driven prompts and image synthesis techniques, MetaDecorator adorns static panoramas captured by 360° imaging devices, transforming them into uniquely styled and visually appealing environments. This significantly enhances the realism and engagement of virtual tours compared to traditional offerings. Beyond the core framework, we also discuss the integration of Large Language Models (LLMs) and haptics in the VR application to provide a more immersive experience.

Authors:Yimeng Wang, Yinzhou Wang, Kelly Crace, Yixuan Zhang
Title: Understanding Attitudes and Trust of Generative AI Chatbots for Social Anxiety Support
Abstract:
Social anxiety (SA) has become increasingly prevalent. Traditional coping strategies often face accessibility challenges. Generative AI (GenAI) chatbots, known for their knowledgeable and conversational capabilities, are emerging as alternative tools for mental well-being. With the increased integration of GenAI, it is important to examine individuals' attitudes and trust in GenAI chatbots' support for SA. Through a mixed-method approach that involved surveys (n = 159) and interviews (n = 17), we found that individuals with severe symptoms tended to trust and embrace GenAI chatbots more readily, valuing their non-judgmental support and perceived emotional comprehension. However, those with milder symptoms prioritized technical reliability. We identified factors influencing trust, such as GenAI chatbots' ability to generate empathetic responses and their context-sensitive limitations, which were particularly important among individuals with SA. We also discuss the design implications and use of GenAI chatbots in fostering cognitive and emotional trust, with practical and design considerations.

Authors:Yinzhou Wang, Yimeng Wang, Ye Xiao, Liabette Escamilla, Bianca Augustine, Kelly Crace, Gang Zhou, Yixuan Zhang
Title: Evaluating an LLM-Powered Chatbot for Cognitive Restructuring: Insights from Mental Health Professionals
Abstract:
Recent advancements in large language models (LLMs) promise to expand mental health interventions by emulating therapeutic techniques, potentially easing barriers to care. Yet there is a lack of real-world empirical evidence evaluating the strengths and limitations of LLM-enabled psychotherapy interventions. In this work, we evaluate an LLM-powered chatbot, designed via prompt engineering to deliver cognitive restructuring (CR), with 19 users. Mental health professionals then examined the resulting conversation logs to uncover potential benefits and pitfalls. Our findings indicate that an LLM-based CR approach has the capability to adhere to core CR protocols, prompt Socratic questioning, and provide empathetic validation. However, issues of power imbalances, advice-giving, misunderstood cues, and excessive positivity reveal deeper challenges, including the potential to erode therapeutic rapport and ethical concerns. We also discuss design implications for leveraging LLMs in psychotherapy and underscore the importance of expert oversight to mitigate these concerns, which are critical steps toward safer, more effective AI-assisted interventions.

Authors:Wenwen Li, Kangwei Shi, Yidong Chai
Title: AI Chatbots as Professional Service Agents: Developing a Professional Identity
Abstract:
With the rapid expansion of large language model (LLM) applications, there is an emerging shift in the role of LLM-based AI chatbots from serving merely as general inquiry tools to acting as professional service agents. However, current studies often overlook a critical aspect of professional service agents: the act of communicating in a manner consistent with their professional identities. This is of particular importance in the healthcare sector, where effective communication with patients is essential for achieving professional goals, such as promoting patient well-being by encouraging healthy behaviors. To bridge this gap, we propose LAPI (LLM-based Agent with a Professional Identity), a novel framework for designing professional service agents tailored for medical question-and-answer (Q&A) services, ensuring alignment with a specific professional identity. Our method includes a theory-guided task planning process that decomposes complex professional tasks into manageable subtasks aligned with professional objectives and a pragmatic entropy method designed to generate professional and ethical responses with low uncertainty. Experiments on various LLMs show that the proposed approach outperforms baseline methods, including few-shot prompting and chain-of-thought prompting, across key metrics such as fluency, naturalness, empathy, patient-centricity, and ROUGE-L scores. Additionally, the ablation study underscores the contribution of each component to the overall effectiveness of the approach.
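Generating "responses with low uncertainty" can be illustrated with a simple entropy criterion over candidate responses. This is a loose reading of the paper's pragmatic entropy method, and all names below are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_low_uncertainty(candidates):
    """Among candidate responses, each carrying a probability profile
    (e.g., averaged token probabilities), choose the one with the
    lowest entropy, i.e., the least uncertain."""
    return min(candidates, key=lambda c: entropy(c["probs"]))
```

The actual method additionally steers candidates toward professional and ethical objectives, which this sketch omits.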

Authors:Nandi Zhang, Yukang Yan, Ryo Suzuki
Title: From Following to Understanding: Investigating the Role of Reflective Prompts in AR-Guided Tasks to Promote Task Understanding
Abstract:
Augmented Reality (AR) is a promising medium for guiding users through tasks, yet its impact on fostering deeper task understanding remains underexplored. This paper investigates the impact of reflective prompts -- strategic questions that encourage users to challenge assumptions, connect actions to outcomes, and consider hypothetical scenarios -- on task comprehension and performance. We conducted a two-phase study: a formative survey and co-design sessions (N=9) to develop reflective prompts, followed by a within-subject evaluation (N=16) comparing AR instructions with and without these prompts in coffee-making and circuit assembly tasks. Our results show that reflective prompts significantly improved objective task understanding and resulted in more proactive information acquisition behaviors during task completion. These findings highlight the potential of incorporating reflective elements into AR instructions to foster deeper engagement and learning. Based on data from both studies, we synthesized design guidelines for integrating reflective elements into AR systems to enhance user understanding without compromising task performance.

Authors:Aadarsh Padiyath, Mark Guzdial, Barbara Ericson
Title: Development of the Critical Reflection and Agency in Computing Index
Abstract:
As computing's societal impact grows, so does the need for computing students to recognize and address the ethical and sociotechnical implications of their work. While there are efforts to integrate ethics into computing curricula, we lack a standardized tool to measure those efforts, specifically, students' attitudes towards ethical reflection and their ability to effect change. This paper introduces the novel framework of Critically Conscious Computing and reports on the development and content validation of the Critical Reflection and Agency in Computing Index, a novel instrument designed to assess undergraduate computing students' attitudes towards practicing critically conscious computing. The resulting index is a theoretically grounded, expert-reviewed tool to support research and practice in computing ethics education. This enables researchers and educators to gain insights into students' perspectives, inform the design of targeted ethics interventions, and measure the effectiveness of computing ethics education initiatives.

Authors:Yuhan Hu, Peide Huang, Mouli Sivapurapu, Jian Zhang
Title: ELEGNT: Expressive and Functional Movement Design for Non-anthropomorphic Robot
Abstract:
Nonverbal behaviors such as posture, gestures, and gaze are essential for conveying internal states, both consciously and unconsciously, in human interaction. For robots to interact more naturally with humans, robot movement design should likewise integrate expressive qualities, such as intention, attention, and emotions, alongside traditional functional considerations like task fulfillment and time efficiency. In this paper, we present the design and prototyping of a lamp-like robot that explores the interplay between functional and expressive objectives in movement design. Using a research-through-design methodology, we document the hardware design process, define expressive movement primitives, and outline a set of interaction scenario storyboards. We propose a framework that incorporates both functional and expressive utilities during movement generation, and implement the robot behavior sequences in different function- and social-oriented tasks. Through a user study comparing expression-driven versus function-driven movements across six task scenarios, our findings indicate that expression-driven movements significantly enhance user engagement and perceived robot qualities. This effect is especially pronounced in social-oriented tasks.

Authors:Héctor Cadavid, Hyunho Mo, Bauke Arends, Katarzyna Dziopa, Esther E. Bron, Daniel Bos, Sonja Georgievska, Pim van der Harst
Title: MyDigiTwin: A Privacy-Preserving Framework for Personalized Cardiovascular Risk Prediction and Scenario Exploration
Abstract:
Cardiovascular disease (CVD) remains a leading cause of death, and primary prevention through personalized interventions is crucial. This paper introduces MyDigiTwin, a framework that integrates health digital twins with personal health environments to empower patients in exploring personalized health scenarios while ensuring data privacy. MyDigiTwin uses federated learning to train predictive models across distributed datasets without transferring raw data, and a novel data harmonization framework addresses semantic and format inconsistencies in health data. A proof-of-concept demonstrates the feasibility of harmonizing and using cohort data to train privacy-preserving CVD prediction models. This framework offers a scalable solution for proactive, personalized cardiovascular care and sets the stage for future applications in real-world healthcare settings.
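The federated-learning step, training predictive models across distributed datasets without moving raw data, typically aggregates locally trained parameters by weighted averaging. A minimal FedAvg-style sketch (not MyDigiTwin's actual training stack) is:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Combine per-client model parameters by averaging, weighted by
    each client's local dataset size. Only parameters leave the
    client; raw patient records never do."""
    sizes = np.asarray(client_sizes, dtype=float)
    w = sizes / sizes.sum()                      # per-client weights
    return sum(wi * np.asarray(cw) for wi, cw in zip(w, client_weights))
```

A coordinating server would repeat this aggregation each round, broadcasting the averaged model back to clients for further local training.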

Authors:Yuanjun Feng, Stefan Feuerriegel, Yash Raj Shrestha
Title: Contextualizing Recommendation Explanations with LLMs: A User Study
Abstract:
Large language models (LLMs) are increasingly prevalent in recommender systems, where LLMs can be used to generate personalized recommendations. Here, we examine how different LLM-generated explanations for movie recommendations affect users' perceptions of cognitive, affective, and utilitarian needs and consumption intentions. In a pre-registered, between-subject online experiment (N=759) and follow-up interviews (N=30), we compare (a) LLM-generated generic explanations, and (b) LLM-generated contextualized explanations. Our findings show that contextualized explanations (i.e., explanations that incorporate users' past behaviors) effectively meet users' cognitive needs while increasing users' intentions to watch recommended movies. However, adding explanations offers limited benefits in meeting users' utilitarian and affective needs, raising concerns about the proper design and implications of LLM-generated explanations. Qualitative insights from interviews reveal that referencing users' past preferences enhances trust and understanding but can feel excessive if overused. Furthermore, users who engage more actively and positively with the recommender system and with movie-watching gain more from contextualized explanations. Overall, our research clarifies how LLM-generated recommendations influence users' motivations and behaviors, providing valuable insights for the future development of user-centric recommender systems, a key element in social media platforms and online ecosystems.

Authors:Oliver Chojnowski, Alexander Eberhard, Michael Schiffmann, Ana Müller, Anja Richert
Title: Human-like Nonverbal Behavior with MetaHumans in Real-World Interaction Studies: An Architecture Using Generative Methods and Motion Capture
Abstract:
Socially interactive agents are gaining prominence in domains like healthcare, education, and service contexts, particularly virtual agents due to their inherent scalability. To facilitate authentic interactions, these systems require verbal and nonverbal communication through e.g., facial expressions and gestures. While natural language processing technologies have rapidly advanced, incorporating human-like nonverbal behavior into real-world interaction contexts is crucial for enhancing the success of communication, yet this area remains underexplored. One barrier is creating autonomous systems with sophisticated conversational abilities that integrate human-like nonverbal behavior. This paper presents a distributed architecture using Epic Games' MetaHuman, combined with advanced conversational AI and camera-based user management, that supports methods like motion capture, handcrafted animation, and generative approaches for nonverbal behavior. We share insights into a system architecture designed to investigate nonverbal behavior in socially interactive agents, deployed in a three-week field study in the Deutsches Museum Bonn, showcasing its potential in realistic nonverbal behavior research.

Authors:Mo Houtti, Moyan Zhou, Loren Terveen, Stevie Chancellor
Title: Observe, Ask, Intervene: Designing AI Agents for More Inclusive Meetings
Abstract:
Video conferencing meetings are more effective when they are inclusive, but inclusion often hinges on meeting leaders' and/or co-facilitators' practices. AI systems can be designed to improve meeting inclusion at scale by moderating negative meeting behaviors and supporting meeting leaders. We explored this design space by conducting 9 user-centered ideation sessions, instantiating design insights in a prototype "virtual co-host" system, and testing the system in a formative exploratory lab study (n=68 across 12 groups, 18 interviews). We found that ideation session participants wanted AI agents to ask questions before intervening, which we formalized as the "Observe, Ask, Intervene" (OAI) framework. Participants who used our prototype preferred OAI over fully autonomous intervention, but rationalized away the virtual co-host's critical feedback. From these findings, we derive guidelines for designing AI agents to influence behavior and mediate group work. We also contribute methodological and design guidelines specific to mitigating inequitable meeting participation.

Authors:Louisa Conwill, Megan K. Levis, Karla Badillo-Urquiola, Walter J. Scheirer
Title: Design Patterns for the Common Good: Building Better Technologies Using the Wisdom of Virtue Ethics
Abstract:
Virtue ethics is a philosophical tradition that emphasizes the cultivation of virtues in achieving the common good. It has been suggested to be an effective framework for envisioning more ethical technology, yet previous work on virtue ethics and technology design has remained at theoretical recommendations. Therefore, we propose an approach for identifying user experience design patterns that embody particular virtues to more concretely articulate virtuous technology designs. As a proof of concept for our approach, we documented seven design patterns for social media that uphold the virtues of Catholic Social Teaching. We interviewed 24 technology researchers and industry practitioners to evaluate these patterns. We found that overall the patterns enact the virtues they were identified to embody; our participants valued that the patterns fostered intentional conversations and personal connections. We pave a path for technology professionals to incorporate diverse virtue traditions into the development of technologies that support human flourishing.

Authors:Abhishek Kaushik, Sargam Yadav, Andrew Browne, David Lillis, David Williams, Jack Mc Donnell, Peadar Grant, Siobhan Connolly Kernan, Shubham Sharma, Mansi Arora
Title: Exploring the Impact of Generative Artificial Intelligence in Education: A Thematic Analysis
Abstract:
The recent advancements in Generative Artificial Intelligence (GenAI) technology have been transformative for the field of education. Large Language Models (LLMs) such as ChatGPT and Bard can be leveraged to automate boilerplate tasks, create content for personalised teaching, and handle repetitive tasks to allow more time for creative thinking. However, it is important to develop guidelines, policies, and assessment methods in the education sector to ensure the responsible integration of these tools. In this article, thematic analysis has been performed on seven essays obtained from professionals in the education sector to understand the advantages and pitfalls of using GenAI models such as ChatGPT and Bard in education. Exploratory Data Analysis (EDA) has been performed on the essays to extract further insights from the text. The study found several themes which highlight benefits and drawbacks of GenAI tools, as well as suggestions to overcome these limitations and ensure that students are using these tools in a responsible and ethical manner.

Authors:Connor Scully-Allison, Katy Williams, Stephanie Brink, Olga Pearce, Katherine E. Isaacs
Title: A Tale of Two Models: Understanding Data Workers' Internal and External Representations of Complex Data
Abstract:
Data workers may have a different mental model of their data than the one reified in code. Understanding the organization of their data is necessary for analyzing data, be it through scripting, visualization, or abstract thought. More complicated organizations, such as tables with attached hierarchies, may tax people's ability to think about and interact with data. To better understand and ultimately design for these situations, we conducted a study across a team of ten people working with the same reified data model. Through interviews and sketching, we probed their conception of the data model and developed themes through reflexive data analysis. Participants had diverse data models that differed from the reified data model, even among team members who had designed the model, resulting in parallel hazards limiting their ability to reason about the data. From these observations, we suggest potential design interventions for data analysis processes and tools.

Authors:Tianhao He, Karthi Saravanan, Evangelos Niforatos, Gerd Kortuem
Title: "A Great Start, But...": Evaluating LLM-Generated Mind Maps for Information Mapping in Video-Based Design
Abstract:
Extracting concepts and understanding relationships from videos is essential in Video-Based Design (VBD), where videos serve as a primary medium for exploration but require significant effort in managing meta-information. Mind maps, with their ability to visually organize complex data, offer a promising approach for structuring and analysing video content. Recent advancements in Large Language Models (LLMs) provide new opportunities for meta-information processing and visual understanding in VBD, yet their application remains underexplored. This study recruited 28 VBD practitioners to investigate the use of prompt-tuned LLMs for generating mind maps from ethnographic videos. Comparing LLM-generated mind maps with those created by professional designers, we evaluated rated scores, design effectiveness, and user experience across two contexts. Findings reveal that LLMs effectively capture central concepts but struggle with hierarchical organization and contextual grounding. We discuss trust, customization, and workflow integration as key factors to guide future research on LLM-supported information mapping in VBD.

Authors:Lei Zhang, Daekun Kim, Youjean Cho, Ava Robinson, Yu Jiang Tham, Rajan Vaish, Andrés Monroy-Hernández
Title: Jigsaw: Authoring Immersive Storytelling Experiences with Augmented Reality and Internet of Things
Abstract:
Augmented Reality (AR) presents new opportunities for immersive storytelling. However, this immersiveness faces two main hurdles. First, AR's immersive quality is often confined to visual elements, such as pixels on a screen. Second, crafting immersive narratives is complex and generally beyond the reach of amateurs due to the need for advanced technical skills. We introduce Jigsaw, a system that empowers beginners to both experience and craft immersive stories, blending virtual and physical elements. Jigsaw uniquely combines mobile AR with readily available Internet-of-things (IoT) devices. We conducted a qualitative study with 20 participants to assess Jigsaw's effectiveness in both consuming and creating immersive narratives. The results were promising: participants not only successfully created their own immersive stories but also found the playback of three such stories deeply engaging. However, sensory overload emerged as a significant challenge in these experiences. We discuss design trade-offs and considerations for future endeavors in immersive storytelling involving AR and IoT.

Authors:Wenlu Fan, Yuqi Zhu, Chenyang Wang, Bin Wang, Wentao Xu
Title: Consistency of Responses and Continuations Generated by Large Language Models on Social Media
Abstract:
Large Language Models (LLMs) demonstrate remarkable capabilities in text generation, yet their emotional consistency and semantic coherence in social media contexts remain insufficiently understood. This study investigates how LLMs handle emotional content and maintain semantic relationships through continuation and response tasks using two open-source models: Gemma and Llama. By analyzing climate change discussions from Twitter and Reddit, we examine emotional transitions, intensity patterns, and semantic similarity between human-authored and LLM-generated content. Our findings reveal that while both models maintain high semantic coherence, they exhibit distinct emotional patterns: Gemma shows a tendency toward negative emotion amplification, particularly anger, while maintaining certain positive emotions like optimism. Llama demonstrates superior emotional preservation across a broader spectrum of affects. Both models systematically generate responses with attenuated emotional intensity compared to human-authored content and show a bias toward positive emotions in response tasks. Additionally, both models maintain strong semantic similarity with original texts, though performance varies between continuation and response tasks. These findings provide insights into LLMs' emotional and semantic processing capabilities, with implications for their deployment in social media contexts and human-AI interaction design.
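The semantic-similarity comparison described in this abstract is typically computed by embedding both texts and taking cosine similarity; a minimal sketch with made-up embedding vectors (not the study's actual pipeline, models, or data):

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between two embedding vectors: 1.0 means identical
    # direction (high semantic coherence under this proxy), 0.0 means orthogonal.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical sentence embeddings for a human-authored post and an
# LLM-generated continuation; in practice these would come from a
# sentence-embedding model applied to the actual texts.
human_vec = [0.8, 0.1, 0.3]
llm_vec = [0.7, 0.2, 0.35]
similarity = cosine_similarity(human_vec, llm_vec)
```

Aggregating this score over many human/LLM pairs, separately for continuation and response tasks, yields the kind of semantic-coherence comparison the study reports.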

Authors:Gabriella Thompson, Ebtesam Al Haque, Paulette Blanc, Meme Styles, Denae Ford, Angela D. R. Smith, Brittany Johnson
Title: An Investigation of Experiences Engaging the Margins in Data-Centric Innovation
Abstract:
Data-centric technologies provide exciting opportunities, but recent research has shown how lack of representation in datasets, often as a result of systemic inequities and socioeconomic disparities, can produce inequitable outcomes that can exclude or harm certain demographics. In this paper, we discuss preliminary insights from an ongoing effort aimed at better understanding barriers to equitable data-centric innovation. We report findings from a survey of 261 technologists and researchers who use data in their work regarding their experiences seeking adequate, representative datasets. Our findings suggest that age and identity play a significant role in the seeking and selection of representative datasets, warranting further investigation into these aspects of data-centric research and development.

Authors:Tangyao Li, Qiyuan Zhan, Yitong Zhu, Bojing Hou, Yuyang Wang
Title: A comparative study of sensory encoding models for human navigation in virtual reality
Abstract:
In virtual reality applications, users often navigate through virtual environments, but the issue of physiological responses, such as cybersickness, fatigue, and cognitive workload, can disrupt or even halt these activities. Despite its impact, the underlying mechanisms of how the sensory system encodes information in VR remain unclear. In this study, we compare three sensory encoding models, Bayesian Efficient Coding, Fitness Maximizing Coding, and the Linear Nonlinear Poisson model, regarding their ability to simulate human navigation behavior in VR. By incorporating the factor of physiological responses into the models, we find that the Bayesian Efficient Coding model generally outperforms the others. Furthermore, the Fitness Maximizing Code framework provides more accurate estimates when the error penalty is small. Our results suggest that the Bayesian Efficient Coding framework offers superior predictions in most scenarios, providing a better understanding of human navigation behavior in VR environments.

Authors:Audrey Salmon, Katie Hammer, Eddie Antonio Santos, Brett A. Becker
Title: Debugging Without Error Messages: How LLM Prompting Strategy Affects Programming Error Explanation Effectiveness
Abstract:
Making errors is part of the programming process -- even for the most seasoned professionals. Novices in particular are bound to make many errors while learning. It is well known that traditional (compiler/interpreter) programming error messages have been less than helpful for many novices: they can be frustrating, contain confusing jargon, and be downright misleading. Recent work has found that large language models (LLMs) can generate excellent error explanations, but that the effectiveness of these explanations heavily depends on whether the LLM has been provided with context -- typically the original source code where the problem occurred. Knowing that programming error messages can be misleading and/or contain information that serves little-to-no use (particularly for novices), we explore the reverse: what happens when GPT-3.5 is prompted for error explanations on just the erroneous source code itself, with the original compiler/interpreter-produced error message excluded. We utilized various strategies to produce more effective error explanations, including one-shot prompting and fine-tuning. We report baseline results of how effective the error explanations are at providing feedback, as well as how various prompting strategies might improve their effectiveness. Our results can help educators understand how LLMs respond to the kinds of erroneous programs that novices are bound to write, and hopefully lead to more effective use of Generative AI in the classroom.
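The prompting setup described above can be sketched as follows. The template wording and the buggy snippet are illustrative assumptions, not the authors' exact prompts; the key point is that the compiler/interpreter error message is deliberately left out.

```python
# Sketch of the two prompting conditions: explaining an error from the
# erroneous source code alone (no error message), zero-shot vs. one-shot.

BUGGY_CODE = 'print("total: " + 42)'  # would raise TypeError: str + int

def zero_shot_prompt(code: str) -> str:
    # Only the erroneous source code is provided -- no compiler output.
    return (
        "Explain, in language a novice programmer can understand, "
        "what is wrong with this code:\n\n" + code
    )

def one_shot_prompt(code: str, example_code: str, example_explanation: str) -> str:
    # One worked example (code + good explanation) is prepended to steer
    # the style and level of the model's answer.
    return (
        "Code:\n" + example_code + "\nExplanation:\n" + example_explanation
        + "\n\nCode:\n" + code + "\nExplanation:\n"
    )

prompt = zero_shot_prompt(BUGGY_CODE)
```

Either prompt would then be sent to GPT-3.5, and the returned explanation rated for effectiveness.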

Authors:Qiushi Zhou, Antony Chacon, Jiahe Pan, Wafa Johal
Title: Assisting MoCap-Based Teleoperation of Robot Arm using Augmented Reality Visualisations
Abstract:
Teleoperating a robot arm involves the human operator positioning the robot's end-effector or programming each joint. Whereas humans can control their own arms easily by integrating visual and proprioceptive feedback, it is challenging to control an external robot arm in the same way, due to its inconsistent orientation and appearance. We explore teleoperating a robot arm through motion-capture (MoCap) of the human operator's arm with the assistance of augmented reality (AR) visualisations. We investigate how AR helps teleoperation by visualising a virtual reference of the human arm alongside the robot arm to help users understand the movement mapping. We found that the AR overlay of a humanoid arm on the robot in the same orientation helped users learn the control. We discuss findings and future work on MoCap-based robot teleoperation.

Authors:Jiahe Pan, Sarah Schömbs, Yan Zhang, Ramtin Tabatabaei, Muhammad Bilal, Wafa Johal
Title: OfficeMate: Pilot Evaluation of an Office Assistant Robot
Abstract:
Office Assistant Robots (OARs) offer a promising solution to proactively provide in-situ support to enhance employee well-being and productivity in office spaces. We introduce OfficeMate, a social OAR designed to assist with practical tasks, foster social interaction, and promote health and well-being. Through a pilot evaluation with seven participants in an office environment, we found that users see potential in OARs for reducing stress and promoting healthy habits and value the robot's ability to provide companionship and physical activity reminders in the office space. However, concerns regarding privacy, communication, and the robot's interaction timing were also raised. The feedback highlights the need to carefully consider the robot's appearance and behaviour to ensure it enhances user experience and aligns with office social norms. We believe these insights will better inform the development of adaptive, intelligent OAR systems for future office space integration.

Authors:Christopher Lazik, Christopher Katins, Charlotte Kauter, Jonas Jakob, Caroline Jay, Lars Grunske, Thomas Kosch
Title: The Impostor is Among Us: Can Large Language Models Capture the Complexity of Human Personas?
Abstract:
Large Language Models (LLMs) created new opportunities for generating personas, expected to streamline and accelerate the human-centered design process. Yet, AI-generated personas may not accurately represent actual user experiences, as they can miss contextual and emotional insights critical to understanding real users' needs and behaviors. This introduces a potential threat to quality, especially for novices. This paper examines the differences in how users perceive personas created by LLMs compared to those crafted by humans regarding their credibility for design. We gathered ten human-crafted personas developed by HCI experts according to relevant attributes established in related work. Then, we systematically generated ten personas with an LLM and compared them with human-crafted ones in a survey. The results showed that participants differentiated between human-created and AI-generated personas, with the latter perceived as more informative and consistent. However, participants noted that the AI-generated personas tended to follow stereotypes, highlighting the need for a greater emphasis on diversity when utilizing LLMs for persona creation.

Authors:Tasnim Irtifa Chowdhury, Andrew Vargo, Chris Blakely, Benjamin Tag, Koichi Kise
Title: A Review of Cognitive Readiness, Wearable Devices, and Prospects
Abstract:
In Human-Computer Interaction (HCI) and Ubiquitous Computing, the objective of optimizing device interactions and personalizing user experiences has placed a new emphasis on accurately evaluating cognitive readiness using wearable devices. Interpreting cognitive readiness in real-world scenarios is complex due to the plethora of potential physiological measures, individual variability, and the limitations of wearable devices. In this review, we present a systematic overview of key physiological measures that can serve as proxies for an in-depth assessment of cognitive readiness. This review serves as a tool for assessing cognitive readiness for diverse applications, with a special focus on in-the-wild research settings. In addition, due to the complexity of measurements and devices, we propose the development of a robust catalog of cognitive readiness measurements.

Authors:Xujin Li, Wei Wei, Shuang Qiu, Xinyi Zhang, Fu Li, Huiguang He
Title: Integrating Language-Image Prior into EEG Decoding for Cross-Task Zero-Calibration RSVP-BCI
Abstract:
Rapid Serial Visual Presentation (RSVP)-based Brain-Computer Interface (BCI) is an effective technology used for information detection by detecting Event-Related Potentials (ERPs). The current RSVP decoding methods can perform well in decoding EEG signals within a single RSVP task, but their decoding performance significantly decreases when directly applied to different RSVP tasks without calibration data from the new tasks. This limits the rapid and efficient deployment of RSVP-BCI systems for detecting different categories of targets in various scenarios. To overcome this limitation, this study aims to enhance the cross-task zero-calibration RSVP decoding performance. First, we design three distinct RSVP tasks for target image retrieval and build an open-source dataset containing EEG signals and corresponding stimulus images. Then we propose an EEG with Language-Image Prior fusion Transformer (ELIPformer) for cross-task zero-calibration RSVP decoding. Specifically, we propose a prompt encoder based on the language-image pre-trained model to extract language-image features from task-specific prompts and stimulus images as prior knowledge for enhancing EEG decoding. A cross bidirectional attention mechanism is also adopted to facilitate the effective feature fusion and alignment between the EEG and language-image features. Extensive experiments demonstrate that the proposed model achieves superior performance in cross-task zero-calibration RSVP decoding, which promotes the RSVP-BCI system from research to practical application.
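The cross bidirectional attention described above can be sketched roughly as follows. The single-head formulation, the feature dimensions, and all variable names are illustrative assumptions, not ELIPformer's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Single-head scaled dot-product attention: each query position attends
    # over the other modality's sequence (keys == values here for simplicity).
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (Lq, Lkv)
    return softmax(scores) @ keys_values            # (Lq, d)

rng = np.random.default_rng(0)
eeg_feats = rng.standard_normal((16, 64))  # hypothetical EEG token features
li_feats = rng.standard_normal((10, 64))   # hypothetical language-image features

# Bidirectional fusion: EEG attends to language-image features and vice
# versa, and each stream is combined with its attended counterpart.
eeg_enriched = eeg_feats + cross_attention(eeg_feats, li_feats)
li_enriched = li_feats + cross_attention(li_feats, eeg_feats)
```

In the paper's setting, the enriched EEG features would then feed the decoding head, with the language-image prior supplying task knowledge that no calibration data from the new RSVP task can provide.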

Authors:Ebtesam Al Haque, Chris Brown, Thomas D. LaToza, Brittany Johnson
Title: Towards Decoding Developer Cognition in the Age of AI Assistants
Abstract:
Background: The increasing adoption of AI assistants in programming has led to numerous studies exploring their benefits. While developers consistently report significant productivity gains from these tools, empirical measurements often show more modest improvements. While prior research has documented self-reported experiences with AI-assisted programming tools, little to no work has been done to understand their usage patterns and the actual cognitive load imposed in practice. Objective: In this exploratory study, we aim to investigate the role AI assistants play in developer productivity. Specifically, we are interested in how developers' expertise levels influence their AI usage patterns, and how these patterns impact their actual cognitive load and productivity during development tasks. We also seek to better understand how this relates to their perceived productivity. Method: We propose a controlled observational study combining physiological measurements (EEG and eye tracking) with interaction data to examine developers' use of AI-assisted programming tools. We will recruit professional developers to complete programming tasks both with and without AI assistance while measuring their cognitive load and task completion time. Through pre- and post-task questionnaires, we will collect data on perceived productivity and cognitive load using NASA-TLX.

Authors:Sanghyun Park, Boris Maciejovsky, Phanish Puranam
Title: Thinking with Many Minds: Using Large Language Models for Multi-Perspective Problem-Solving
Abstract:
Complex problem-solving requires cognitive flexibility--the capacity to entertain multiple perspectives while preserving their distinctiveness. This flexibility replicates the "wisdom of crowds" within a single individual, allowing them to "think with many minds." While mental simulation enables imagined deliberation, cognitive constraints limit its effectiveness. We propose synthetic deliberation, a Large Language Model (LLM)-based method that simulates discourse between agents embodying diverse perspectives, as a solution. Using a custom GPT-based model, we showcase its benefits: concurrent processing of multiple viewpoints without cognitive degradation, parallel exploration of perspectives, and precise control over viewpoint synthesis. By externalizing the deliberative process and distributing cognitive labor between parallel search and integration, synthetic deliberation transcends mental simulation's limitations. This approach shows promise for strategic planning, policymaking, and conflict resolution.

Authors:Diego Vaquero-Melchor, Ana M. Bernardos, Luca Bergesio
Title: SARA: A Microservice-Based Architecture for Cross-Platform Collaborative Augmented Reality
Abstract:
Augmented Reality (AR) functionalities may be effectively leveraged in collaborative service scenarios (e.g., remote maintenance, on-site building, street gaming, etc.). Standard development cycles for collaborative AR require coding for each specific visualization platform and implementing the necessary control mechanisms over the shared assets. This paper describes SARA, an architecture to support cross-platform collaborative Augmented Reality applications based on microservices. The architecture is designed to work over the concept of collaboration models (turn-, layer-, ownership-, and hierarchy-based, as well as unconstrained examples), which regulate the interaction and permissions of each user over the AR assets. Thanks to the reusability of its components, during the development of an application SARA enables focusing on the application logic while avoiding the implementation of the communication protocol, data model handling, and orchestration between the different, possibly heterogeneous, devices involved in the collaboration (i.e., mobile or wearable AR devices using different operating systems). To describe how to build an application based on SARA, a prototype for HoloLens and iOS devices has been implemented. The prototype is a collaborative voxel-based game in which several players work together in real time on a piece of land, adding or eliminating cubes to create buildings and landscapes. Turn-based and unconstrained collaboration models are applied to regulate the interaction, and the development workflow for this case study shows how the architecture serves as a framework to support the deployment of collaborative AR services, enabling the reuse of collaboration model components while agnostically handling client technologies.

Authors:Mykola Maslych, Christian Pumarada, Amirpouya Ghasemaghaei, Joseph J. LaViola
Title: Takeaways from Applying LLM Capabilities to Multiple Conversational Avatars in a VR Pilot Study
Abstract:
We present a virtual reality (VR) environment featuring conversational avatars powered by a locally-deployed LLM, integrated with automatic speech recognition (ASR), text-to-speech (TTS), and lip-syncing. Through a pilot study, we explored the effects of three types of avatar status indicators during response generation. Our findings reveal design considerations for improving responsiveness and realism in LLM-driven conversational systems. We also detail two system architectures: one using an LLM-based state machine to control avatar behavior and another integrating retrieval-augmented generation (RAG) for context-grounded responses. Together, these contributions offer practical insights to guide future work in developing task-oriented conversational AI in VR environments.

Authors:Kaixuan Wang, Jason T. Jacques, Chenxin Diao, Carl-Cyril J Dreue
Title: Positioning AI Tools to Support Online Harm Reduction Practice: Applications and Design Directions
Abstract:
Access to accurate and actionable harm reduction information can directly impact the health outcomes of People Who Use Drugs (PWUD), yet existing online channels often fail to meet their diverse and dynamic needs due to limitations in adaptability, accessibility, and the pervasive impact of stigma. Large Language Models (LLMs) present a novel opportunity to enhance information provision, but their application in such a high-stakes domain is under-explored and presents socio-technical challenges. This paper investigates how LLMs can be responsibly designed to support the information needs of PWUD. Through a qualitative workshop involving diverse stakeholder groups (academics, harm reduction practitioners, and an online community moderator), we explored LLM capabilities, identified potential use cases, and delineated core design considerations. Our findings reveal that while LLMs can address some existing information barriers (e.g., by offering responsive, multilingual, and potentially less stigmatising interactions), their effectiveness is contingent upon overcoming challenges related to ethical alignment with harm reduction principles, nuanced contextual understanding, effective communication, and clearly defined operational boundaries. We articulate design pathways emphasising collaborative co-design with experts and PWUD to develop LLM systems that are helpful, safe, and responsibly governed. This work contributes empirically grounded insights and actionable design considerations for the responsible development of LLMs as supportive tools within the harm reduction ecosystem.

Authors:Ruben Janssens, Jens De Bock, Sofie Labat, Eva Verhelst, Veronique Hoste, Tony Belpaeme
Title: Why Robots Are Bad at Detecting Their Mistakes: Limitations of Miscommunication Detection in Human-Robot Dialogue
Abstract:
Detecting miscommunication in human-robot interaction is a critical function for maintaining user engagement and trust. While humans effortlessly detect communication errors in conversations through both verbal and non-verbal cues, robots face significant challenges in interpreting non-verbal feedback, despite advances in computer vision for recognizing affective expressions. This research evaluates the effectiveness of machine learning models in detecting miscommunications in robot dialogue. Using a multi-modal dataset of 240 human-robot conversations, where four distinct types of conversational failures were systematically introduced, we assess the performance of state-of-the-art computer vision models. After each conversational turn, users provided feedback on whether they perceived an error, enabling an analysis of the models' ability to accurately detect robot mistakes. Despite using state-of-the-art models, performance barely exceeds random chance in identifying miscommunication, although on a dataset with more expressive emotional content the same models successfully identified confused states. To explore the underlying cause, we asked human raters to do the same task. Like the models, they could only identify around half of the induced miscommunications. These results uncover a fundamental limitation in identifying robot miscommunications in dialogue: even when users perceive the induced miscommunication as such, they often do not communicate this to their robotic conversation partner. This knowledge can shape expectations of the performance of computer vision models and can help researchers design better human-robot conversations by deliberately eliciting feedback where needed.

Authors:Seraphina Yong, Ashlee Milton, Evan Suma Rosenberg, Stevie Chancellor, Svetlana Yarosh
Title: "I'm Petting the Laptop, Which Has You Inside It": Reflecting on Lived Experiences of Online Friendship
Abstract:
Online(-only) friendships have become increasingly common in daily lives post-COVID despite debates around their mental health benefits and equivalence to ''real'' relationships. Previous research has reflected a need to understand how online friends engage beyond individual platforms, and the lack of platform-agnostic inquiry limits our ability to fully understand the dynamics of online friendship. We employed an activity-grounded analysis of 25 interviews on lived experiences of close online friendship spanning multiple years. Our findings present unique challenges and strategies in online friendships, such as stigma from real-life circles, an ambivalent relationship with online communities, and counter-theoretical reappropriations of communication technology. This study contributes to HCI research in online communities and social interface design by refocusing prior impressions of strong vs. weak-ties in online social spaces and foregrounding time-stable interactions in design for relationship maintenance through technology. Our work also promotes critical reflection on biased perspectives towards technology-mediated practices and consideration of online friends as an invisible marginalized community.

Authors:Blade Frisch, Keith Vertanen
Title: Refining Participatory Design for AAC Users
Abstract:
Augmentative and alternative communication (AAC) is a field of research and practice that works with people who have a communication disability. One form AAC can take is a high-tech tool, such as a software-based communication system. Like all user interfaces, these systems must be designed, and it is critical to include AAC users in the design process for their systems. A participatory design approach can include AAC users in the design process, but modifications may be necessary to make these methods more accessible. We present a two-part design process we are investigating for improving participatory design for high-tech AAC systems. We discuss our plans to refine the accessibility of this process based on participant feedback.

Authors:Rodrigo Oliveira Zacarias, Léo Carvalho Ramos Antunes, Márcio de Oliveira Barros, Rodrigo Pereira dos Santos, Patricia Lago
Title: Exploring Developer Experience Factors in Software Ecosystems
Abstract:
Context: Developer experience (DX) plays a key role in developers' performance and their continued involvement in a software ecosystem (SECO) platform. While researchers and practitioners have recognized several factors affecting DX in SECO platforms, a clear roadmap of the most influential factors is still missing. This is particularly important given the direct impact on developers' interest in SECO and their ongoing engagement with the common technological platform. Goal: This work aims to identify key DX factors and understand how they influence third-party developers' decisions to adopt and keep contributing to a SECO. Methods: We conducted a systematic mapping study (SMS), analyzing 29 studies to assess the state-of-the-art of DX in SECO. Additionally, we conducted a Delphi study to evaluate the influence of 27 DX factors (identified in our SMS) from the perspective of 21 third-party developers to adopt and keep contributing to a SECO. Results: The factors that most strongly influence developers' adoption and ongoing contributions to a SECO are: financial costs for using the platform, desired technical resources for development, low barriers to entry into the applications market, and more financial gains. Conclusion: DX is essential for the success and sustainability of SECO. Our set of DX factors provides valuable insights and recommendations for researchers and practitioners to address key DX concerns from the perspective of third-party developers.

Authors:Adarsa Sivaprasad, Ehud Reiter, David McLernon, Nava Tintarev, Siladitya Bhattacharya, Nir Oren
Title: Patient-Centred Explainability in IVF Outcome Prediction
Abstract:
This paper evaluates the user interface of an in vitro fertilisation (IVF) outcome prediction tool, focussing on its understandability for patients or potential patients. We analyse four years of anonymous patient feedback, followed by a user survey and interviews, to quantify trust and understandability. Results highlight a lay user's need for prediction model explainability beyond the model feature space. We identify user concerns about data shifts and model exclusions that impact trust. The results call attention to the shortcomings of current practices in explainable AI research and design, and to the need for explainability beyond model feature space and epistemic assumptions, particularly in high-stakes healthcare contexts where users gather extensive information and develop complex mental models. To address these challenges, we propose a dialogue-based interface and explore user expectations for personalised explanations.

Authors:J. Recker, R. Lukyanenko, M. A. Jabbari, B. M. Samuel, A. Castellanos
Title: From Representation to Mediation: A New Agenda for Conceptual Modeling Research in A Digital World
Abstract:
The role of information systems (IS) as representations of real-world systems is changing in an increasingly digitalized world, suggesting that conceptual modeling is losing its relevance to the IS field. We argue the opposite: Conceptual modeling research is more relevant to the IS field than ever, but it requires an update with current theory. We develop a new theoretical framework of conceptual modeling that delivers a fundamental shift in the assumptions that govern research in this area. This move can make traditional knowledge about conceptual modeling consistent with the emerging requirements of a digital world. Our framework draws attention to the role of conceptual modeling scripts as mediators between physical and digital realities. We identify new research questions about grammars, methods, scripts, agents, and contexts that are situated in intertwined physical and digital realities. We discuss several implications for conceptual modeling scholarship that relate to the necessity of developing new methods and grammars for conceptual modeling, broadening the methodological array of conceptual modeling scholarship, and considering new dependent variables.

Authors:R. Lukyanenko, O. Pastor, V. C. Storey
Title: Conceptual Modelling for Life Sciences Based on Systemist Foundations
Abstract:
All aspects of our society, including the life sciences, need a mechanism for people working within them to represent the concepts they employ to carry out their research. For the information systems being designed and developed to support researchers and scientists in conducting their work, conceptual models of the relevant domains are usually designed as both blueprints for a system being developed and as a means of communication between the designer and developer. Most conceptual modelling concepts are generic in the sense that they are applied with the same understanding across many applications. Problems in the life sciences, however, are especially complex and important, because they deal with humans, their well-being, and their interactions with the environment as well as other organisms. This work proposes a systemist perspective for creating a conceptual model of a life scientist's problem. We introduce the notion of a system and then show how it can be applied to the development of an information system for handling genomic-related information. We extend our discussion to show how the proposed systemist perspective can support the modelling of precision medicine. This research recognizes a challenge in life sciences research: how to model problems so as to better represent the connections between physical and digital worlds. We propose a new notation that explicitly incorporates systemist thinking, as well as the components of systems based on recent ontological foundations. The new notation captures important semantics in the domain of life sciences. It may be used to facilitate understanding, communication and problem-solving more broadly. We also provide a precise, sound, ontologically supported characterization of the term system, as a basic construct for conceptual modelling in life sciences.

Authors:Bushra Asseri, Estabraq Abdelaziz, Maha Al Mogren, Tayef Alhefdhi, Areej Al-Wabil
Title: Deciphering Emotions in Children Storybooks: A Comparative Analysis of Multimodal LLMs in Educational Applications
Abstract:
Emotion recognition capabilities in multimodal AI systems are crucial for developing culturally responsive educational technologies, yet remain underexplored for Arabic language contexts where culturally appropriate learning tools are critically needed. This study evaluates the emotion recognition performance of two advanced multimodal large language models, GPT-4o and Gemini 1.5 Pro, when processing Arabic children's storybook illustrations. We assessed both models across three prompting strategies (zero-shot, few-shot, and chain-of-thought) using 75 images from seven Arabic storybooks, comparing model predictions with human annotations based on Plutchik's emotional framework. GPT-4o consistently outperformed Gemini across all conditions, achieving the highest macro F1-score of 59% with chain-of-thought prompting compared to Gemini's best performance of 43%. Error analysis revealed systematic misclassification patterns, with valence inversions accounting for 60.7% of errors, while both models struggled with culturally nuanced emotions and ambiguous narrative contexts. These findings highlight fundamental limitations in current models' cultural understanding and emphasize the need for culturally sensitive training approaches to develop effective emotion-aware educational technologies for Arabic-speaking learners.

Authors:Bushra Asseri, Estabrag Abdelaziz, Areej Al-Wabil
Title: Prompt Engineering Techniques for Mitigating Cultural Bias Against Arabs and Muslims in Large Language Models: A Systematic Review
Abstract:
Large language models have demonstrated remarkable capabilities across various domains, yet concerns about cultural bias - particularly towards Arabs and Muslims - pose significant ethical challenges by perpetuating harmful stereotypes and marginalization. Despite growing recognition of bias in LLMs, prompt engineering strategies specifically addressing Arab and Muslim representation remain understudied. This mixed-methods systematic review examines such techniques, offering evidence-based guidance for researchers and practitioners. Following PRISMA guidelines and Kitchenham's systematic review methodology, we analyzed 8 empirical studies published between 2021 and 2024 investigating bias mitigation strategies. Our findings reveal five primary prompt engineering approaches: cultural prompting, affective priming, self-debiasing techniques, structured multi-step pipelines, and parameter-optimized continuous prompts. Although all approaches show potential for reducing bias, effectiveness varied substantially across studies and bias types. Evidence suggests that certain bias types may be more resistant to prompt-based mitigation than others. Structured multi-step pipelines demonstrated the highest overall effectiveness, achieving up to 87.7% reduction in bias, though they require greater technical expertise. Cultural prompting offers broader accessibility with substantial effectiveness. These results underscore the accessibility of prompt engineering for mitigating cultural bias without requiring access to model parameters. The limited number of studies identified highlights a significant research gap in this critical area. Future research should focus on developing culturally adaptive prompting techniques, creating Arab- and Muslim-specific evaluation resources, and integrating prompt engineering with complementary debiasing methods to address deeper stereotypes while maintaining model utility.

Authors:Carter Blair, Kate Larson, Edith Law
Title: Reflective Verbal Reward Design for Pluralistic Alignment
Abstract:
AI agents are commonly aligned with "human values" through reinforcement learning from human feedback (RLHF), where a single reward model is learned from aggregated human feedback and used to align an agent's behavior. However, human values are not homogeneous--different people hold distinct and sometimes conflicting values. Aggregating feedback into a single reward model risks disproportionately suppressing minority preferences. To address this, we present a novel reward modeling approach for learning individualized reward models. Our approach uses a language model to guide users through reflective dialogues where they critique agent behavior and construct their preferences. This personalized dialogue history, containing the user's reflections and critiqued examples, is then used as context for another language model that serves as an individualized reward function (what we call a "verbal reward model") for evaluating new trajectories. In studies with 30 participants, our method achieved a 9-12% improvement in accuracy over non-reflective verbal reward models while being more sample efficient than traditional supervised learning methods.

Authors:Wei Sun, Minghong Fang, Mengyuan Li
Title: VReaves: Eavesdropping on Virtual Reality App Identity and Activity via Electromagnetic Side Channels
Abstract:
Virtual reality (VR) has recently proliferated significantly, with headsets or head-mounted displays (HMDs) and hand controllers providing an embodied and immersive experience. VR devices are usually embedded with different kinds of IoT sensors, such as cameras, microphones, and communication sensors. However, VR security has not been scrutinized from a physical hardware point of view, especially regarding the electromagnetic emanations (EM) that are automatically and unintentionally emitted by the VR headset. This paper presents VReaves, a system that can eavesdrop on the electromagnetic emanation side channel of a VR headset for VR app identification and activity recognition. To do so, we first characterize the electromagnetic emanations from the embedded IoT sensors (e.g., cameras and microphones) in the VR headset through a signal processing pipeline and further propose machine learning models to identify the VR app and recognize its activities. Our experimental evaluation with commercial off-the-shelf VR devices demonstrates the effectiveness of VR app identification and activity recognition via the electromagnetic emanation side channel.

Authors:Maeve Hutchinson, Radu Jianu, Aidan Slingsby, Jo Wood, Pranava Madhyastha
Title: Capturing Visualization Design Rationale
Abstract:
Prior natural language datasets for data visualization have focused on tasks such as visualization literacy assessment, insight generation, and visualization generation from natural language instructions. These studies often rely on controlled setups with purpose-built visualizations and artificially constructed questions. As a result, they tend to prioritize the interpretation of visualizations, focusing on decoding visualizations rather than understanding their encoding. In this paper, we present a new dataset and methodology for probing visualization design rationale through natural language. We leverage a unique source of real-world visualizations and natural language narratives: literate visualization notebooks created by students as part of a data visualization course. These notebooks combine visual artifacts with design exposition, in which students make explicit the rationale behind their design decisions. We also use large language models (LLMs) to generate and categorize question-answer-rationale triples from the narratives and articulations in the notebooks. We then carefully validate the triples and curate a dataset that captures and distills the visualization design choices and corresponding rationales of the students.

Authors:Vlad Cnejevici, Matthias Ponfick, Raul C. Sîmpetru, Alessandro Del Vecchio
Title: Closed-Loop Control of Electrical Stimulation through Spared Motor Unit Ensembles Restores Foot Movements after Spinal Cord Injury
Abstract:
Restoring movement of a paralyzed foot is a key challenge in helping individuals with neurological conditions such as spinal cord injury (SCI) to improve their quality of life. Neuroprostheses based on functional electrical stimulation (FES) can restore the physiological range of motion by stimulating the affected muscles using surface electrodes. We have previously shown that, despite chronic motor-complete SCI, it is possible to capture paralyzed hand movements in individuals with tetraplegia using spared and modulated motor unit (MU) activity decoded with non-invasive electromyography (EMG) sensors. This study investigated whether a wearable high-density surface EMG system could capture and control paralyzed foot kinematics in closed-loop control with an FES system. We found that all our participants with SCI (2 with chronic SCI and 3 with acute SCI) retained distinct spared EMG activity for at least three ankle movements, which allowed them to reliably control a digital cursor using their spared tibialis anterior and triceps surae MU activity. Movement separability was further reconfirmed by extracting task-modulated MU activity during foot flexion/extension (3-7 modulated MUs/participant). Three participants were further able to modulate and maintain their foot flexion/extension EMG levels with an accuracy of >70%. Lastly, we show that real-time control of a FES system using EMG from the affected limb can restore foot movements in a highly intuitive way, significantly improving the lost or pathological foot range of motion. Our system provides an intuitive approach for closed-loop control of FES that has the potential to assist individuals with SCI in regaining lost motor functions.

Authors:Guilherme Guerino, Luiz Rodrigues, Bruna Capeleti, Rafael Ferreira Mello, André Freire, Luciana Zaina
Title: Can GPT-4o Evaluate Usability Like Human Experts? A Comparative Study on Issue Identification in Heuristic Evaluation
Abstract:
Heuristic evaluation is a widely used method in Human-Computer Interaction (HCI) to inspect interfaces and identify issues based on heuristics. Recently, Large Language Models (LLMs), such as GPT-4o, have been applied in HCI to assist in persona creation, the ideation process, and the analysis of semi-structured interviews. However, given the need to understand heuristics and the high degree of abstraction required to evaluate them, LLMs may have difficulty conducting heuristic evaluation, and prior research has not compared GPT-4o's performance in heuristic evaluation of web-based systems against that of HCI experts. In this context, this study aims to compare the results of a heuristic evaluation performed by GPT-4o and by human experts. To this end, we selected a set of screenshots from a web system and asked GPT-4o to perform a heuristic evaluation based on Nielsen's Heuristics using a literature-grounded prompt. Our results indicate that only 21.2% of the issues identified by human experts were also identified by GPT-4o, although it found 27 new issues. We also found that GPT-4o performed better for heuristics related to aesthetic and minimalist design and to the match between the system and the real world, whereas it had difficulty identifying issues related to flexibility, control, and user efficiency. Additionally, we noticed that GPT-4o generated several false positives due to hallucinations and attempts to predict issues. Finally, we highlight five takeaways for the conscious use of GPT-4o in heuristic evaluations.

Authors:Mohna Chakraborty, Lu Wang, David Jurgens
Title: Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework
Abstract:
Large language models (LLMs) are increasingly deployed in domains requiring moral understanding, yet their reasoning often remains shallow and misaligned with human reasoning. Unlike humans, whose moral reasoning integrates contextual trade-offs, value systems, and ethical theories, LLMs often rely on surface patterns, leading to biased decisions in morally and ethically complex scenarios. To address this gap, we present a value-grounded framework for evaluating and distilling structured moral reasoning in LLMs. We benchmark 12 open-source models across four moral datasets using a taxonomy of prompts grounded in value systems, ethical theories, and cognitive reasoning strategies. Our evaluation is guided by four questions: (1) Does reasoning improve LLM decision-making over direct prompting? (2) Which types of value/ethical frameworks most effectively guide LLM reasoning? (3) Which cognitive reasoning strategies lead to better moral performance? (4) Can small-sized LLMs acquire moral competence through distillation? We find that prompting with explicit moral structure consistently improves accuracy and coherence, with first-principles reasoning and Schwartz's + care-ethics scaffolds yielding the strongest gains. Furthermore, our supervised distillation approach transfers moral competence from large to small models without additional inference cost. Together, our results offer a scalable path toward interpretable and value-grounded models.

Authors:Christina Bremer, Harshit Gujral, Michelle Lin, Lily Hinkers, Christoph Becker, Vlad C. Coroamă
Title: How Viable are Energy Savings in Smart Homes? A Call to Embrace Rebound Effects in Sustainable HCI
Abstract:
As part of global climate action, digital technologies are seen as a key enabler of energy efficiency savings. A popular application domain for this work is smart homes. There is a risk, however, that these efficiency gains result in rebound effects, which reduce or even overcompensate the savings. Rebound effects are well-established in economics, but it is less clear whether they also inform smart energy research in other disciplines. In this paper, we ask: to what extent have rebound effects and their underlying mechanisms been considered in computing, HCI and smart home research? To answer this, we conducted a literature mapping drawing on four scientific databases and a SIGCHI corpus. Our results reveal limited consideration of rebound effects and significant opportunities for HCI to advance this topic. We conclude with a taxonomy of actions for HCI to address rebound effects and help determine the viability of energy efficiency projects.

Authors:Nirodya Pussadeniya, Bahareh Nakisa, Mohmmad Naim Rastgoo
Title: Affective-CARA: A Knowledge Graph Driven Framework for Culturally Adaptive Emotional Intelligence in HCI
Abstract:
Culturally adaptive emotional responses remain a critical challenge in affective computing. This paper introduces Affective-CARA, an agentic framework designed to enhance user-agent interactions by integrating a Cultural Emotion Knowledge Graph (derived from StereoKG) with Valence, Arousal, and Dominance annotations, culture-specific data, and cross-cultural checks to minimize bias. A Gradient-Based Reward Policy Optimization mechanism further refines responses according to cultural alignment, affective appropriateness, and iterative user feedback. A Cultural-Aware Response Mediator coordinates knowledge retrieval, reinforcement learning updates, and historical data fusion. By merging real-time user input with past emotional states and cultural insights, Affective-CARA delivers narratives that are deeply personalized and sensitive to diverse cultural norms. Evaluations on AffectNet, SEMAINE DB, and MERD confirm that the framework consistently outperforms baseline models in sentiment alignment, cultural adaptation, and narrative quality. Affective-CARA achieved a Cultural Semantic Density of 9.32 out of 10 and lowered cultural representation bias by 61% (KL-Divergence: 0.28), demonstrating robust performance in generating ethical, adaptive responses. These findings suggest the potential for more inclusive and empathetic interactions, making Affective-CARA an avenue for fostering culturally grounded user experiences across domains such as cross-cultural communication, mental health support, and education.

Authors:Andrew Chang, Chenkai Hu, Ji Qi, Zhuojian Wei, Kexin Zhang, Viswadruth Akkaraju, David Poeppel, Dustin Freeman
Title: Multimodal Fusion with Semi-Supervised Learning Minimizes Annotation Quantity for Modeling Videoconference Conversation Experience
Abstract:
Group conversations over videoconferencing are a complex social behavior. However, the subjective moments of negative experience, where the conversation loses fluidity or enjoyment, remain understudied. These moments are infrequent in naturalistic data, and thus training a supervised learning (SL) model requires costly manual data annotation. We applied semi-supervised learning (SSL) to leverage targeted labeled and unlabeled clips for training multimodal (audio, facial, text) deep features to predict non-fluid or unenjoyable moments in holdout videoconference sessions. The modality-fused co-training SSL achieved an ROC-AUC of 0.9 and an F1 score of 0.6, outperforming SL models by up to 4% with the same amount of labeled data. Remarkably, the best SSL model with just 8% labeled data matched 96% of the SL model's full-data performance. This demonstrates an annotation-efficient framework for modeling videoconference experience.
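As a rough illustration of the co-training idea behind this abstract (not the paper's model: the nearest-centroid "learners", the distance-based confidence score, and the two-view split below are stand-ins for its multimodal deep features), each view's model pseudo-labels its most confident unlabeled clips, which then join the shared labeled pool:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Tiny stand-in classifier: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    classes = sorted(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    conf = 1.0 / (1e-9 + d.min(axis=0))  # crude distance-based confidence
    return np.array(classes)[d.argmin(axis=0)], conf

def co_train(Xa, Xb, y, labeled, rounds=3, per_round=2):
    """Co-training sketch over two feature views Xa and Xb.

    y holds true labels at the `labeled` indices; other entries are
    ignored until they are pseudo-labeled."""
    labeled, y = set(labeled), y.copy()
    for _ in range(rounds):
        idx = sorted(labeled)
        models = (nearest_centroid_fit(Xa[idx], y[idx]),
                  nearest_centroid_fit(Xb[idx], y[idx]))
        for model, X in zip(models, (Xa, Xb)):
            unl = [i for i in range(len(y)) if i not in labeled]
            if not unl:
                break
            pred, conf = nearest_centroid_predict(model, X[unl])
            # Promote the most confident pseudo-labels to the labeled pool.
            for p in np.argsort(-conf)[:per_round]:
                y[unl[p]] = pred[p]
                labeled.add(unl[p])
    # Final fused model trained on concatenated views of the labeled pool.
    idx = sorted(labeled)
    return nearest_centroid_fit(np.hstack([Xa, Xb])[idx], y[idx])
```

The real system fuses audio, facial, and text embeddings and trains deep networks; the loop above only shows how pseudo-labeling lets a small labeled seed grow across views.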

Authors:Lynn Khellaf, Ipek Baris Schlicht, Tilman Mirass, Julia Bayer, Tilman Wagner, Ruben Bouwmeester
Title: SPOT: Bridging Natural Language and Geospatial Search for Investigative Journalists
Abstract:
OpenStreetMap (OSM) is a vital resource for investigative journalists doing geolocation verification. However, existing tools to query OSM data such as Overpass Turbo require familiarity with complex query languages, creating barriers for non-technical users. We present SPOT, an open source natural language interface that makes OSM's rich, tag-based geographic data more accessible through intuitive scene descriptions. SPOT interprets user inputs as structured representations of geospatial object configurations using fine-tuned Large Language Models (LLMs), with results being displayed in an interactive map interface. While more general geospatial search tasks are conceivable, SPOT is specifically designed for use in investigative journalism, addressing real-world challenges such as hallucinations in model output, inconsistencies in OSM tagging, and the noisy nature of user input. It combines a novel synthetic data pipeline with a semantic bundling system to enable robust, accurate query generation. To our knowledge, SPOT is the first system to achieve reliable natural language access to OSM data at this level of accuracy. By lowering the technical barrier to geolocation verification, SPOT contributes a practical tool to the broader efforts to support fact-checking and combat disinformation.

Authors:Xiangyan Chen, Yujian Gan, Yimeng Gu, Matthew Purver
Title: Improving Factuality for Dialogue Response Generation via Graph-Based Knowledge Augmentation
Abstract:
Large Language Models (LLMs) succeed in many natural language processing tasks. However, their tendency to hallucinate - generate plausible but inconsistent or factually incorrect text - can cause significant problems in certain tasks, including response generation in dialogue. To mitigate this issue, we propose two novel graph knowledge-augmented frameworks, Dialogue Response Generation via Textualised Graphs (TG-DRG) and Graph-Aware Dialogue Response Generation (GA-DRG), which combine reasoning-guided dialogue reformulation, dialogue sense knowledge selection, and graph-enhanced response generation to improve the factuality of dialogue responses. To evaluate the factuality of generated responses, we propose a dialogue fact score that addresses the limitations of existing fact-score methods in dialogue settings, providing a more reliable assessment of factual consistency. We evaluate our methods using different baselines on the OpendialKG and HybriDialogue datasets. Our methods noticeably improve factuality compared to other graph knowledge-augmentation baselines, including the state-of-the-art G-retriever, achieving improvements of 3.47% on OpendialKG and 3.12% on HybriDialogue in terms of dialogue fact score. The code will be released on GitHub.

Authors:Nima Hadidi, Jason Chan, Ebrahim Feghhi, Jonathan Kao
Title: SplashNet: Split-and-Share Encoders for Accurate and Efficient Typing with Surface Electromyography
Abstract:
Surface electromyography (sEMG) at the wrists could enable natural, keyboard-free text entry, yet the state-of-the-art emg2qwerty baseline still misrecognizes $51.8\%$ of characters in the zero-shot setting on unseen users and $7.0\%$ after user-specific fine-tuning. We trace many of these errors to mismatched cross-user signal statistics, fragile reliance on high-order feature dependencies, and the absence of architectural inductive biases aligned with the bilateral nature of typing. To address these issues, we introduce three simple modifications: (i) Rolling Time Normalization, which adaptively aligns input distributions across users; (ii) Aggressive Channel Masking, which encourages reliance on low-order feature combinations more likely to generalize across users; and (iii) a Split-and-Share encoder that processes each hand independently with weight-shared streams to reflect the bilateral symmetry of the neuromuscular system. Combined with a five-fold reduction in spectral resolution ($33\!\rightarrow\!6$ frequency bands), these components yield a compact Split-and-Share model, SplashNet-mini, which uses only $\tfrac14$ the parameters and $0.6\times$ the FLOPs of the baseline while reducing character-error rate (CER) to $36.4\%$ zero-shot and $5.9\%$ after fine-tuning. An upscaled variant, SplashNet ($\tfrac12$ the parameters, $1.15\times$ the FLOPs of the baseline), further lowers error to $35.7\%$ and $5.5\%$, representing relative improvements of $31\%$ and $21\%$ in the zero-shot and fine-tuned settings, respectively. SplashNet therefore establishes a new state of the art without requiring additional data.
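The emg2qwerty pipeline itself is not reproduced here, but the first two modifications named in the abstract can be sketched in isolation (the window length, array shapes, and masking rate below are hypothetical; the paper's exact formulation may differ):

```python
import numpy as np

def rolling_time_normalization(x, window=200):
    """Causally normalize each channel by a rolling mean/std over time,
    adaptively aligning input distributions across users.

    x: (T, C) array of per-timestep sEMG features."""
    T, C = x.shape
    out = np.zeros_like(x, dtype=float)
    for t in range(T):
        seg = x[max(0, t - window + 1):t + 1]
        mu = seg.mean(axis=0)
        sd = seg.std(axis=0) + 1e-6  # guard against zero variance
        out[t] = (x[t] - mu) / sd
    return out

def aggressive_channel_mask(x, drop_prob=0.5, rng=None):
    """Zero out entire channels at random (training-time only), pushing
    the model toward low-order feature combinations that generalize."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape[1]) >= drop_prob
    return x * keep
```

The third component, the Split-and-Share encoder, amounts to running each hand's channels through the same weight-shared stream, which a framework-level model definition would express rather than this feature-level sketch.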

Authors:Conrad Borchers, Xiaoyi Tian, Kristy Elizabeth Boyer, Maya Israel
Title: Combining Log Data and Collaborative Dialogue Features to Predict Project Quality in Middle School AI Education
Abstract:
Project-based learning plays a crucial role in computing education. However, its open-ended nature makes tracking project development and assessing success challenging. We investigate how dialogue and system interaction logs predict project quality during collaborative, project-based AI learning of 94 middle school students working in pairs. We used linguistic features from dialogue transcripts and behavioral features from system logs to predict three project quality outcomes: productivity (number of training phrases), content richness (word density), and lexical variation (word diversity) of chatbot training phrases. We compared the predictive accuracy of each modality and a fusion of the modalities. Results indicate log data better predicts productivity, while dialogue data is more effective for content richness. Both modalities modestly predict lexical variation. Multimodal fusion improved predictions for productivity and lexical variation of training phrases but not content richness. These findings suggest that the value of multimodal fusion depends on the specific learning outcome. The study contributes to multimodal learning analytics by demonstrating the nuanced interplay between behavioral and linguistic data in assessing student learning progress in open-ended AI learning environments.

Authors:Carla F. Griggio, Boel Nelson, Zefan Sramek, Aslan Askarov
Title: User Perceptions and Attitudes Toward Untraceability in Messaging Platforms
Abstract:
Mainstream messaging platforms offer a variety of features designed to enhance user privacy, such as password-protected chats and end-to-end encryption, which primarily protect message contents. Beyond contents, a lot can be inferred about people simply by tracing who sends and receives messages, when, and how often. This paper explores user perceptions of and attitudes toward "untraceability", defined as preventing third parties from tracing who communicates with whom, to inform the design of privacy-enhancing technologies and untraceable communication protocols. Through a vignette-based qualitative study with 189 participants, we identify a diverse set of features that users perceive to be useful for untraceable messaging, ranging from using aliases instead of real names to VPNs. Through a reflexive thematic analysis, we uncover three overarching attitudes that influence the support or rejection of untraceability in messaging platforms and that can serve as a set of new privacy personas: privacy fundamentalists, who advocate for privacy as a universal right; safety fundamentalists, who support surveillance for the sake of accountability; and optimists, who advocate for privacy in principle but also endorse exceptions in idealistic ways, such as encryption backdoors. We highlight a critical gap between the threat models assumed by users and those addressed by untraceable communication protocols. Many participants understood untraceability as a form of anonymity, but interpreted it as senders and receivers hiding their identities from each other, rather than from external network observers. We discuss implications for the design of strategic communication and user interfaces of untraceable messaging protocols, and propose framing untraceability as a form of "altruistic privacy", i.e., adopting privacy-enhancing technologies to protect others, as a promising strategy to foster broad adoption.

Authors:Jeongone Seo, Ryan Womack, Tawfiq Ammari
Title: Intergenerational AI Literacy in Korean Immigrant Families: Interpretive Gatekeeping Meets Convenient Critical Deferment
Abstract:
As artificial intelligence (AI) becomes deeply integrated into family life, immigrant families must navigate unique intergenerational, linguistic, and cultural challenges. This study examines how Korean immigrant families in the United States negotiate the use of AI tools such as ChatGPT and smart assistants in their homes. Through 20 semi-structured interviews with parents and teens, we identify two key practices that shape their engagement: interpretive gatekeeping, where parents mediate their children's AI use through a lens of cultural and ethical values, and convenient critical deferment, where teens strategically postpone critical evaluation of AI for immediate academic and social utility. These intertwined practices challenge conventional, skills-based models of AI literacy, revealing it instead as a dynamic and relational practice co-constructed through ongoing family negotiation. We contribute to information science and HCI by offering a new conceptual extension for intergenerational AI literacy and providing design implications for more equitable, culturally attuned, and family-centered AI systems.

Authors:Joseph Corneli, Charles J. Danoff, Raymond S. Puzio, Sridevi Ayloo, Serge Belich, Mary Tedeschi, Charlotte Pierce
Title: Patterns for a New Generation: In-Person and Virtual Workshops
Abstract:
Through a series of workshops, we looked at ways to structure and scaffold group dialogue, and support the emergence of novel design patterns. We contrast these sessions--which we ran with other humans--with two "virtual workshops" which we simulated with ChatGPT. Limitations in both human and virtual settings are discussed, alongside lessons learned. We conclude by proposing a development trajectory that combines AI agents, pattern-based design, and institutional governance.

Authors:J. Parsons, R. Lukyanenko, B. Greenwood, C. Cooper
Title: Understanding and Improving Data Repurposing
Abstract:
We live in an age of unprecedented opportunities to use existing data for tasks not anticipated when those data were collected, resulting in widespread data repurposing. This commentary defines and maps the scope of data repurposing to highlight its importance for organizations and society and the need to study data repurposing as a frontier of data management. We explain how repurposing differs from original data use and data reuse and then develop a framework for data repurposing consisting of concepts and activities for adapting existing data to new tasks. The framework and its implications are illustrated using two examples of repurposing, one in healthcare and one in citizen science. We conclude by suggesting opportunities for research to better understand data repurposing and enable more effective data repurposing practices.

Authors:Liangliang Chen, Huiru Xie, Jacqueline Rohde, Ying Zhang
Title: WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis
Abstract:
This research-to-practice work-in-progress (WIP) paper presents an AI-enabled smart tutor designed to provide homework assessment and feedback for students in an undergraduate circuit analysis course. We detail the tutor's design philosophy and core components, including open-ended question answering and homework feedback generation. The prompts are carefully crafted to optimize responses across different problems. The smart tutor was deployed on the Microsoft Azure platform and is currently in use in an undergraduate circuit analysis course at the School of Electrical and Computer Engineering in a large, public, research-intensive institution in the Southeastern United States. Beyond offering personalized instruction and feedback, the tutor collects student interaction data, which is summarized and shared with the course instructor. To evaluate its effectiveness, we collected student feedback, with 90.9% of responses indicating satisfaction with the tutor. Additionally, we analyze a subset of collected data on preliminary circuit analysis topics to assess tutor usage frequency for each problem and identify frequently asked questions. These insights help instructors gain real-time awareness of student difficulties, enabling more targeted classroom instruction. In future work, we will release a full analysis once the complete dataset is available after the Spring 2025 semester. We also explore the potential applications of this smart tutor across a broader range of engineering disciplines by developing improved prompts, diagram-recognition methods, and database management strategies, which remain ongoing areas of research.

Authors:Nicolas Grelier, Johannes Pfau, Nicolas Mathieu, Stéphane Kaufmann
Title: From Fads to Classics -- Analyzing Video Game Trend Evolutions through Steam Tags
Abstract:
The video game industry deals with a fast-paced, competitive and almost unpredictable market. Trends of genres, settings and modalities change on a perpetual basis, studios are often one big hit or miss away from surviving or perishing, and hitting the pulse of the time has become one of the greatest challenges for industry professionals, investors and other stakeholders. In this work, we aim to support the understanding of video game trends over time through data-driven analysis, visualization and interpretation of Steam tag evolutions. We confirm prior groundwork that trends can be categorized into short-lived fads, contemporary fashions, or stable classics, and find that the rise of a trend averages about four years in the realm of video games. After validating our findings with industry experts, we deliver visualizations, insights and an open approach to deciphering shifts in video game trends.

Authors:Kazuki Kawamura, Jun Rekimoto
Title: SakugaFlow: A Stagewise Illustration Framework Emulating the Human Drawing Process and Providing Interactive Tutoring for Novice Drawing Skills
Abstract:
While current AI illustration tools can generate high-quality images from text prompts, they rarely reveal the step-by-step procedure that human artists follow. We present SakugaFlow, a four-stage pipeline that pairs diffusion-based image generation with a large-language-model tutor. At each stage, novices receive real-time feedback on anatomy, perspective, and composition, revise any step non-linearly, and branch alternative versions. By exposing intermediate outputs and embedding pedagogical dialogue, SakugaFlow turns a black-box generator into a scaffolded learning environment that supports both creative exploration and skills acquisition.

Authors:Manooshree Patel, Rayna Bhattacharyya, Thomas Lu, Arnav Mehta, Niels Voss, Narges Norouzi, Gireeja Ranade
Title: LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs
Abstract:
We present LeanTutor, a Large Language Model (LLM)-based tutoring system for math proofs. LeanTutor interacts with the student in natural language, formally verifies student-written math proofs in Lean, generates correct next steps, and provides the appropriate instructional guidance. LeanTutor is composed of three modules: (i) an autoformalizer/proof-checker, (ii) a next-step generator, and (iii) a natural language feedback generator. The first module faithfully autoformalizes student proofs into Lean and verifies proof accuracy via successful code compilation. If the proof has an error, the incorrect step is identified. The next-step generator module outputs a valid next Lean tactic for incorrect proofs via LLM-based candidate generation and proof search. The feedback generator module leverages Lean data to produce a pedagogically-motivated natural language hint for the student user. To evaluate our system, we introduce PeanoBench, a human-written dataset derived from the Natural Numbers Game, consisting of 371 Peano Arithmetic proofs, where each natural language proof step is paired with the corresponding logically equivalent tactic in Lean. The Autoformalizer correctly formalizes 57% of tactics in correct proofs and accurately identifies the incorrect step in 30% of incorrect proofs. In generating natural language hints for erroneous proofs, LeanTutor outperforms a simple baseline on accuracy and relevance metrics.

Authors:Kat R. Agres, Adyasha Dash, Phoebe Chua, Stefan K. Ehrlich
Title: AffectMachine-Pop: A controllable expert system for real-time pop music generation
Abstract:
Music is a powerful medium for influencing listeners' emotional states, and this capacity has driven a surge of research interest in AI-based affective music generation in recent years. Many existing systems, however, are black boxes that are not directly controllable, making them less flexible and adaptive to users. We present AffectMachine-Pop, an expert system capable of generating retro-pop music according to arousal and valence values, which can either be pre-determined or based on a listener's real-time emotional state. To validate the efficacy of the system, we conducted a listening study demonstrating that AffectMachine-Pop is capable of generating affective music at target levels of arousal and valence. The system is tailored for use either as a tool for generating interactive affective music based on user input, or for incorporation into biofeedback or neurofeedback systems to assist users with emotion self-regulation.

Authors:Yichi Zhang, Brandon Lyman, Celia Pearce, Miso Kim, Casper Harteveld, Leanne Chukoskie, Bob De Schutter
Title: Integrating Artificial Intelligence as Assistive Technology for Older Adult Gamers: A Pilot Study
Abstract:
With respect to digital games, older adults are a demographic that is often underserved due to an industry-wide focus on younger audiences' preferences and skill sets. Meanwhile, as artificial intelligence (AI) continues to expand into everyday technologies, its assistive capabilities have been recognized, suggesting its potential to improve the gaming experience for older gamers. To study this potential, we iteratively developed a pilot survey aimed at understanding older adult gamers' current gameplay preferences, the challenges they face, and their perspectives on AI usage in gaming. This article contributes an overview of our iterative survey-design workflow and pilot results from 39 participants. During each iteration, we analyzed the survey's efficacy and adjusted the content, language, and format to better capture meaningful data, and were able to create a refined survey for a larger, more representative future parent study. At the same time, preliminary findings suggest that for older adult gamers, usability issues in gaming remain key obstacles, while this demographic's perceptions of AI are shaped by both its practical benefits and concerns about autonomy and complexity. These findings also offer early insights for the design of age-inclusive, AI-supported gaming experiences.

Authors:Brandon Lyman, Yichi Zhang, Celia Pearce, Miso Kim, Casper Harteveld, Leanne Chukoskie, Bob De Schutter
Title: Supporting Aging Well through Accessible Digital Games: The Supplemental Role of AI in Game Design for Older Adults
Abstract:
As the population continues to age and gaming continues to grow as a hobby among older people, heterogeneity among older adult gamers is increasing. We argue that traditional game-based accessibility features, such as simplified input schemes, redundant information channels, and increased legibility of digital user interfaces, are increasingly limited in the face of this heterogeneity. This is because such features affect all older adult players simultaneously and are therefore designed generically. We introduce artificial intelligence, although it has its own limitations and ethical concerns, as a method of creating player-based accessibility features, given the adaptive nature of the emerging technology. These accessibility features may help to address the unique assemblage of accessibility needs an individual may accumulate with age. We adopt insights from gerontology, HCI, and disability studies into the digital game design discourse for older adults, and we contribute insight that can guide the integration of player-based accessibility features to supplement game-based counterparts. The accessibility of digital games for a heterogeneous older adult audience is paramount, as the medium offers short-term social, emotional, psychological, cognitive, and physical benefits that support the long-term goal of aging well.

Authors:Leah Hope Ajmani, Nuredin Ali Abdelkadir, Stevie Chancellor
Title: Secondary Stakeholders in AI: Fighting for, Brokering, and Navigating Agency
Abstract:
As AI technologies become more human-facing, there have been numerous calls to adapt participatory approaches to AI development -- spurring the idea of participatory AI. However, these calls often focus only on primary stakeholders, such as end-users, and not secondary stakeholders. This paper seeks to translate the ideals of participatory AI to a broader population of secondary AI stakeholders through semi-structured interviews. We theorize that meaningful participation involves three participatory ideals: (1) informedness, (2) consent, and (3) agency. We also explore how secondary stakeholders realize these ideals by traversing a complicated problem space. Like walking up the rungs of a ladder, these ideals build on one another. We introduce three stakeholder archetypes: the reluctant data contributor, the unsupported activist, and the well-intentioned practitioner, who must navigate systemic barriers to achieving agentic AI relationships. We envision an AI future where secondary stakeholders are able to meaningfully participate with the AI systems they influence and are influenced by.

Authors:Emmanuel Deruty, Maarten Grachten
Title: Insights on Harmonic Tones from a Generative Music Experiment
Abstract:
The ultimate purpose of generative music AI is music production. The studio-lab, a social form within the art-science branch of cross-disciplinarity, is a way to advance music production with AI music models. During a studio-lab experiment involving researchers, music producers, and an AI music model generating bass-like audio, it was observed that the producers used the model's output to convey two or more pitches with a single harmonic complex tone, which in turn revealed that the model had learned to generate structured and coherent simultaneous melodic lines using monophonic sequences of harmonic complex tones. These findings prompt a reconsideration of the long-standing debate on whether humans can perceive harmonics as distinct pitches and highlight how generative AI can not only enhance musical creativity but also contribute to a deeper understanding of music.

Authors:Szeyi Chan, Jiachen Li, Siman Ao, Yufei Wang, Ibrahim Bilau, Brian Jones, Eunhwa Yang, Elizabeth D Mynatt, Xiang Zhi Tan
Title: Insights from Designing Context-Aware Meal Preparation Assistance for Older Adults with Mild Cognitive Impairment (MCI) and Their Care Partners
Abstract:
Older adults with mild cognitive impairment (MCI) often face challenges during meal preparation, such as forgetting ingredients, skipping steps, or leaving appliances on, which can compromise their safety and independence. Our study explores the design of context-aware assistive technologies for meal preparation using a user-centered iterative design process. Through three iterative phases of design and feedback, evolving from a low-tech lightbox to a digital screen, we gained insights into managing diverse contexts and personalizing assistance through collaboration with older adults with MCI and their care partners. We distilled our findings into three key contexts--routine-based, real-time, and situational--that informed strategies for designing context-aware meal preparation assistance tailored to users' needs. Our results provide actionable insights for creating technologies to assist meal preparation that are personalized for the unique lifestyles of older adults with MCI, situated in the complex and dynamic homebound context, and respectful of the collaboration between older adults and their care partners.

Authors:Valerie Krug, Sebastian Stober
Title: Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models
Abstract:
Deep Learning models have achieved remarkable success. Training them is often accelerated by building on top of pre-trained models, which poses the risk of perpetuating encoded biases. Here, we investigate biases in the representations of commonly used ImageNet classifiers for facial images while considering intersections of the sensitive variables age, race, and gender. To assess the biases, we use linear classifier probes and visualize activations as topographic maps. We find that representations in ImageNet classifiers particularly allow differentiation between ages. To a lesser extent, the models appear to associate certain ethnicities and to distinguish genders within middle-aged groups.

Authors:Kyra Wang, Boon-Kiat Quek, Jessica Goh, Dorien Herremans
Title: To Embody or Not: The Effect Of Embodiment On User Perception Of LLM-based Conversational Agents
Abstract:
Embodiment in conversational agents (CAs) refers to the physical or visual representation of these agents, which can significantly influence user perception and interaction. Limited work has been done examining the effect of embodiment on the perception of CAs utilizing modern large language models (LLMs) in non-hierarchical cooperative tasks, a common use case of CAs as more powerful models become widely available for general use. To bridge this research gap, we conducted a mixed-methods within-subjects study on how users perceive LLM-based CAs in cooperative tasks when embodied and non-embodied. The results show that the non-embodied agent received significantly better quantitative appraisals for competence than the embodied agent, and in qualitative feedback, many participants believed that the embodied CA was more sycophantic than the non-embodied CA. Building on prior work on users' perceptions of LLM sycophancy and anthropomorphic features, we theorize that the typically-positive impact of embodiment on perception of CA credibility can become detrimental in the presence of sycophancy. The implication of such a phenomenon is that, contrary to intuition and existing literature, embodiment is not a straightforward way to improve a CA's perceived credibility if there exists a tendency to sycophancy.

Authors:Sebe Vanbrabant, Gustavo Rovelo Ruiz, Davy Vanacken
Title: Composable Building Blocks for Controllable and Transparent Interactive AI Systems
Abstract:
While the increased integration of AI technologies into interactive systems enables them to solve an equally increasing number of tasks, the black box problem of AI models continues to spread throughout the interactive system as a whole. Explainable AI (XAI) techniques can make AI models more accessible by employing post-hoc methods or transitioning to inherently interpretable models. While this makes individual AI models clearer, the overarching system architecture remains opaque. To this end, we propose an approach to represent interactive systems as sequences of structural building blocks, such as AI models and control mechanisms grounded in the literature. These can then be explained through accompanying visual building blocks, such as XAI techniques. The flow and APIs of the structural building blocks form an explicit overview of the system. This serves as a communication basis for both humans and automated agents like LLMs, aligning human and machine interpretability of AI models. We discuss a selection of building blocks and concretize our flow-based approach in an architecture and accompanying prototype interactive system.

Authors:Takao Fujii, Katie Seaborn, Madeleine Steeds, Jun Kato
Title: Inter(sectional) Alia(s): Ambiguity in Voice Agent Identity via Intersectional Japanese Self-Referents
Abstract:
Conversational agents that mimic people have raised questions about the ethics of anthropomorphizing machines with human social identity cues. Critics have also questioned assumptions of identity neutrality in humanlike agents. Recent work has revealed that intersectional Japanese pronouns can elicit complex and sometimes evasive impressions of agent identity. Yet, the role of other "neutral" non-pronominal self-referents (NPSR) and voice as a socially expressive medium remains unexplored. In a crowdsourcing study, Japanese participants (N = 204) evaluated three ChatGPT voices (Juniper, Breeze, and Ember) using seven self-referents. We found strong evidence of voice gendering alongside the potential of intersectional self-referents to evade gendering, i.e., ambiguity through neutrality and elusiveness. Notably, perceptions of age and formality intersected with gendering as per sociolinguistic theories, especially boku and watakushi. This work provides a nuanced take on agent identity perceptions and champions intersectional and culturally-sensitive work on voice agents.

Authors:Parsa Hassani Shariat Panahi, Amir Hossein Jalilvand, M. Hassan Najafi
Title: Bridging Subjective and Objective QoE: Operator-Level Aggregation Using LLM-Based Comment Analysis and Network MOS Comparison
Abstract:
This paper introduces a dual-layer framework for network operator-side quality of experience (QoE) assessment that integrates both objective network modeling and subjective user perception extracted from live-streaming platforms. On the objective side, we develop a machine learning model trained on mean opinion scores (MOS) computed via the ITU-T P.1203 reference implementation, allowing accurate prediction of user-perceived video quality using only network parameters such as packet loss, delay, jitter, and throughput without reliance on video content or client-side instrumentation. On the subjective side, we present a semantic filtering and scoring pipeline that processes user comments from live streams to extract performance-related feedback. A large language model is used to assign scalar MOS scores to filtered comments in a deterministic and reproducible manner. To support scalable and interpretable analysis, we construct a labeled dataset of 47,894 live-stream comments, of which about 34,000 are identified as QoE-relevant through multi-layer semantic filtering. Each comment is enriched with simulated Internet Service Provider attribution and temporally aligned using synthetic timestamps in 5-min intervals. The resulting dataset enables operator-level aggregation and time-series analysis of user-perceived quality. A delta MOS metric is proposed to measure each Internet service provider's deviation from platform-wide sentiment, allowing detection of localized degradations even in the absence of direct network telemetry. A controlled outage simulation confirms the framework's effectiveness in identifying service disruptions through comment-based trends alone. The system provides each operator with its own subjective MOS and the global platform average per interval, enabling real-time interpretation of performance deviations and comparison with objective network-based QoE estimates.

Authors:Keyeun Lee, Seolhee Lee, Esther Hehsun Kim, Yena Ko, Jinsu Eun, Dahee Kim, Hyewon Cho, Haiyi Zhu, Robert E. Kraut, Eunyoung Suh, Eun-mee Kim, Hajin Lim
Title: Adaptive-VP: A Framework for LLM-Based Virtual Patients that Adapts to Trainees' Dialogue to Facilitate Nurse Communication Training
Abstract:
Effective communication training is essential to preparing nurses for high-quality patient care. While standardized patient (SP) simulations provide valuable experiential learning, they are often costly and inflexible. Virtual patient (VP) systems offer a scalable alternative, but most fail to adapt to the varying communication skills of trainees. In particular, when trainees respond ineffectively, VPs should escalate in hostility or become uncooperative--yet this level of adaptive interaction remains largely unsupported. To address this gap, we introduce Adaptive-VP, a VP dialogue generation framework that leverages large language models (LLMs) to dynamically adapt VP behavior based on trainee input. The framework features a pipeline for constructing clinically grounded yet flexible VP scenarios and a modular system for assessing trainee communication and adjusting VP responses in real time, while ensuring learner safety. We validated Adaptive-VP by simulating challenging patient conversations. Automated evaluation using a corpus from practicing nurses showed that our communication skill evaluation mechanism reflected real-world proficiency levels. Expert nurses further confirmed that Adaptive-VP produced more natural and realistic interactions than existing approaches, demonstrating its potential as a scalable and effective tool for nursing communication training.

Authors:Taku Yamazaki, Kaito Watanabe, Tatsuya Kase, Kenta Hasegawa, Koki Saida, Takumi Miyoshi
Title: A 3D Mobile Crowdsensing Framework for Sustainable Urban Digital Twins
Abstract:
In this article, we propose a 3D mobile crowdsensing (3D-MCS) framework aimed at sustainable urban digital twins (UDTs). The framework comprises four key mechanisms: (1) the 3D-MCS mechanism, consisting of active and passive models; (2) the Geohash-based spatial information management mechanism; (3) the dynamic point cloud integration mechanism for UDTs; and (4) the web-based real-time visualizer for 3D-MCS and UDTs. The active sensing model features a gamified 3D-MCS approach, where participants collect point cloud data through an augmented reality territory coloring game. In contrast, the passive sensing model employs a wearable 3D-MCS approach, where participants wear smartphones around their necks without disrupting daily activities. The spatial information management mechanism efficiently partitions the space into regions using Geohash. The dynamic point cloud integration mechanism incorporates point clouds collected by 3D-MCS into UDTs through global and local point cloud registration. Finally, we evaluated the proposed framework through real-world experiments. We verified the effectiveness of the proposed 3D-MCS models from the perspectives of subjective evaluation and data collection and analysis. Furthermore, we analyzed the performance of the dynamic point cloud integration using a dataset.

Authors:Tawfiq Ammari, Meilun Chen, S M Mehedi Zaman, Kiran Garimella
Title: How Students (Really) Use ChatGPT: Uncovering Experiences Among Undergraduate Students
Abstract:
This study investigates how undergraduate students engage with ChatGPT in self-directed learning contexts. Analyzing naturalistic interaction logs, we identify five dominant use categories of ChatGPT: information seeking, content generation, language refinement, metacognitive engagement, and conversational repair. Behavioral modeling reveals that structured, goal-driven tasks like coding, multiple-choice solving, and job application writing are strong predictors of continued use. Drawing on Self-Directed Learning (SDL) and the Uses and Gratifications Theory (UGT), we show how students actively manage ChatGPT's affordances and limitations through prompt adaptation, follow-ups, and emotional regulation. Rather than disengaging after breakdowns, students often persist through clarification and repair, treating the assistant as both tool and learning partner. We also offer design and policy recommendations to support transparent, responsive, and pedagogically grounded integration of generative AI in higher education.

Authors:Bhawana Chhaglani, Sarmistha Sarna Gomasta, Yuvraj Agarwal, Jeremy Gummeson, Prashant Shenoy
Title: FeatureSense: Protecting Speaker Attributes in Always-On Audio Sensing System
Abstract:
Audio is a rich sensing modality that is useful for a variety of human activity recognition tasks. However, the ubiquitous nature of smartphones and smart speakers with always-on microphones has led to numerous privacy concerns and a lack of trust in deploying these audio-based sensing systems. This paper addresses the critical challenge of preserving user privacy when using audio for sensing applications while maintaining utility. While prior work focuses primarily on protecting recoverable speech content, we show that sensitive speaker-specific attributes such as age and gender can still be inferred after masking speech, and we propose a comprehensive privacy evaluation framework to assess this speaker attribute leakage. We design and implement FeatureSense, an open-source library that provides a set of generalizable privacy-aware audio features that can be used for a wide range of sensing applications. We present an adaptive task-specific feature selection algorithm that optimizes the privacy-utility-cost trade-off based on the application requirements. Through our extensive evaluation, we demonstrate the high utility of FeatureSense across a diverse set of sensing tasks. Our system outperforms existing privacy techniques by 60.6% in preserving user-specific privacy. This work provides a foundational framework for ensuring trust in audio sensing by enabling effective privacy-aware audio classification systems.

Authors:Vishnu Ramineni, Shivareddy Devarapalli, Balakrishna Pothineni, Prema Kumar Veerapaneni, Aditya Gupta, Pankaj Gupta
Title: Advancing Digital Accessibility: Integrating AR/VR and Health Tech for Inclusive Healthcare Solutions
Abstract:
The modern healthcare domain incorporates digital accessibility features to ensure a seamless flow of online services for patients. However, digital accessibility poses a challenge, particularly for patients with disabilities. To address this issue and provide immersive and user-friendly experiences, evolving technologies like Augmented Reality (AR) and Virtual Reality (VR) are being integrated into medical applications to enhance accessibility. The present research paper aims to study the inclusivity and accessibility features of AR/VR in revolutionizing healthcare practices, especially in domains like telemedicine, patient education, assistive tools, and rehabilitation for persons with disabilities. Current trends and case studies are also analyzed to measure the efficacy of AR/VR in healthcare. Moreover, the paper entails a detailed analysis of the challenges of its adoption, particularly technical limitations, implementation costs, and regulatory aspects. Finally, the paper concludes with recommendations for integrating AR/VR to foster a more equitable and inclusive healthcare system and to provide individuals with auditory, visual, and motor impairments with digital healthcare solutions.

Authors:Xiaoyuan Wu, Weiran Lin, Omer Akgul, Lujo Bauer
Title: Estimating LLM Consistency: A User Baseline vs Surrogate Metrics
Abstract:
Large language models (LLMs) are prone to hallucinations and sensitive to prompt perturbations, often resulting in inconsistent or unreliable generated text. Different methods have been proposed to mitigate such hallucinations and fragility -- one of them being measuring the consistency (the model's confidence in the response, or the likelihood of generating a similar response when resampled) of LLM responses. In previous work, measuring consistency often relied on the probability of a response appearing within a pool of resampled responses, or on internal states or logits of responses. However, it is not yet clear how well these approaches approximate how humans perceive the consistency of LLM responses. We performed a user study (n=2,976) and found that current methods typically do not approximate users' perceptions of LLM consistency very well. We propose a logit-based ensemble method for estimating LLM consistency, and we show that this method matches the performance of the best-performing existing metric in estimating human ratings of LLM consistency. Our results suggest that methods of estimating LLM consistency without human evaluation remain sufficiently imperfect that evaluation with human input should be used more broadly.

Authors:Xiaoye Michael Wang, Matthew Prenevost, Aneesh Tarun, Ian Robinson, Michael Nitsche, Gabby Resch, Ali Mazalek, Timothy N. Welsh
Title: Investigating A Geometrical Solution to the Vergence-Accommodation Conflict for Targeted Movements in Virtual Reality
Abstract:
While virtual reality (VR) holds significant potential to revolutionize digital user interaction, how visual information is presented through VR head-mounted displays (HMDs) differs from naturalistic viewing and interactions in physical environments, leading to performance decrements. One critical challenge in VR development is the vergence-accommodation conflict (VAC), which arises due to the intrinsic constraints of approximating the natural viewing geometry through digital displays. Although various hardware and software solutions have been proposed to address VAC, no commercially viable option has been universally adopted by manufacturers. This paper presents and evaluates a software solution grounded in a vision-based geometrical model of VAC that mediates VAC's impact on movement in VR. This model predicts the impact of VAC as a constant offset to the vergence angle, distorting the binocular viewing geometry that results in movement undershooting. In Experiment 1, a 3D pointing task validated the model's predictions and demonstrated that VAC primarily affects online movements involving real-time visual feedback. Experiment 2 implemented a shader program to rectify the effect of VAC, improving movement accuracy by approximately 30%. Overall, this work presented a practical approach to reducing the impact of VAC on HMD-based manual interactions, enhancing the user experience in virtual environments.

Authors:Emmanuel Anaya González, Raven Rothkopf, Sorin Lerner, Nadia Polikarpova
Title: HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding
Abstract:
While AI programming tools hold the promise of increasing programmers' capabilities and productivity to a remarkable degree, they often exclude users from essential decision-making processes, causing many to effectively "turn off their brains" and over-rely on solutions provided by these systems. These behaviors can have severe consequences in critical domains, like software security. We propose Human-in-the-loop Decoding, a novel interaction technique that allows users to observe and directly influence LLM decisions during code generation, in order to align the model's output with their personal requirements. We implement this technique in HiLDe, a code completion assistant that highlights critical decisions made by the LLM and provides local alternatives for the user to explore. In a within-subjects study (N=18) on security-related tasks, we found that HiLDe led participants to generate significantly fewer vulnerabilities and better align code generation with their goals compared to a traditional code completion assistant.

Authors:Kyzyl Monteiro, Yuchen Wu, Sauvik Das
Title: Imago Obscura: An Image Privacy AI Co-pilot to Enable Identification and Mitigation of Risks
Abstract:
Users often struggle to navigate the privacy / publicity boundary in sharing images online: they may lack awareness of image privacy risks and/or the ability to apply effective mitigation strategies. To address this challenge, we introduce and evaluate Imago Obscura, an AI-powered, image-editing copilot that enables users to identify and mitigate privacy risks with images they intend to share. Driven by design requirements from a formative user study with 7 image-editing experts, Imago Obscura enables users to articulate their image-sharing intent and privacy concerns. The system uses these inputs to surface contextually pertinent privacy risks, and then recommends and facilitates application of a suite of obfuscation techniques found to be effective in prior literature -- e.g., inpainting, blurring, and generative content replacement. We evaluated Imago Obscura with 15 end-users in a lab study and found that it greatly improved users' awareness of image privacy risks and their ability to address those risks, allowing them to make more informed sharing decisions.

Authors:Robin Burchard, Kristof Van Laerhoven
Title: Enhancing Wearable Tap Water Audio Detection through Subclass Annotation in the HD-Epic Dataset
Abstract:
Wearable human activity recognition has been shown to benefit from the inclusion of acoustic data, as the sounds around a person often contain valuable context. However, due to privacy concerns, it is usually not ethically feasible to record and save microphone data from the device, since the audio could, for instance, also contain private conversations. Rather, the data should be processed locally, which in turn requires processing power and consumes energy on the wearable device. One special use case of contextual information that can be utilized to augment special tasks in human activity recognition is water flow detection, which can, e.g., be used to aid wearable hand washing detection. We created a new label called tap water for the recently released HD-Epic dataset, producing 717 hand-labeled annotations of tap water flow based on existing annotations of the water class. We analyzed the relation of tap water and water in the dataset and additionally trained and evaluated two lightweight classifiers on the newly added label class, showing that the new class can be learned more easily.

Authors:Ryo Iijima, Shigeo Yoshida, Atsushi Hashimoto, Jiaxin Ma
Title: FairTalk: Facilitating Balanced Participation in Video Conferencing by Implicit Visualization of Predicted Turn-Grabbing Intention
Abstract:
Creating fair opportunities for all participants to contribute is a notable challenge in video conferencing. This paper introduces FairTalk, a system that facilitates the subconscious redistribution of speaking opportunities. FairTalk predicts participants' turn-grabbing intentions using a machine learning model trained on web-collected videoconference data with positive-unlabeled learning, where turn-taking detection provides automatic positive labels. To subtly balance speaking turns, the system visualizes predicted intentions by mimicking natural human behaviors associated with the desire to speak. A user study suggests that FairTalk may help improve speaking balance, though subjective feedback indicates no significant perceived impact. We also discuss design implications derived from participant interviews.

Authors:Ryo Ohara, Chi-Lan Yang, Takuji Narumi, Hideaki Kuzuoka
Title: Understanding and Supporting Co-viewing Comedy in VR with Embodied Expressive Avatars
Abstract:
Co-viewing videos with family and friends remotely has become prevalent with the support of communication channels such as text messaging or real-time voice chat. However, current co-viewing platforms often lack visible embodied cues, such as body movements and facial expressions. This absence can reduce emotional engagement and the sense of co-presence when people are watching together remotely. Although virtual reality (VR) is an emerging technology that allows individuals to participate in various social activities while embodied as avatars, we still do not fully understand how this embodiment in VR affects co-viewing experiences, particularly in terms of engagement, emotional contagion, and expressive norms. In a controlled experiment involving eight triads of three participants each (N=24), we compared the participants' perceptions and reactions while watching comedy in VR using embodied expressive avatars that displayed visible laughter cues. This was contrasted with a control condition where no such embodied expressions were presented. With a mixed-method analysis, we found that embodied laughter cues shifted participants' engagement from individual immersion to socially coordinated participation. Participants reported heightened self-awareness of emotional expression, greater emotional contagion, and the development of expressive norms surrounding co-viewers' laughter. The result highlighted the tension between individual engagement and interpersonal emotional accommodation when co-viewing with embodied expressive avatars.

Authors:Riley Simmons-Edler, Jean Dong, Paul Lushenko, Kanaka Rajan, Ryan P. Badman
Title: Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications
Abstract:
Military weapon systems and command-and-control infrastructure augmented by artificial intelligence (AI) have seen rapid development and deployment in recent years. However, the sociotechnical impacts of AI on combat systems, military decision-making, and the norms of warfare have been understudied. We focus on a specific subset of lethal autonomous weapon systems (LAWS) that use AI for targeting or battlefield decisions. We refer to this subset as AI-powered lethal autonomous weapon systems (AI-LAWS) and argue that they introduce novel risks -- including unanticipated escalation, poor reliability in unfamiliar environments, and erosion of human oversight -- all of which threaten both military effectiveness and the openness of AI research. These risks cannot be addressed by high-level policy alone; effective regulation must be grounded in the technical behavior of AI models. We argue that AI researchers must be involved throughout the regulatory lifecycle. Thus, we propose a clear, behavior-based definition of AI-LAWS -- systems that introduce unique risks through their use of modern AI -- as a foundation for technically grounded regulation, given that existing frameworks do not distinguish them from conventional LAWS. Using this definition, we propose several technically-informed policy directions and invite greater participation from the AI research community in military AI policy discussions.

Authors:Jeongone Seo, Tawfiq Ammari
Title: Pragmatic Disengagement and Culturally Situated Non-Use: Older Korean Immigrants' Strategies for Navigating Digital Noise

Abstract:
Older immigrant adults often face layered barriers to digital participation, including language exclusion, generational divides, and emotional fatigue. This study examines how older Korean immigrants in the greater NYC area selectively engage with digital tools such as smartphones, YouTube, and AI platforms. Using a community-based participatory research (CBPR) framework and 22 semi-structured interviews, we identify two key practices: pragmatic disengagement, where users avoid emotionally taxing or culturally misaligned content, and interdependent navigation, where digital use is shaped through reliance on family or community support. These strategies challenge deficit-oriented narratives of non-use, showing how disengagement can be thoughtful, protective, and culturally situated. We contribute to CSCW by expanding theories of non-use and algorithmic resistance and by offering design and policy recommendations to support more dignified, culturally attuned digital engagement for aging immigrant populations.

Authors:Ashwin George, Luciano Cavalcante Siebert, David A. Abbink, Arkady Zgonnikov
Title: Feasible Action Space Reduction for Quantifying Causal Responsibility in Continuous Spatial Interactions
Abstract:
Understanding the causal influence of one agent on another agent is crucial for safely deploying artificially intelligent systems such as automated vehicles and mobile robots into human-inhabited environments. Existing models of causal responsibility deal with simplified abstractions of scenarios with discrete actions, thus, limiting real-world use when understanding responsibility in spatial interactions. Based on the assumption that spatially interacting agents are embedded in a scene and must follow an action at each instant, Feasible Action-Space Reduction (FeAR) was proposed as a metric for causal responsibility in a grid-world setting with discrete actions. Since real-world interactions involve continuous action spaces, this paper proposes a formulation of the FeAR metric for measuring causal responsibility in space-continuous interactions. We illustrate the utility of the metric in prototypical space-sharing conflicts, and showcase its applications for analysing backward-looking responsibility and in estimating forward-looking responsibility to guide agent decision making. Our results highlight the potential of the FeAR metric for designing and engineering artificial agents, as well as for assessing the responsibility of agents around humans.
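
The FeAR idea of measuring causal responsibility as a reduction in feasible action space can be sketched with Monte Carlo sampling over candidate accelerations. This is an illustrative toy (point-mass kinematics, a hand-picked horizon and collision radius, and a hypothetical `fear` helper), not the authors' continuous-space formulation:

```python
import numpy as np

def feasible_fraction(ego_pos, ego_vel, obstacle_traj, actions,
                      horizon=20, dt=0.1, radius=0.5):
    """Fraction of sampled acceleration actions that keep the ego agent
    collision-free with the obstacle trajectory over the horizon."""
    safe = 0
    for ax, ay in actions:
        pos = np.array(ego_pos, float)
        vel = np.array(ego_vel, float)
        ok = True
        for t in range(horizon):
            vel = vel + dt * np.array([ax, ay])
            pos = pos + dt * vel
            if np.linalg.norm(pos - obstacle_traj[t]) < radius:
                ok = False
                break
        if ok:
            safe += 1
    return safe / len(actions)

def fear(ego_pos, ego_vel, traj_actual, traj_counterfactual, actions):
    """FeAR-style responsibility: how much the other agent's actual motion
    shrinks the ego's feasible action space, relative to a counterfactual
    trajectory (e.g. staying out of the way)."""
    return (feasible_fraction(ego_pos, ego_vel, traj_counterfactual, actions)
            - feasible_fraction(ego_pos, ego_vel, traj_actual, actions))
```

A blocking obstacle removes some, but not all, of the ego's escape actions, so the metric lands strictly between 0 and 1.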

Authors:Dillon Lohr, Michael J. Proulx, Mehedi Hasan Raju, Oleg V. Komogortsev
Title: Ocular Authentication: Fusion of Gaze and Periocular Modalities
Abstract:
This paper investigates the feasibility of fusing two eye-centric authentication modalities (eye movements and periocular images) within a calibration-free authentication system. While each modality has independently shown promise for user authentication, their combination within a unified gaze-estimation pipeline has not been thoroughly explored at scale. In this report, we propose a multimodal authentication system and evaluate it using a large-scale in-house dataset comprising 9202 subjects with an eye tracking (ET) signal quality equivalent to a consumer-facing virtual reality (VR) device. Our results show that the multimodal approach consistently outperforms both unimodal systems across all scenarios, surpassing the FIDO benchmark. The integration of a state-of-the-art machine learning architecture contributed significantly to the overall authentication performance at scale, driven by the model's ability to capture authentication representations and the complementary discriminative characteristics of the fused modalities.
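
Score-level fusion of two match scores, together with false-accept/false-reject rates at a decision threshold, can be sketched as follows (the equal weighting and the threshold are illustrative assumptions, not the system's actual architecture):

```python
import numpy as np

def fuse(gaze_score, peri_score, w=0.5):
    """Weighted score-level fusion of gaze and periocular match scores
    (both assumed min-max normalized to [0, 1])."""
    return w * gaze_score + (1.0 - w) * peri_score

def far_frr(genuine, impostor, threshold):
    """False accept rate (impostors accepted) and false reject rate
    (genuine users rejected) of fused scores at a given threshold."""
    far = float(np.mean(np.asarray(impostor) >= threshold))
    frr = float(np.mean(np.asarray(genuine) < threshold))
    return far, frr
```

Sweeping the threshold and plotting FAR against FRR yields the usual operating curve; the equal error rate is where the two cross.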

Authors:Ayae Ide, Tory Park, Jaron Mink, Tanusree Sharma
Title: Signals of Provenance: Practices & Challenges of Navigating Indicators in AI-Generated Media for Sighted and Blind Individuals
Abstract:
AI-Generated (AIG) content has become increasingly widespread due to recent advances in generative models and easy-to-use tools that have significantly lowered the technical barriers to producing highly realistic audio, images, and videos through simple natural language prompts. In response, platforms are adopting provenance measures, recommending that AIG content be self-disclosed and signaled to users. However, these indicators are often missed, especially when they rely solely on visual cues, making them ineffective for users with different sensory abilities. To address this gap, we conducted semi-structured interviews (N=28) with 15 sighted and 13 blind and low-vision (BLV) participants to examine their interaction with AIG content through self-disclosed AI indicators. Our findings reveal diverse mental models and practices, highlighting different strengths and weaknesses of content-based (e.g., title, description) and menu-aided (e.g., AI labels) indicators. While sighted participants leveraged visual and audio cues, BLV participants primarily relied on audio and existing assistive tools, limiting their ability to identify AIG content. Across both groups, participants frequently overlooked menu-aided indicators deployed by platforms and instead interacted with content-based indicators such as titles and comments. We uncovered usability challenges stemming from inconsistent indicator placement, unclear metadata, and cognitive overload. These issues were especially critical for BLV individuals due to the insufficient accessibility of interface elements. We provide practical recommendations and design implications for future AIG indicators across several dimensions.

Authors:Wenqing Wu, Haixu Xi, Chengzhi Zhang
Title: Are the confidence scores of reviewers consistent with the review content? Evidence from top conference proceedings in AI
Abstract:
Peer review is vital in academia for evaluating research quality. Top AI conferences use reviewer confidence scores to ensure review reliability, but existing studies lack fine-grained analysis of text-score consistency, potentially missing key details. This work assesses consistency at the word, sentence, and aspect levels using deep learning and NLP conference review data. We employ deep learning to detect hedge sentences and aspects, then analyze report length, hedge word/sentence frequency, aspect mentions, and sentiment to evaluate text-score alignment. Correlation, significance, and regression tests examine the impact of confidence scores on paper outcomes. Results show high text-score consistency across all levels, with regression revealing that higher confidence scores correlate with paper rejection, validating expert assessments and peer review fairness.
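
The word- and sentence-level consistency checks reduce to simple correlation tests; for example, one would expect the rate of hedge sentences to fall as reviewer confidence rises. A sketch with hypothetical per-review numbers (not the paper's data):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

# hypothetical per-review features: reviewer confidence score vs. the
# fraction of detected hedge sentences in the review text
confidence = [2, 3, 3, 4, 5, 5]
hedge_rate = [0.30, 0.22, 0.25, 0.15, 0.10, 0.08]
r = pearson_r(confidence, hedge_rate)
```

A strongly negative r on real data would indicate text-score consistency: confident reviewers hedge less.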

Authors:Yunhee Shim, Shagun Jhaver
Title: The Pin of Shame: Examining Content Creators' Adoption of Pinning Inappropriate Comments as a Moderation Strategy
Abstract:
Many social media platforms allow content creators to pin user comments in response to their content. Once pinned, a comment remains fixed at the top of the comments section, regardless of subsequent activity or the selected sorting order. The "Pin of Shame" refers to an innovative re-purposing of this feature, where creators intentionally pin norm-violating comments to spotlight them and prompt shaming responses from their audiences. This study explores how creators adopt this emerging moderation tactic, examining their motivations, its outcomes, and how it compares, procedurally and in effect, to other content moderation strategies. Through interviews with 20 content creators who had pinned negative comments on their posts, we find that the Pin of Shame is used to punish and educate inappropriate commenters, elicit emotional accountability, provoke audience negotiation of community norms, and support creators' impression management goals. Our findings shed light on the benefits, precarities, and risks of using public shaming as a tool for norm enforcement. We contribute to HCI research by informing the design of user-centered tools for addressing content-based harm.

Authors:Leif Johnson, Johan Engström, Aravinda Srinivasan, Ibrahim Özturk, Gustav Markkula
Title: Looking for an out: Affordances, uncertainty and collision avoidance behavior of human drivers
Abstract:
Understanding collision avoidance behavior is of key importance in traffic safety research and for designing and evaluating advanced driver assistance systems and autonomous vehicles. While existing experimental work has primarily focused on response timing in traffic conflicts, the goal of the present study was to gain a better understanding of human evasive maneuver decisions and execution in collision avoidance scenarios. To this end, we designed a driving simulator study where participants were exposed to one of three surprising opposite direction lateral incursion (ODLI) scenario variants. The results demonstrated that both the participants' collision avoidance behavior patterns and the collision outcome was strongly determined by the scenario kinematics and, more specifically, by the uncertainty associated with the oncoming vehicle's future trajectory. We discuss pitfalls related to hindsight bias when judging the quality of evasive maneuvers in uncertain situations and suggest that the availability of escape paths in collision avoidance scenarios can be usefully understood based on the notion of affordances, and further demonstrate how such affordances can be operationalized in terms of reachable sets. We conclude by discussing how these results can be used to inform computational models of collision avoidance behavior.

Authors:V. C. Storey, R. Lukyanenko, A. Castellanos
Title: Conceptual Modeling: Topics, Themes, and Technology Trends
Abstract:
Conceptual modeling is an important part of information systems development and use that involves identifying and representing relevant aspects of reality. Although the past decades have experienced continuous digitalization of services and products that impact business and society, conceptual modeling efforts are still required to support new technologies as they emerge. This paper surveys research on conceptual modeling over the past five decades and shows how its topics and trends continue to evolve to accommodate emerging technologies, while remaining grounded in basic constructs. We survey over 5,300 papers that address conceptual modeling topics from the 1970s to the present, which are collected from 35 multidisciplinary journals and conferences, and use them as the basis from which to analyze the progression of conceptual modeling. The important role that conceptual modeling should play in our evolving digital world is discussed, and future research directions proposed.

Authors:Jingyang Peng, Wenyuan Shen, Jiarui Rao, Jionghao Lin
Title: Automated Bias Assessment in AI-Generated Educational Content Using CEAT Framework
Abstract:
Recent advances in Generative Artificial Intelligence (GenAI) have transformed educational content creation, particularly in developing tutor training materials. However, biases embedded in AI-generated content--such as gender, racial, or national stereotypes--raise significant ethical and educational concerns. Despite the growing use of GenAI, systematic methods for detecting and evaluating such biases in educational materials remain limited. This study proposes an automated bias assessment approach that integrates the Contextualized Embedding Association Test with a prompt-engineered word extraction method within a Retrieval-Augmented Generation framework. We applied this method to AI-generated texts used in tutor training lessons. Results show a high alignment between the automated and manually curated word sets, with a Pearson correlation coefficient of r = 0.993, indicating reliable and consistent bias assessment. Our method reduces human subjectivity and enhances fairness, scalability, and reproducibility in auditing GenAI-produced educational content.
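
The Contextualized Embedding Association Test builds on the WEAT effect size between target and attribute word sets. A minimal sketch of that effect size over (hypothetical) embedding vectors; the paper's RAG-based word extraction around it is not shown:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    """WEAT/CEAT-style effect size between target sets X, Y (e.g. gendered
    terms) and attribute sets A, B (e.g. career vs. family words)."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)
```

An effect size near zero indicates no measured association; large positive values indicate that X leans toward A and Y toward B in the embedding space.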

Authors:Alexandre Banks, Randy Moore, Sayem Nazmuz Zaman, Alaa Eldin Abdelaal, Septimiu E. Salcudean
Title: AutoCam: Hierarchical Path Planning for an Autonomous Auxiliary Camera in Surgical Robotics
Abstract:
Incorporating an autonomous auxiliary camera into robot-assisted minimally invasive surgery (RAMIS) enhances spatial awareness and eliminates manual viewpoint control. Existing path planning methods for auxiliary cameras track two-dimensional surgical features but do not simultaneously account for camera orientation, workspace constraints, and robot joint limits. This study presents AutoCam: an automatic auxiliary camera placement method to improve visualization in RAMIS. Implemented on the da Vinci Research Kit, the system uses a priority-based, workspace-constrained control algorithm that combines heuristic geometric placement with nonlinear optimization to ensure robust camera tracking. A user study (N=6) demonstrated that the system maintained 99.84% visibility of a salient feature and achieved a pose error of 4.36 $\pm$ 2.11 degrees and 1.95 $\pm$ 5.66 mm. The controller was computationally efficient, with a loop time of 6.8 $\pm$ 12.8 ms. An additional pilot study (N=6), where novices completed a Fundamentals of Laparoscopic Surgery training task, suggests that users can teleoperate just as effectively from AutoCam's viewpoint as from the endoscope's while still benefiting from AutoCam's improved visual coverage of the scene. These results indicate that an auxiliary camera can be autonomously controlled using the da Vinci patient-side manipulators to track a salient feature, laying the groundwork for new multi-camera visualization methods in RAMIS.

Authors:Tuan Dung Nguyen, Duncan J. Watts, Mark E. Whiting
Title: Empirically evaluating commonsense intelligence in large language models with large-scale human judgments
Abstract:
Commonsense intelligence in machines is often assessed by static benchmarks that compare a model's output against human-prescribed correct labels. An important, albeit implicit, assumption of these labels is that they accurately capture what any human would think, effectively treating human common sense as homogeneous. However, recent empirical work has shown that humans vary enormously in what they consider commonsensical; thus what appears self-evident to one benchmark designer may not be so to another. Here, we propose a novel method for evaluating common sense in artificial intelligence (AI), specifically in large language models (LLMs), that incorporates empirically observed heterogeneity among humans by measuring the correspondence between a model's judgment and that of a human population. We first find that, when treated as independent survey respondents, most LLMs remain below the human median in their individual commonsense competence. Second, when used as simulators of a hypothetical population, LLMs correlate with real humans only modestly in the extent to which they agree on the same set of statements. In both cases, smaller, open-weight models are surprisingly more competitive than larger, proprietary frontier models. Our evaluation framework, which ties commonsense intelligence to its cultural basis, contributes to the growing call for adapting AI models to human collectivities that possess different, often incompatible, social stocks of knowledge.
中文摘要:该摘要批评了静态基准测试假设人类常识同质化的做法,提出了一种通过将AI常识与人类多样性对齐的评估方法,发现较小模型在匹配人类判断差异方面常优于大型模型。
English Summary: The abstract critiques static benchmarks for assuming uniform human common sense and introduces a method to evaluate AI's common sense by aligning it with human diversity, revealing that smaller models often outperform larger ones in matching human judgment variability.

Authors:Ziyuan Zhang, Darcy Wang, Ningyuan Chen, Rodrigo Mansur, Vahid Sarhangian
Title: Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks
Abstract:
Large language models (LLMs) are increasingly used to simulate or automate human behavior in complex sequential decision-making tasks. A natural question is then whether LLMs exhibit similar decision-making behavior to humans, and can achieve comparable (or superior) performance. In this work, we focus on the exploration-exploitation (E&E) tradeoff, a fundamental aspect of dynamic decision-making under uncertainty. We employ canonical multi-armed bandit (MAB) tasks introduced in the cognitive science and psychiatry literature to conduct a comparative study of the E&E strategies of LLMs, humans, and MAB algorithms. We use interpretable choice models to capture the E&E strategies of the agents and investigate how explicit reasoning, through both prompting strategies and reasoning-enhanced models, shapes LLM decision-making. We find that reasoning shifts LLMs toward more human-like behavior, characterized by a mix of random and directed exploration. In simple stationary tasks, reasoning-enabled LLMs exhibit similar levels of random and directed exploration compared to humans. However, in more complex, non-stationary environments, LLMs struggle to match human adaptability, particularly in effective directed exploration, despite achieving similar regret in certain scenarios. Our findings highlight both the promise and limits of LLMs as simulators of human behavior and tools for automated decision-making and point to potential areas of improvements.
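
The random-vs-directed exploration distinction above can be made concrete with two classic bandit policies: epsilon-greedy (undirected, random exploration) and UCB (directed exploration toward uncertain arms). This is a generic stationary-bandit sketch, not the paper's experimental tasks or choice models:

```python
import math
import random

def run_bandit(means, policy, steps=2000, seed=0):
    """Simulate a stationary Bernoulli bandit; return cumulative pseudo-regret."""
    rng = random.Random(seed)
    n = [0] * len(means)      # pull counts per arm
    q = [0.0] * len(means)    # running mean reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, steps + 1):
        arm = policy(q, n, t, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]
        regret += best - means[arm]
    return regret

def eps_greedy(eps):
    """Random (undirected) exploration: with probability eps pick any arm."""
    def policy(q, n, t, rng):
        if rng.random() < eps:
            return rng.randrange(len(q))
        return max(range(len(q)), key=lambda a: q[a])
    return policy

def ucb(c=1.0):
    """Directed exploration: bonus for arms whose estimates are uncertain."""
    def policy(q, n, t, rng):
        for a, cnt in enumerate(n):
            if cnt == 0:
                return a
        return max(range(len(q)),
                   key=lambda a: q[a] + c * math.sqrt(math.log(t) / n[a]))
    return policy
```

On a stationary two-armed task, directed exploration stops wasting pulls on the inferior arm once it is confidently ruled out, so its cumulative regret grows far more slowly than heavily randomized exploration.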

Authors:Ruichen Yang, György M. Lévay, Christopher L. Hunt, Dániel Czeiner, Megan C. Hodgson, Damini Agarwal, Rahul R. Kaliki, Nitish V. Thakor
Title: Visual Feedback of Pattern Separability Improves Myoelectric Decoding Performance of Upper Limb Prostheses
Abstract:
State-of-the-art upper limb myoelectric prostheses often use pattern recognition (PR) control systems that translate electromyography (EMG) signals into desired movements. As prosthesis movement complexity increases, users often struggle to produce sufficiently distinct EMG patterns for reliable classification. Existing training typically involves heuristic, trial-and-error user adjustments to static decoder boundaries. Goal: We introduce the Reviewer, a 3D visual interface projecting EMG signals directly into the decoder's classification space, providing intuitive, real-time insight into PR algorithm behavior. This structured feedback reduces cognitive load and fosters mutual, data-driven adaptation between user-generated EMG patterns and decoder boundaries. Methods: A 10-session study with 12 able-bodied participants compared PR performance after motor-based training and updating using the Reviewer versus conventional virtual arm visualization. Performance was assessed using a Fitts' law task involving control of cursor aperture and orientation. Results: Participants trained with the Reviewer achieved higher completion rates, reduced overshoot, and improved path efficiency and throughput compared to the standard visualization group. Significance: The Reviewer introduces decoder-informed motor training, facilitating immediate and consistent PR-based myoelectric control improvements. By iteratively refining control through real-time feedback, this approach reduces reliance on trial-and-error recalibration, enabling a more adaptive, self-correcting training framework. Conclusion: The 3D visual feedback significantly improves PR control in novice operators through structured training, enabling feedback-driven adaptation and reducing reliance on extensive heuristic adjustments.
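
The throughput measure reported in such Fitts' law tasks is the index of difficulty divided by movement time (Shannon formulation). A minimal helper, independent of the paper's specific task design:

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation of the Fitts' law index of difficulty, in bits:
    ID = log2(D / W + 1) for target distance D and target width W."""
    return math.log2(distance / width + 1.0)

def throughput(distance, width, movement_time_s):
    """Throughput (bits/s) for one target condition: ID / movement time."""
    return index_of_difficulty(distance, width) / movement_time_s
```

For a target 7 units away and 1 unit wide, ID is 3 bits; reaching it in 1.5 s gives a throughput of 2 bits/s.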

Authors:Lei Fan, Kunyang Deng, Fangxue Liu
Title: Educational impacts of generative artificial intelligence on learning and performance of engineering students in China
Abstract:
With the rapid advancement of generative artificial intelligence (AI), its potential applications in higher education have attracted significant attention. This study investigated how 148 students from diverse engineering disciplines and regions across China used generative AI, focusing on its impact on their learning experience and the opportunities and challenges it poses for engineering education. Based on the survey data, we explored four key areas: the frequency and application scenarios of AI use among engineering students, its impact on students' learning and performance, commonly encountered challenges in using generative AI, and future prospects for its adoption in engineering education. The results showed that more than half of the participants reported a positive impact of generative AI on their learning efficiency, initiative, and creativity, with nearly half believing it also enhanced their independent thinking. However, despite acknowledging improved study efficiency, many felt their actual academic performance remained largely unchanged and expressed concerns about the accuracy and domain-specific reliability of generative AI. Our findings provide first-hand insight into the current benefits and challenges generative AI brings to students, particularly Chinese engineering students, and offer several recommendations, especially from the students' perspective, for effectively integrating generative AI into engineering education.

Authors:Hannu Simonen, Atte Kiviniemi, Hannah Johnston, Helena Barranha, Jonas Oppenlaender
Title: An Exploration of Default Images in Text-to-Image Generation
Abstract:
In the creative practice of text-to-image (TTI) generation, images are synthesized from textual prompts. By design, TTI models always yield an output, even if the prompt contains unknown terms. In this case, the model may generate default images: images that closely resemble each other across many unrelated prompts. Studying default images is valuable for designing better solutions for prompt engineering and TTI generation. We present the first investigation into default images on Midjourney. We describe an initial study in which we manually created input prompts triggering default images, and several ablation studies. Building on these, we conduct a computational analysis of about 750,000 images, revealing consistent default images across unrelated prompts. We also conduct an online user study investigating how default images may affect user satisfaction. Our work lays the foundation for understanding default images in TTI generation, highlighting their practical relevance as well as challenges and future research directions.
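
Detecting default images computationally amounts to flagging near-identical images that arise from unrelated prompts. A toy sketch using cosine similarity over (hypothetical) image embeddings with greedy single-link grouping; the paper's actual analysis pipeline over ~750,000 Midjourney images may differ:

```python
import numpy as np

def default_image_clusters(embeddings, prompts, sim_threshold=0.95):
    """Greedily group image embeddings by cosine similarity to a cluster
    representative, then return groups whose members came from more than
    one distinct prompt: a signature of 'default images'."""
    clusters = []
    for i, e in enumerate(embeddings):
        for c in clusters:
            rep = embeddings[c[0]]
            sim = float(e @ rep / (np.linalg.norm(e) * np.linalg.norm(rep)))
            if sim >= sim_threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return [c for c in clusters if len({prompts[i] for i in c}) > 1]
```

Single-prompt clusters are expected (repeated prompts yield similar images); only cross-prompt clusters indicate a default image.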

Authors:Yichen Zhao, Yuhua Wang, Xi Cheng, Junhao Fang, Yang Yang
Title: Integrating Natural Language Processing and Exercise Monitoring for Early Diagnosis of Metabolic Syndrome: A Deep Learning Approach
Abstract:
Metabolic syndrome (MetS) is a medical condition characterized by abdominal obesity, insulin resistance, hypertension, and hyperlipidemia. It increases the risk of many chronic diseases, including type 2 diabetes mellitus, and affects about one quarter of the global population. Therefore, early detection and timely intervention for MetS are crucial. Standard diagnosis of MetS components requires blood tests conducted within medical institutions. However, MetS is frequently underdiagnosed, leaving an unmet need for care in the affected population. This study aims to diagnose MetS using minimal physiological data and free-text descriptions of exercise-related activities, both of which are easily obtained in daily life. We collected data from 40 volunteers in a nursing home and used data augmentation to reduce class imbalance. We propose a deep learning framework for classifying MetS that integrates natural language processing (NLP) and exercise monitoring. The results showed that the best model achieved a high positive result (AUROC=0.806 and REC=76.3%) through 3-fold cross-validation. Feature importance analysis revealed that text and minimum daily heart rate contribute the most to the classification of MetS. This study demonstrates the potential of data that are easily measurable in daily life for the early diagnosis of MetS, which could help reduce the cost of screening and management for the MetS population.
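
The AUROC reported above has a simple rank interpretation: the probability that a randomly chosen positive case is scored above a randomly chosen negative one (ties count half). It can be computed directly, without sweeping thresholds:

```python
def auroc(labels, scores):
    """AUROC via its rank interpretation: P(score_pos > score_neg),
    counting ties as 0.5. labels are 0/1, scores are model outputs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This O(P*N) pairwise form is fine for small evaluation sets; larger ones typically use the equivalent rank-sum computation.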

Authors:Nico Feld, Pauline Bimberg, Michael Feldmann, Matthias Wölwer, Eike Langbehn, Benjamin Weyers, Daniel Zielasko
Title: Investigating Resolution Strategies for Workspace-Occlusion in Augmented Virtuality
Abstract:
Augmented Virtuality integrates physical content into virtual environments, but the occlusion of physical content by virtual content remains a challenge. This unwanted occlusion may disrupt user interactions with physical devices and compromise safety and usability. This paper investigates two resolution strategies to address this issue: Redirected Walking, which subtly adjusts the user's movement to maintain physical-virtual alignment, and Automatic Teleport Rotation, which realigns the virtual environment during travel. A user study set in a virtual forest demonstrates that both methods effectively reduce occlusion. In our testbed, Automatic Teleport Rotation achieved higher occlusion resolution, but it is suspected to increase cybersickness compared to the less intrusive Redirected Walking approach.

Authors:Gabriel Lima, Nina Grgić-Hlača, Markus Langer, Yixin Zou
Title: Laypeople's Attitudes Towards Fair, Affirmative, and Discriminatory Decision-Making Algorithms
Abstract:
Affirmative algorithms have emerged as a potential answer to algorithmic discrimination, seeking to redress past harms and rectify the source of historical injustices. We present the results of two experiments ($N$$=$$1193$) capturing laypeople's perceptions of affirmative algorithms -- those which explicitly prioritize the historically marginalized -- in hiring and criminal justice. We contrast these opinions about affirmative algorithms with folk attitudes towards algorithms that prioritize the privileged (i.e., discriminatory) and systems that make decisions independently of demographic groups (i.e., fair). We find that people -- regardless of their political leaning and identity -- view fair algorithms favorably and denounce discriminatory systems. In contrast, we identify disagreements concerning affirmative algorithms: liberals and racial minorities rate affirmative systems as positively as their fair counterparts, whereas conservatives and those from the dominant racial group evaluate affirmative algorithms as negatively as discriminatory systems. We identify a source of these divisions: people have varying beliefs about who (if anyone) is marginalized, shaping their views of affirmative algorithms. We discuss the possibility of bridging these disagreements to bring people together towards affirmative algorithms.

Authors:Anna Wróblewska, Bartosz Grabek, Jakub Świstak, Daniel Dan
Title: Evaluating LLM-Generated Q&A Test: a Student-Centered Study
Abstract:
This research presents an automated pipeline for generating reliable question-answer (Q&A) tests using AI chatbots. We automatically generated a GPT-4o-mini-based Q&A test for a Natural Language Processing course and evaluated its psychometric and perceived-quality metrics with students and experts. A mixed-format IRT analysis showed that the generated items exhibit strong discrimination and appropriate difficulty, while student and expert star ratings reflect high overall quality. A uniform DIF check identified two items for review. These findings demonstrate that LLM-generated assessments can match human-authored tests in psychometric performance and user satisfaction, illustrating a scalable approach to AI-assisted assessment development.
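
The IRT analysis above rests on item response models such as the two-parameter logistic (2PL) model, where each item has a discrimination and a difficulty parameter. A minimal helper for its item characteristic curve (the paper's mixed-format model is richer than this sketch):

```python
import math

def p_correct(theta, a, b):
    """2PL IRT item characteristic curve: the probability that a student
    with ability theta answers an item with discrimination a and
    difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals difficulty (theta = b), the probability is exactly 0.5; higher discrimination a makes the curve steeper around that point.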

Authors:Nikolai Bahr, Christoph Zetzsche
Title: Human causal perception in a cube-stacking task
Abstract:
In intuitive physics, the process of stacking cubes has become a paradigmatic, canonical task. Even though it is employed in various shades and complexities, the most fundamental setting with two cubes has not been thoroughly investigated. Furthermore, the majority of settings feature only a reduced, one-dimensional (1D) decision space. In this paper, an experiment is conducted in which participants judge the stability of two cubes stacked on top of each other. It is performed in the full 3D setting, which features a 2D decision surface. The analysis yields a rotated-square shape for the perceived stability area, instead of the safety margin commonly reported in 1D. This implies more complex decision behavior in humans than previously assumed.

Authors:Ohida Binte Amin, Varun Mishra, Tinashe M. Tapera, Robert Volpe, Aarti Sathyanarayana
Title: Extending Stress Detection Reproducibility to Consumer Wearable Sensors
Abstract:
Wearable sensors are widely used to collect physiological data and develop stress detection models. However, most studies focus on a single dataset, rarely evaluating model reproducibility across devices, populations, or study conditions. We previously assessed the reproducibility of stress detection models across multiple studies, testing models trained on one dataset against others using heart rate (with R-R interval) and electrodermal activity (EDA). In this study, we extended our stress detection reproducibility assessment to consumer wearable sensors. We compared validated research-grade devices (Biopac MP160, Polar H10, Empatica E4) to a consumer wearable (Garmin Forerunner 55s), assessing device-specific stress detection performance in a new stress study on undergraduate students. Thirty-five students completed three standardized stress-induction tasks in a lab setting. Biopac MP160 performed the best, consistent with our expectations of it as the gold standard, though performance varied across devices and models. Combining heart rate variability (HRV) and EDA enhanced stress prediction across most scenarios. However, Empatica E4 showed variability; while HRV and EDA improved stress detection in leave-one-subject-out (LOSO) evaluations (AUROC up to 0.953), device-specific limitations led to underperformance when tested with our pre-trained stress detection tool (AUROC 0.723), highlighting generalizability challenges related to hardware-model compatibility. Garmin Forerunner 55s demonstrated strong potential for real-world stress monitoring, achieving the best mental arithmetic stress detection performance in LOSO (AUROC up to 0.961), comparable to research-grade devices such as Polar H10 (AUROC 0.954) and Empatica E4 (AUROC 0.905 with the HRV-only model and 0.953 with the HRV+EDA model), with the added advantage of consumer-friendly wearability for free-living contexts.
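
Leave-one-subject-out (LOSO) evaluation, used above, holds out all data from one subject per fold so that no subject appears in both train and test sets. A minimal split generator (the tuple layout is a hypothetical convention, not the study's data format):

```python
def loso_splits(samples):
    """Yield leave-one-subject-out splits. Each sample is a tuple
    (subject_id, features, label); each fold holds out one subject."""
    subjects = sorted({s for s, _, _ in samples})
    for held_out in subjects:
        train = [x for x in samples if x[0] != held_out]
        test = [x for x in samples if x[0] == held_out]
        yield held_out, train, test
```

Averaging a metric such as AUROC over the folds estimates performance on entirely unseen subjects, which is the relevant setting for deployed stress monitors.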

Authors:Jeffrey Basoah, Daniel Chechelnitsky, Tao Long, Katharina Reinecke, Chrysoula Zerva, Kaitlyn Zhou, Mark Díaz, Maarten Sap
Title: Not Like Us, Hunty: Measuring Perceptions and Behavioral Effects of Minoritized Anthropomorphic Cues in LLMs
Abstract:
As large language models (LLMs) increasingly adapt and personalize to diverse sets of users, there is an increased risk of systems appropriating sociolects, i.e., language styles or dialects that are associated with specific minoritized lived experiences (e.g., African American English, Queer slang). In this work, we examine whether sociolect usage by an LLM agent affects user reliance on its outputs and user perception (satisfaction, frustration, trust, and social presence). We designed and conducted user studies where 498 African American English (AAE) speakers and 487 Queer slang speakers performed a set of question-answering tasks with LLM-based suggestions in either standard American English (SAE) or their self-identified sociolect. Our findings showed that sociolect usage by LLMs influenced both reliance and perceptions, though in some surprising ways. Results suggest that both AAE and Queer slang speakers relied more on the SAE agent, and had more positive perceptions of the SAE agent. Yet, only Queer slang speakers felt more social presence from the Queer slang agent over the SAE one, whereas only AAE speakers preferred and trusted the SAE agent over the AAE one. These findings emphasize the need to test for behavioral outcomes rather than simply assume that personalization would lead to a better and safer reliance outcome. They also highlight the nuanced dynamics of minoritized language in machine interactions, underscoring the need for LLMs to be carefully designed to respect cultural and linguistic boundaries while fostering genuine user engagement and trust.

Authors:Mukund Telukunta, Venkata Sriram Siddhardh Nadendla, Morgan Stuart, Casey Canfield
Title: Fairness Perceptions in Regression-based Predictive Models
Abstract:
Regression-based predictive analytics used in modern kidney transplantation is known to inherit biases from training data. This leads to social discrimination and inefficient organ utilization, particularly for certain social groups. Despite this concern, there is limited research on fairness in regression and its impact on organ utilization and placement. This paper introduces three novel divergence-based group fairness notions: (i) independence, (ii) separation, and (iii) sufficiency to assess the fairness of regression-based analytics tools. In addition, fairness preferences are investigated from crowd feedback, in order to identify a socially accepted group fairness criterion for evaluating these tools. A total of 85 participants were recruited from the Prolific crowdsourcing platform, and a Mixed-Logit discrete choice model was used to model fairness feedback and estimate social fairness preferences. The findings clearly depict a strong preference towards the separation and sufficiency fairness notions, and show that the predictive analytics tool is deemed fair with respect to gender and race groups, but unfair with respect to age groups.
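The independence notion above can be made concrete with a divergence check: a regressor satisfies independence when its prediction distribution is the same across groups, so the divergence between group-conditional prediction histograms measures the violation. The paper's exact divergence definitions are not reproduced here; this sketch uses KL divergence on hypothetical synthetic scores.

```python
import numpy as np

def kl(p, q, eps=1e-9):
    """KL divergence between two (unnormalized) histograms."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
# Hypothetical regression outputs (e.g., predicted allocation scores)
# for two demographic groups; the 0.10 mean gap is illustrative.
scores_a = rng.normal(0.55, 0.10, 5000)
scores_b = rng.normal(0.45, 0.10, 5000)

bins = np.linspace(0, 1, 21)
pa, _ = np.histogram(scores_a, bins=bins)
pb, _ = np.histogram(scores_b, bins=bins)

# Independence-style audit: 0 would mean the predictor's output
# distribution carries no information about group membership.
gap = kl(pa.astype(float), pb.astype(float))
print(f"KL divergence between group prediction distributions: {gap:.3f}")
```

Separation and sufficiency follow the same pattern but condition the compared distributions on the true outcome and on the prediction, respectively.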

Authors:Austin Lu, Kanad Sarkar, Yongjie Zhuang, Leo Lin, Ryan M Corey, Andrew C Singer
Title: Accelerating Audio Research with Robotic Dummy Heads
Abstract:
This work introduces a robotic dummy head that fuses the acoustic realism of conventional audiological mannequins with the mobility of robots. The proposed device is capable of moving, talking, and listening as people do, and can be used to automate spatially-stationary audio experiments, thus accelerating the pace of audio research. Critically, the device may also be used as a moving sound source in dynamic experiments, due to its quiet motor. This feature differentiates our work from previous robotic acoustic research platforms. Validation that the robot enables high quality audio data collection is provided through various experiments and acoustic measurements. These experiments also demonstrate how the robot might be used to study adaptive binaural beamforming. Design files are provided as open-source to stimulate novel audio research.

Authors:Matthias Matt, Jana Sedlakova, Jürgen Bernard, Matthias Zeppelzauer, Manuela Waldner
Title: Scalable Class-Centric Visual Interactive Labeling
Abstract:
Large unlabeled datasets demand efficient and scalable data labeling solutions, in particular when the number of instances and classes is large. This leads to significant visual scalability challenges and imposes a high cognitive load on the users. Traditional instance-centric labeling methods, in which (single) instances are labeled in each iteration, struggle to scale effectively in these scenarios. To address these challenges, we introduce cVIL, a Class-Centric Visual Interactive Labeling methodology designed for interactive visual data labeling. By shifting the paradigm from assigning-classes-to-instances to assigning-instances-to-classes, cVIL reduces labeling effort and enhances efficiency for annotators working with large, complex, and class-rich datasets. We propose a novel visual analytics labeling interface built on top of the conceptual cVIL workflow, enabling improved scalability over traditional visual labeling. In a user study, we demonstrate that cVIL can improve labeling efficiency and user satisfaction over instance-centric interfaces. The effectiveness of cVIL is further demonstrated through a usage scenario, showcasing its potential to alleviate cognitive load and support experts in managing extensive labeling tasks efficiently.

Authors:Mouath Abu Daoud, Chaimae Abouzahir, Leen Kharouf, Walid Al-Eisawi, Nizar Habash, Farah E. Shamout
Title: MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks
Abstract:
Large Language Models (LLMs) have demonstrated significant promise for various applications in healthcare. However, their efficacy in the Arabic medical domain remains unexplored due to the lack of high-quality domain-specific datasets and benchmarks. This study introduces MedArabiQ, a novel benchmark dataset consisting of seven Arabic medical tasks, covering multiple specialties and including multiple choice questions, fill-in-the-blank, and patient-doctor question answering. We first constructed the dataset using past medical exams and publicly available datasets. We then introduced different modifications to evaluate various LLM capabilities, including bias mitigation. We conducted an extensive evaluation with five state-of-the-art open-source and proprietary LLMs, including GPT-4o, Claude 3.5-Sonnet, and Gemini 1.5. Our findings highlight the need for the creation of new high-quality benchmarks that span different languages to ensure fair deployment and scalability of LLMs in healthcare. By establishing this benchmark and releasing the dataset, we provide a foundation for future research aimed at evaluating and enhancing the multilingual capabilities of LLMs for the equitable use of generative AI in healthcare.

Authors:Kurtis Haut, Masum Hasan, Thomas Carroll, Ronald Epstein, Taylan Sen, Ehsan Hoque
Title: AI Standardized Patient Improves Human Conversations in Advanced Cancer Care
Abstract:
Serious illness communication (SIC) in end-of-life care faces challenges such as emotional stress, cultural barriers, and balancing hope with honesty. Despite its importance, one of the few available ways for clinicians to practice SIC is with standardized patients, which is expensive, time-consuming, and inflexible. In this paper, we present SOPHIE, an AI-powered standardized patient simulation and automated feedback system. SOPHIE combines large language models (LLMs), a lifelike virtual avatar, and automated, personalized feedback based on clinical literature to provide remote, on-demand SIC training. In a randomized controlled study with healthcare students and professionals, SOPHIE users demonstrated significant improvement across three critical SIC domains: Empathize, Be Explicit, and Empower. These results suggest that AI-driven tools can enhance complex interpersonal communication skills, offering scalable, accessible solutions to address a critical gap in clinician education.

Authors:Nicholas Hafner, Chaoran Liu, Carlos Ishi, Hiroshi Ishiguro
Title: Quadrupedal Spine Control Strategies: Exploring Correlations Between System Dynamic Responses and Human Perspectives
Abstract:
Unlike their biological cousins, the majority of existing quadrupedal robots are constructed with rigid chassis. This results in motion that is either beetle-like or distinctly robotic, lacking the natural fluidity characteristic of mammalian movements. Existing literature on quadrupedal robots with spinal configurations primarily focuses on energy efficiency and does not consider the effects in human-robot interaction scenarios. Our contributions include an initial investigation into various trajectory generation strategies for a quadrupedal robot with a four degree of freedom spine, and an analysis on the effect that such methods have on human perception of gait naturalness compared to a fixed spine baseline. The strategies were evaluated using videos of walking, trotting and turning simulations. Among the four different strategies developed, the optimised time varying and the foot-tracking strategies were perceived to be more natural than the baseline in a randomised trial with 50 participants. Although none of the strategies demonstrated any energy efficiency improvements over the no-spine baseline, some showed greater footfall consistency at higher speeds. Given the greater likeability drawn from the more natural locomotion patterns, this type of robot displays potential for applications in social robot scenarios such as elderly care, where energy efficiency is not a primary concern.

Authors:Maryam Sadeghi, Darío Fernández Khatiboun, Yasser Rezaeiyan, Saima Rizwan, Alessandro Barcellona, Andrea Merello, Marco Crepaldi, Gabriella Panuccio, Farshad Moradi
Title: Closed-loop control of seizure activity via real-time seizure forecasting by reservoir neuromorphic computing
Abstract:
Closed-loop brain stimulation holds potential as personalized treatment for drug-resistant epilepsy (DRE) but still suffers from limitations that result in highly variable efficacy. First, stimulation is typically delivered upon detection of the seizure to abort rather than prevent it; second, the stimulation parameters are established by trial and error, requiring lengthy rounds of fine-tuning, which delay steady-state therapeutic efficacy. Here, we address these limitations by leveraging the potential of neuromorphic computing. We present a neuromorphic reservoir computing hardware system capable of driving real-time personalized free-run stimulations based on seizure forecasting, wherein each forecast triggers an electrical pulse rather than an arbitrarily predefined fixed-frequency stimulus train. The system achieves 83.33% accuracy in forecasting seizure occurrences during the training phase. We validate the system using hippocampal spheroids coupled to a 3D microelectrode array as a simplified testbed, achieving seizure reduction >97% during real-time processing while primarily using instantaneous stimulation frequencies within 20 Hz, well below what is typically used in clinical practice. Our work demonstrates the potential of neuromorphic systems as a next-generation neuromodulation strategy for personalized DRE treatment, leveraging their sparse and event-driven processing for real-time applications.
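The reservoir computing principle behind the forecasting system can be illustrated in software: a fixed random recurrent network projects the input signal into a high-dimensional state, and only a linear readout is trained. This is not the authors' neuromorphic hardware; it is a minimal echo-state-network sketch on a synthetic oscillatory signal standing in for recorded field potentials, with all sizes and scalings chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic noisy oscillation standing in for a field-potential trace.
n_res, n_steps = 200, 2000
u = np.sin(np.linspace(0, 40 * np.pi, n_steps)) + 0.1 * rng.normal(size=n_steps)

# Fixed random reservoir: weights are never trained.
W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1

# Drive the reservoir; states[t] summarizes the input history u[..t-1].
states = np.zeros((n_steps, n_res))
x = np.zeros(n_res)
for t in range(1, n_steps):
    x = np.tanh(W @ x + W_in * u[t - 1])
    states[t] = x

# Trained part: a ridge-regression readout forecasting the next sample.
tr, te = slice(100, 1500), slice(1500, n_steps)
S, y = states[tr], u[tr]
w = np.linalg.solve(S.T @ S + 1e-3 * np.eye(n_res), S.T @ y)
pred = states[te] @ w

corr = np.corrcoef(pred, u[te])[0, 1]
print(f"one-step forecast correlation: {corr:.3f}")
```

Because only the readout is trained, the approach maps naturally onto low-power analog or neuromorphic substrates, which is what makes it attractive for real-time closed-loop stimulation.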

Authors:Tamim Ahmed, Thanassis Rikakis
Title: Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study
Abstract:
Manual scoring of the Action Research Arm Test (ARAT) for upper extremity assessment in stroke rehabilitation is time-intensive and variable. We propose an automated ARAT scoring system integrating multimodal video analysis with SlowFast, I3D, and Transformer-based models using OpenPose keypoints and object locations. Our approach employs multi-view data (ipsilateral, contralateral, and top perspectives), applying early and late fusion to combine features across views and models. Hierarchical Bayesian Models (HBMs) infer movement quality components, enhancing interpretability. A clinician dashboard displays task scores, execution times, and quality assessments. We conducted a study with five clinicians who reviewed 500 video ratings generated by our system, providing feedback on its accuracy and usability. Evaluated on a stroke rehabilitation dataset, our framework achieves 89.0% validation accuracy with late fusion, with HBMs aligning closely with manual assessments. This work advances automated rehabilitation by offering a scalable, interpretable solution with clinical validation.
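Late fusion as described above can be sketched simply: each view's model produces class probabilities, and the fused prediction averages them. The per-view logits below are synthetic and the view strengths are assumptions; the example only illustrates why averaging independent views tends to beat any single view.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(3)
n_clips, n_classes = 200, 4          # e.g., ARAT item scores 0-3
y = rng.integers(0, n_classes, n_clips)

# Hypothetical per-view logits: each view is noisy but boosts the true class.
def view_logits(strength):
    logits = rng.normal(0, 1, (n_clips, n_classes))
    logits[np.arange(n_clips), y] += strength
    return logits

# Stand-ins for the top, ipsilateral, and contralateral camera views.
views = [view_logits(s) for s in (0.8, 1.0, 1.2)]

per_view_acc = [(softmax(v).argmax(1) == y).mean() for v in views]
fused = np.mean([softmax(v) for v in views], axis=0)  # late fusion: average
fused_acc = (fused.argmax(1) == y).mean()

print("per-view accuracy:", [round(a, 3) for a in per_view_acc])
print("fused accuracy:", round(float(fused_acc), 3))
```

Early fusion would instead concatenate the views' feature vectors before a single classifier; averaging probabilities, as here, keeps the per-view models independent and lets views disagree gracefully.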

Authors:Christopher Cofie Kuzagbe, Fabrice Mukarage, Skye Nandi Adams, N'guessan Yves-Roland Douha, Edith Talina Luhanga
Title: Content and Quality Analysis of Parent-Facing Applications for Feeding Children with Autism Spectrum Disorder
Abstract:
Approximately 1 in 100 children worldwide are diagnosed with Autism Spectrum Disorder (ASD), and 46% to 89% experience significant feeding difficulties. Although mobile health (mHealth) applications offer potential support for caregivers, the quality and relevance of apps targeting autism-related feeding issues remain unclear. This systematic review evaluated mobile applications available on the Apple App Store and the Google Play Store between September and October 2024. The searches were carried out using 15 predefined terms (e.g., "child autism feeding", "child autism food"). Applications were eligible if they were in English, free to download, updated within the past year, explicitly addressed feeding in children with autism, accessible in Africa, and had more than 100 downloads. Of the 326 apps identified, only two iOS applications met all inclusion criteria; no Android apps qualified. Behavior Change Wheel (BCW) analysis showed that the selected applications incorporated multiple intervention functions, such as education, training, enablement, incentivization, and modeling, though none addressed the full spectrum of behavioral strategies. Mobile App Rating Scale (MARS) indicated moderate to high usability, with features such as sensory-friendly food routines and structured caregiver tools. However, both apps lacked clinical validation and comprehensive customization. These findings highlight a critical gap in the availability of evidence-based high-quality mHealth tools for caregivers managing ASD-related feeding challenges and underscore the need for professionally developed and culturally sensitive digital solutions.

Authors:Phil Lopes, Nuno Fachada, Maria Fonseca
Title: Closing the Loop: A Systematic Review of Experience-Driven Game Adaptation
Abstract:
Adaptive game systems aim to enrich player experiences by dynamically adjusting game content in response to user data. While extensive research has addressed content personalization and player experience modeling, the integration of these components into fully operational adaptive gameplay systems remains limited. This systematic review, conducted in accordance with PRISMA guidelines, analyzes 17 empirical studies published between January 2015 and May 2024, identifying and analyzing approaches that implement the complete experience-driven loop -- including player sensing, modeling, and content adaptation. Game telemetry remains the most prevalent sensing modality, although other non-invasive methods suitable for affective modeling -- such as facial expression analysis (FEA) and peripheral interaction data -- remain underutilized despite their potential for real-time emotional inference. Knowledge-based methods, such as rule-based systems and heuristics, dominate modeling and adaptation due to their interpretability and low resource demands, whereas machine learning approaches face challenges related to data availability and transparency. Despite their relevance to immersive and therapeutic experiences, affective states such as stress and anxiety remain largely ignored, as systems continue to favor performance over emotion-sensitive adaptation. These findings highlight a crucial research direction: advancing emotionally responsive game systems that move beyond performance optimization by incorporating underutilized sensing modalities -- such as FEA and peripheral interaction -- to enable real-time affect-driven personalization. Advancing in this direction holds strong potential to increase immersion, personalize gameplay, and support affect regulation across entertainment and therapeutic contexts.

Authors:Jeffrey Lim, Po T. Wang, Won Joon Sohn, Derrick Lin, Shravan Thaploo, Luke Bashford, David Bjanes, Angelica Nguyen, Hui Gong, Michelle Armacost, Susan J. Shaw, Spencer Kellis, Brian Lee, Darrin Lee, Payam Heydari, Richard A. Andersen, Zoran Nenadic, Charles Y. Liu, An H. Do
Title: Real-Time Brain-Computer Interface Control of Walking Exoskeleton with Bilateral Sensory Feedback
Abstract:
Invasive brain-computer interface (BCI) technology has demonstrated the possibility of restoring brain-controlled walking in paraplegic spinal cord injury patients. However, current implementations of BCI-controlled walking still have significant drawbacks. In particular, prior systems are unidirectional and lack sensory feedback for insensate patients, have suboptimal reliance on brain signals from the bilateral arm areas of the motor cortex, and depend on external systems for signal processing. Motivated by these shortcomings, this study is the first in which a bidirectional brain-computer interface (BDBCI) has demonstrated the restoration of both brain-controlled walking and leg sensory feedback while utilizing the bilateral leg motor and sensory cortices. Here, a subject undergoing subdural electrocorticogram electrode implantation for epilepsy surgery evaluation leveraged the leg representation areas of the bilateral interhemispheric primary motor and sensory cortices to operate a BDBCI with high performance. Although electrode implantation in the interhemispheric region is uncommon, electrodes can be safely implanted in this region to access rich leg motor information and deliver bilateral leg sensory feedback. Finally, we demonstrated that all BDBCI operations can be executed on a dedicated, portable embedded system. These results indicate that BDBCIs can potentially provide brain-controlled ambulation and artificial leg sensation to people with paraplegia after spinal cord injury in a manner that emulates full implantability and is untethered from any external systems.

Authors:Ju Wu, Calvin K. L. Or
Title: Position Paper: Towards Open Complex Human-AI Agents Collaboration System for Problem-Solving and Knowledge Management
Abstract:
This position paper critically surveys a broad spectrum of recent empirical developments on human-AI agents collaboration, highlighting both their technical achievements and persistent gaps. We observe a lack of a unifying theoretical framework that can coherently integrate these varied studies, especially when tackling open-ended, complex tasks. To address this, we propose a novel conceptual architecture: one that systematically interlinks the technical details of multi-agent coordination, knowledge management, cybernetic feedback loops, and higher-level control mechanisms. By mapping existing contributions, from symbolic AI techniques and connectionist LLM-based agents to hybrid organizational practices, onto this proposed framework (Hierarchical Exploration-Exploitation Net), our approach facilitates revision of legacy methods and inspires new work that fuses qualitative and quantitative paradigms. The paper's structure allows it to be read from any section, serving equally as a critical review of technical implementations and as a forward-looking reference for designing or extending human-AI symbioses. Together, these insights offer a stepping stone toward deeper co-evolution of human cognition and AI capability.

Authors:Justin B. Bullock, Janet V. T. Pauketat, Hsini Huang, Yi-Fan Wang, Jacy Reese Anthis
Title: Public Opinion and The Rise of Digital Minds: Perceived Risk, Trust, and Regulation Support
Abstract:
Governance institutions must respond to societal risks, including those posed by generative AI. This study empirically examines how public trust in institutions and AI technologies, along with perceived risks, shape preferences for AI regulation. Using the nationally representative 2023 Artificial Intelligence, Morality, and Sentience (AIMS) survey, we assess trust in government, AI companies, and AI technologies, as well as public support for regulatory measures such as slowing AI development or outright bans on advanced AI. Our findings reveal broad public support for AI regulation, with risk perception playing a significant role in shaping policy preferences. Individuals with higher trust in government favor regulation, while those with greater trust in AI companies and AI technologies are less inclined to support restrictions. Trust in government and perceived risks significantly predict preferences for both soft (e.g., slowing development) and strong (e.g., banning AI systems) regulatory interventions. These results highlight the importance of public opinion in AI governance. As AI capabilities advance, effective regulation will require balancing public concerns about risks with trust in institutions. This study provides a foundational empirical baseline for policymakers navigating AI governance and underscores the need for further research into public trust, risk perception, and regulatory strategies in the evolving AI landscape.

Authors:Antonia Azzini, Ilaria Baroni, Irene Celino
Title: A Conversational Approach to Well-being Awareness Creation and Behavioural Intention
Abstract:
The promotion of a healthy lifestyle is one of the main drivers of an individual's overall physical and psycho-emotional well-being. Digital technologies are increasingly adopted as ''facilitators'' for this goal, to raise awareness and solicit healthy lifestyle habits. This study aims to examine the effects of adopting a digital conversational tool to influence awareness creation and behavioural change in the context of a well-being lifestyle. Our aim is to collect evidence of the aspects that must be taken into account when designing and implementing such tools in well-being promotion campaigns. To this end, we created a conversational application for promoting well-being and healthy lifestyles, which presents relevant information and asks specific questions to its intended users within an interaction happening through a chat interface; the conversational tool presents itself as a well-being counsellor named Allegra and follows a coaching approach to structure the interaction with the user. In our user study, participants were asked to first interact with Allegra in one of three experimental conditions, corresponding to different conversational styles; then, they answered a questionnaire about their experience. The questionnaire items were related to intrinsic motivation factors as well as awareness creation and behavioural change. The collected data allowed us to assess the hypotheses of our model linking those variables. Our results confirm the positive effect of intrinsic motivation factors on both awareness creation and behavioural intention in the context of well-being and healthy lifestyle; on the other hand, we did not record any statistically significant effect of different language and communication styles on the outcomes.

Authors:Nikos Bikakis, Panos K. Chrysanthis, Guoliang Li, George Papastefanatos, Lingyun Yu
Title: Visual Analytics Challenges and Trends in the Age of AI: The BigVis Community Perspective
Abstract:
This report provides insights into the challenges, emerging topics, and opportunities related to human-data interaction and visual analytics in the AI era. The BigVis 2024 organizing committee conducted a survey among experts in the field, inviting the Program Committee members and the authors of accepted papers to share their views. Thirty-two scientists from diverse research communities, including Databases, Information Visualization, and Human-Computer Interaction, participated in the study. These scientists, representing both industry and academia, provided valuable insights into the current and future landscape of the field. In this report, we analyze the survey responses and compare them to the findings of a similar study conducted four years ago. The results reveal some interesting insights. First, many of the critical challenges identified in the previous survey remain highly relevant today, despite being unrelated to AI. Meanwhile, the field's landscape has significantly evolved, with most of today's vital challenges not even being mentioned in the earlier survey, underscoring the profound impact of AI-related advancements. By summarizing the perspectives of the research community, this report aims to shed light on the key challenges, emerging trends, and potential research directions in human-data interaction and visual analytics in the AI era.

Authors:David Zhou, John R. Gallagher, Sarah Sterman
Title: Thoughtful, Confused, or Untrustworthy: How Text Presentation Influences Perceptions of AI Writing Tools
Abstract:
AI writing tools have been shown to dramatically change the way people write, yet the effects of AI text presentation are not well understood nor always intentionally designed. Although text presentation in existing large language model interfaces is linked to the speed of the underlying model, text presentation speed can impact perceptions of AI systems, potentially influencing whether AI suggestions are accepted or rejected. In this paper, we analyze the effects of varying text generation speed in creative and professional writing scenarios on an online platform (n=297). We find that speed is correlated with perceived humanness and trustworthiness of the AI tool, as well as the perceived quality of the generated text. We discuss its implications on creative and writing processes, along with future steps in the intentional design of AI writing tool interfaces.

Authors:Ozioma C. Oguine, Oghenemaro Anuyah, Zainab Agha, Iris Melgarez, Adriana Alvarado Garcia, Karla Badillo-Urquiola
Title: Online Safety for All: Sociocultural Insights from a Systematic Review of Youth Online Safety in the Global South
Abstract:
Youth online safety research in HCI has historically centered on perspectives from the Global North, often overlooking the unique particularities and cultural contexts of regions in the Global South. This paper presents a systematic review of 66 youth online safety studies published between 2014 and 2024, specifically focusing on regions in the Global South. Our findings reveal a concentrated research focus in Asian countries and a predominance of quantitative methods. We also found limited research on marginalized youth populations and a primary focus on risks related to cyberbullying. Our analysis underscores the critical role of cultural factors in shaping online safety, highlighting the need for educational approaches that integrate social dynamics and awareness. We propose methodological recommendations and a future research agenda that encourages the adoption of situated, culturally sensitive methodologies and youth-centered approaches to researching youth online safety in regions of the Global South. This paper advocates for greater inclusivity in youth online safety research, emphasizing the importance of addressing varied sociocultural contexts to better understand and meet the online safety needs of youth in the Global South.

Authors:Brandon Woodard, Melvin He, Mose Sakashita, Jing Qian, Zainab Iftikhar, Joseph J. LaViola
Title: Cam-2-Cam: Exploring the Design Space of Dual-Camera Interactions for Smartphone-based Augmented Reality
Abstract:
Off-the-shelf smartphone-based AR systems typically use a single front-facing or rear-facing camera, which restricts user interactions to a narrow field of view and small screen size, thus reducing their practicality. We present Cam-2-Cam, an interaction concept implemented in three smartphone-based AR applications with interactions that span both cameras. Results from our qualitative analysis conducted on 30 participants presented two major design lessons that explore the interaction space of smartphone AR while maintaining critical AR interface attributes like embodiment and immersion: (1) Balancing Contextual Relevance and Feedback Quality serves to outline a delicate balance between implementing familiar interactions people do in the real world and the quality of multimodal AR responses and (2) Preventing Disorientation using Simultaneous Capture and Alternating Cameras which details how to prevent disorientation during AR interactions using the two distinct camera techniques we implemented in the paper. Additionally, we consider observed user assumptions or natural tendencies to inform future implementations of dual-camera setups for smartphone-based AR. We envision our design lessons as an initial pioneering step toward expanding the interaction space of smartphone-based AR, potentially driving broader adoption and overcoming limitations of single-camera AR.

Authors:Johannes Eschner, Roberto Labadie-Tamayo, Matthias Zeppelzauer, Manuela Waldner
Title: Interactive Discovery and Exploration of Visual Bias in Generative Text-to-Image Models
Abstract:
Bias in generative Text-to-Image (T2I) models is a known issue, yet systematically analyzing such models' outputs to uncover it remains challenging. We introduce the Visual Bias Explorer (ViBEx) to interactively explore the output space of T2I models to support the discovery of visual bias. ViBEx introduces a novel flexible prompting tree interface in combination with zero-shot bias probing using CLIP for quick and approximate bias exploration. It additionally supports in-depth confirmatory bias analysis through visual inspection of forward, intersectional, and inverse bias queries. ViBEx is model-agnostic and publicly available. In four case study interviews, experts in AI and ethics were able to discover visual biases that have so far not been described in literature.

Authors:Apurv Varshney, Lucas Nadolskis, Tobias Höllerer, Michael Beyeler
Title: Beyond Physical Reach: Comparing Head- and Cane-Mounted Cameras for Last-Mile Navigation by Blind Users
Abstract:
Blind individuals face persistent challenges in last-mile navigation, including locating entrances, identifying obstacles, and navigating complex or cluttered spaces. Although wearable cameras are increasingly used in assistive systems, there has been no systematic, vantage-focused comparison to guide their design. This paper addresses that gap through a two-part investigation. First, we surveyed ten experienced blind cane users, uncovering navigation strategies, pain points, and technology preferences. Participants stressed the importance of multi-sensory integration, destination-focused travel, and assistive tools that complement (rather than replace) the cane's tactile utility. Second, we conducted controlled data collection with a blind participant navigating five real-world environments using synchronized head- and cane-mounted cameras, isolating vantage placement as the primary variable. To assess how each vantage supports spatial perception, we evaluated SLAM performance (for localization and mapping) and NeRF-based 3D reconstruction (for downstream scene understanding). Head-mounted sensors delivered superior localization accuracy, while cane-mounted views offered broader ground-level coverage and richer environmental reconstructions. A combined (head+cane) configuration consistently outperformed both. These results highlight the complementary strengths of different sensor placements and offer actionable guidance for developing hybrid navigation aids that are perceptive, robust, and user-aligned.

Authors:Edward Misback, Erik Vank, Zachary Tatlock, Steven Tanimoto
Title: Codetations: Intelligent, Persistent Notes and UIs for Programs and Other Documents
Abstract:
Software developers maintain extensive mental models of code they produce and its context, often relying on memory to retrieve or reconstruct design decisions, edge cases, and debugging experiences. These missing links and data obstruct both developers and, more recently, large language models (LLMs) working with unfamiliar code. We present Codetations, a system that helps developers contextualize documents with rich notes and tools. Unlike previous approaches, notes in Codetations stay outside the document to prevent code clutter, attaching to spans in the document using a hybrid edit-tracking/LLM-based method. Their content is dynamic, interactive, and synchronized with code changes. A worked example shows that relevant notes with interactively-collected data improve LLM performance during code repair. In our user evaluation, developers praised these properties and saw significant potential in annotation types that we generated with an LLM in just a few minutes.

Authors:Mario Fuksa, Sandro Speth, Steffen Becker
Title: MVVM Revisited: Exploring Design Variants of the Model-View-ViewModel Pattern
Abstract:
Many enterprise software systems provide complex Graphical User Interfaces (GUIs) that need robust architectural patterns for well-structured software design. However, popular GUI architectural patterns like Model-View-ViewModel (MVVM) often lack detailed implementation guidance, leading GUI developers to inappropriately use the pattern without a comprehensive overview of design variants and often-mentioned trade-offs. Therefore, this paper presents an extensive review of MVVM design aspects and trade-offs, extending beyond the standard MVVM definition. We conducted a multivocal literature review (MLR), including white and gray literature, to cover essential knowledge from blogs, published papers, and other unpublished formats like books. Using the standard MVVM definition as a baseline, our study identifies (1) 76 additional design constructs grouped into 29 design aspects and (2) 16 additional benefits and 15 additional drawbacks. These insights can guide enterprise application developers in implementing practical MVVM solutions and enable informed design decisions.

Authors:Suwon Yoon, Seungwon Yang, Jeongwon Choi, Wonjeong Park, Inseok Hwang
Title: Chatperone: An LLM-Based Negotiable Scaffolding System for Mediating Adolescent Mobile Interactions
Abstract:
Adolescents' uncontrolled exposure to digital content can negatively impact their development. Traditional regulatory methods, such as time limits or app restrictions, often take a rigid approach, ignoring adolescents' decision-making abilities. Another issue is the lack of content and services tailored for adolescents. To address this, we propose Chatperone, a concept of a system that provides adaptive scaffolding to support adolescents. Chatperone fosters healthy mobile interactions through three key modules: Perception, Negotiation, and Moderation. This paper outlines these modules' functionalities and discusses considerations for real-world implementation.

Authors:Jarne Thys, Sebe Vanbrabant, Davy Vanacken, Gustavo Rovelo Ruiz
Title: INSIGHT: Bridging the Student-Teacher Gap in Times of Large Language Models
Abstract:
The rise of AI, especially Large Language Models, presents challenges and opportunities to integrate such technology into the classroom. AI has the potential to revolutionize education by helping teaching staff with various tasks, such as personalizing their teaching methods, but it also raises concerns, for example, about the degradation of student-teacher interactions and user privacy. Based on interviews with teaching staff, this paper introduces INSIGHT, a proof of concept to combine various AI tools to assist teaching staff and students in the process of solving exercises. INSIGHT has a modular design that allows it to be integrated into various higher education courses. We analyze students' questions to an LLM by extracting keywords, which we use to dynamically build an FAQ from students' questions and provide new insights for the teaching staff to use for more personalized face-to-face support. Future work could build upon INSIGHT by using the collected data to provide adaptive learning and adjust content based on student progress and learning styles to offer a more interactive and inclusive learning experience.
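The keyword-driven FAQ construction described above can be illustrated with a minimal sketch; the stopword list, keyword counts, and example questions are all illustrative assumptions, not INSIGHT's actual pipeline.

```python
import re
from collections import Counter, defaultdict

# Illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "how", "do", "i", "to", "in",
             "of", "what", "why", "does", "my", "its"}

def extract_keywords(question, top_k=3):
    """Return up to top_k frequent non-stopword keywords of a question."""
    words = [w for w in re.findall(r"[a-z]+", question.lower())
             if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(top_k)]

def build_faq(questions):
    """Group questions under shared keywords to form a dynamic FAQ."""
    faq = defaultdict(list)
    for q in questions:
        for kw in extract_keywords(q):
            faq[kw].append(q)
    return dict(faq)

faq = build_faq([
    "How do I reverse a linked list?",
    "Why does my linked list lose its head node?",
    "What is a hash table collision?",
])
# Both linked-list questions end up grouped under "linked" and "list".
```

Grouped entries like these could then surface to teaching staff as recurring topics worth addressing face-to-face.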

Authors:Anton Andreev, Grégoire Cattan, Marco Congedo
Title: The Riemannian Means Field Classifier for EEG-Based BCI Data
Abstract:
A substantial amount of research has demonstrated the robustness and accuracy of the Riemannian minimum distance to mean (MDM) classifier for all kinds of EEG-based brain--computer interfaces (BCIs). This classifier is simple, fully deterministic, robust to noise, computationally efficient, and well suited to transfer learning. Its training is very simple, requiring just the computation of a geometric mean of symmetric positive-definite (SPD) matrices per class. We propose an improvement of the MDM involving a number of power means of SPD matrices instead of the sole geometric mean. By analyzing 20 public databases, 10 for the motor-imagery BCI paradigm and 10 for the P300 BCI paradigm, comprising 587 individuals in total, we show that the proposed classifier clearly outperforms the MDM, approaching the state of the art in terms of performance while retaining its simplicity and deterministic behavior. In order to promote reproducible research, our code will be released as open source.
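The minimum-distance-to-mean idea, extended to a bank of means per class, can be sketched as below. Note that `naive_power_mean` uses the elementary formula (N⁻¹ Σ Aᵢᵖ)^(1/p) purely as an illustrative stand-in; the classifier proposed in the paper uses proper power means of SPD matrices, which require a fixed-point computation.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

def naive_power_mean(mats, p):
    """Illustrative stand-in: (mean of A_i^p)^(1/p). Not the proper
    SPD power mean, which is defined via a fixed-point equation."""
    m = np.mean([fractional_matrix_power(A, p) for A in mats], axis=0)
    return fractional_matrix_power(m, 1.0 / p)

def airm_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices."""
    iA = fractional_matrix_power(A, -0.5)
    return np.linalg.norm(logm(iA @ B @ iA), "fro")

def classify(X, class_means):
    """Assign X to the class whose bank of means is closest on average."""
    scores = [np.mean([airm_distance(X, M) for M in means])
              for means in class_means]
    return int(np.argmin(scores))

# Toy covariance-like matrices per class, with means for p in {-1, 1}.
I2 = np.eye(2)
means0 = [naive_power_mean([I2 * 1.0, I2 * 1.2], p) for p in (-1.0, 1.0)]
means1 = [naive_power_mean([I2 * 4.0, I2 * 5.0], p) for p in (-1.0, 1.0)]
print(classify(I2 * 1.1, [means0, means1]))  # → 0
```

In the EEG setting, the SPD matrices would be trial covariance matrices, one bank of power means estimated per class.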

Authors:Felix Kares, Timo Speith, Hanwei Zhang, Markus Langer
Title: What Makes for a Good Saliency Map? Comparing Strategies for Evaluating Saliency Maps in Explainable AI (XAI)
Abstract:
Saliency maps are a popular approach for explaining classifications of (convolutional) neural networks. However, it remains an open question how best to evaluate saliency maps, with three families of evaluation methods commonly being used: subjective user measures, objective user measures, and mathematical metrics. We examine three of the most popular saliency map approaches (viz., LIME, Grad-CAM, and Guided Backpropagation) in a between-subjects study (N=166) across these families of evaluation methods. We test 1) for subjective measures, whether the maps differ with respect to user trust and satisfaction; 2) for objective measures, whether the maps increase users' abilities and thus understanding of a model; 3) for mathematical metrics, which map achieves the best ratings across metrics; and 4) whether the mathematical metrics can be associated with objective user measures. To our knowledge, our study is the first to compare several saliency maps across all these evaluation methods, with the finding that they do not agree in their assessment (i.e., there was no difference concerning trust and satisfaction, Grad-CAM improved users' abilities best, and Guided Backpropagation had the most favorable mathematical metrics). Additionally, we show that some mathematical metrics were associated with user understanding, although this relationship was often counterintuitive. We discuss these findings in light of general debates concerning the complementary use of user studies and mathematical metrics in the evaluation of explainable AI (XAI) approaches.
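One of the mathematical metrics commonly applied to saliency maps, the deletion metric, can be sketched as follows. The "model" and weights below are a toy assumption, not any of the studied networks: a faithful map should drive the score down quickly, yielding a low area under the curve.

```python
import numpy as np

def deletion_auc(model, image, saliency, steps=10):
    """Deletion metric: zero out pixels in decreasing saliency order and
    integrate the model score over the deletion fraction. A faithful map
    drives the score down quickly, yielding a low area under the curve."""
    order = np.argsort(saliency.ravel())[::-1]      # most salient first
    x = image.astype(float).ravel().copy()
    scores = [model(x.reshape(image.shape))]
    chunk = max(1, len(order) // steps)
    for i in range(0, len(order), chunk):
        x[order[i:i + chunk]] = 0.0                 # delete next chunk
        scores.append(model(x.reshape(image.shape)))
    s = np.asarray(scores)
    dt = 1.0 / (len(s) - 1)
    return float(dt * ((s[:-1] + s[1:]) / 2.0).sum())   # trapezoid rule

# Toy "model": a weighted sum whose weights double as the ideal saliency map.
w = np.array([[3.0, 1.0], [0.5, 0.1]])
model = lambda img: float((w * img).sum())
print(deletion_auc(model, np.ones((2, 2)), w))  # → 1.15
```

Comparing such scores across LIME, Grad-CAM, and Guided Backpropagation maps is one way the "mathematical metrics" family ranks explanation methods.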

Authors:Guanzhou Ji, Azadeh O. Sawyer, Srinivasa G. Narasimhan
Title: Digital Kitchen Remodeling: Editing and Relighting Intricate Indoor Scenes from a Single Panorama
Abstract:
We present a novel virtual staging application for kitchen remodeling from a single panorama. To ensure the realism of the virtual rendered scene, we capture real-world High Dynamic Range (HDR) panoramas and recover the absolute scene radiance for high-quality scene relighting. Our application pipeline consists of three key components: (1) HDR photography for capturing paired indoor and outdoor panoramas, (2) automatic kitchen layout generation with new kitchen components, and (3) an editable rendering pipeline that flexibly edits scene materials and relights the new virtual scene with global illumination. Additionally, we contribute a novel Pano-Pano HDR dataset with 141 paired indoor and outdoor panoramas and present a low-cost photometric calibration method for panoramic HDR photography.

Authors:Rune Møberg Jacobsen, Joel Wester, Helena Bøjer Djernæs, Niels van Berkel
Title: Distributed Cognition for AI-supported Remote Operations: Challenges and Research Directions
Abstract:
This paper investigates the impact of artificial intelligence integration on remote operations, emphasising its influence on both distributed and team cognition. As remote operations increasingly rely on digital interfaces, sensors, and networked communication, AI-driven systems transform decision-making processes across domains such as air traffic control, industrial automation, and intelligent ports. However, the integration of AI introduces significant challenges, including the reconfiguration of human-AI team cognition, the need for adaptive AI memory that aligns with human distributed cognition, and the design of AI fallback operators to maintain continuity during communication disruptions. Drawing on theories of distributed and team cognition, we analyse how cognitive overload, loss of situational awareness, and impaired team coordination may arise in AI-supported environments. Based on real-world intelligent port scenarios, we propose research directions that aim to safeguard human reasoning and enhance collaborative decision-making in AI-augmented remote operations.

Authors:Michael Färber, Parisa Aghdam, Kyuri Im, Mario Tawfelis, Hardik Ghoshal
Title: SimplifyMyText: An LLM-Based System for Inclusive Plain Language Text Simplification
Abstract:
Text simplification is essential for making complex content accessible to diverse audiences who face comprehension challenges. Yet, the limited availability of simplified materials creates significant barriers to personal and professional growth and hinders social inclusion. Although researchers have explored various methods for automatic text simplification, none fully leverage large language models (LLMs) to offer tailored customization for different target groups and varying levels of simplicity. Moreover, despite its proven benefits for both consumers and organizations, the well-established practice of plain language remains underutilized. In this paper, we present SimplifyMyText (https://simplifymytext.org), the first system designed to produce plain language content from multiple input formats, including typed text and file uploads, with flexible customization options for diverse audiences. We employ GPT-4 and Llama-3 and evaluate outputs across multiple metrics. Overall, our work contributes to research on automatic text simplification and highlights the importance of tailored communication in promoting inclusivity.

Authors:Qazi Mamunur Rashid, Erin van Liemt, Tiffany Shih, Amber Ebinama, Karla Barrios Ramos, Madhurima Maji, Aishwarya Verma, Charu Kalia, Jamila Smith-Loud, Joyce Nakatumba-Nabende, Rehema Baguma, Andrew Katumba, Chodrine Mutebi, Jagen Marvin, Eric Peter Wairagala, Mugizi Bruce, Peter Oketta, Lawrence Nderu, Obichi Obiajunwa, Abigail Oppong, Michael Zimba, Data Authors
Title: Amplify Initiative: Building A Localized Data Platform for Globalized AI
Abstract:
Current AI models often fail to account for local context and language, given the predominance of English and Western internet content in their training data. This hinders the global relevance, usefulness, and safety of these models as they gain more users around the globe. Amplify Initiative, a data platform and methodology, leverages expert communities to collect diverse, high-quality data to address the limitations of these models. The platform is designed to enable co-creation of datasets, provide access to high-quality multilingual datasets, and offer recognition to data authors. This paper presents the approach to co-creating datasets with domain experts (e.g., health workers, teachers) through a pilot conducted in Sub-Saharan Africa (Ghana, Kenya, Malawi, Nigeria, and Uganda). In partnership with local researchers situated in these countries, the pilot demonstrated an end-to-end approach to co-creating data with 155 experts in sensitive domains (e.g., physicians, bankers, anthropologists, human and civil rights advocates). This approach, implemented with an Android app, resulted in an annotated dataset of 8,091 adversarial queries in seven languages (e.g., Luganda, Swahili, Chichewa), capturing nuanced and contextual information related to key themes such as misinformation and public interest topics. This dataset in turn can be used to evaluate models for their safety and cultural relevance within the context of these languages.

Authors:Stephen N. Freund, Brooke Simon, Emery D. Berger, Eunice Jun
Title: Flowco: Rethinking Data Analysis in the Age of LLMs
Abstract:
Conducting data analysis typically involves authoring code to transform, visualize, analyze, and interpret data. Large language models (LLMs) are now capable of generating such code for simple, routine analyses. LLMs promise to democratize data science by enabling those with limited programming expertise to conduct data analyses, including in scientific research, business, and policymaking. However, analysts in many real-world settings must often exercise fine-grained control over specific analysis steps, verify intermediate results explicitly, and iteratively refine their analytical approaches. Such tasks present barriers to building robust and reproducible analyses using LLMs alone or even in conjunction with existing authoring tools (e.g., computational notebooks). This paper introduces Flowco, a new mixed-initiative system to address these challenges. Flowco leverages a visual dataflow programming model and integrates LLMs into every phase of the authoring process. A user study suggests that Flowco supports analysts, particularly those with less programming experience, in quickly authoring, debugging, and refining data analyses.

Authors:Paul Taele, Tracy Hammond
Title: Hashigo: A Next Generation Sketch Interactive System for Japanese Kanji
Abstract:
Language students can increase their effectiveness in learning written Japanese by mastering the visual structure and written technique of Japanese kanji. Yet, existing kanji handwriting recognition systems do not assess the written technique thoroughly enough to discourage students from developing bad learning habits. In this paper, we describe our work on Hashigo, a kanji sketch interactive system which achieves human instructor-level critique and feedback on both the visual structure and written technique of students' sketched kanji. This type of automated critique and feedback allows students to target and correct specific deficiencies in their sketches that, if left untreated, are detrimental to effective long-term kanji learning.

Authors:Paul Taele, Laura Barreto, Tracy Hammond
Title: Maestoso: An Intelligent Educational Sketching Tool for Learning Music Theory
Abstract:
Learning music theory not only has practical benefits for musicians to write, perform, understand, and express music better, but also helps non-musicians improve critical thinking, math analytical skills, and music appreciation. However, current external tools applicable for learning music theory through writing when human instruction is unavailable are either limited in feedback, lacking a written modality, or assuming already strong familiarity with music theory concepts. In this paper, we describe Maestoso, an educational tool for novice learners to learn music theory through sketching practice of quizzed music structures. Maestoso takes students' sketched input of quizzed concepts, relies on existing sketch and gesture recognition techniques to automatically recognize the input, and generates instructor-emulated feedback. From our evaluations, we demonstrate that Maestoso performs reasonably well at recognizing music structure elements and that novice students can comfortably grasp introductory music theory in a single session.

Authors:Paul Taele, Jung In Koh, Tracy Hammond
Title: Kanji Workbook: A Writing-Based Intelligent Tutoring System for Learning Proper Japanese Kanji Writing Technique with Instructor-Emulated Assessment
Abstract:
Kanji script writing is a skill that is often introduced to novice Japanese foreign language students for achieving Japanese writing mastery, but it often poses difficulties to students with primarily English fluency due to its vast differences from written English. Instructors often introduce various pedagogical methods -- such as visual structure and written techniques -- to assist students in kanji study, but may be unavailable to provide direct feedback on students' writing outside of class. Current educational applications are also limited in that they lack richer instructor-emulated feedback. We introduce Kanji Workbook, a writing-based intelligent tutoring system for students to receive intelligent assessment that emulates human instructor feedback. Our interface not only leverages students' computing devices to let them learn, practice, and review the writing of prompted characters from their course's kanji script lessons, but also provides a diverse set of writing assessment metrics -- derived from instructor interviews and classroom observation insights -- through intelligent scoring and visual animations. We deployed our interface in novice- and intermediate-level university courses over an entire academic year, and observed that interface users on average achieved higher course grades than their peers and also reacted positively to our interface's various features.

Authors:Isabel Villanueva, Tara Bobinac, Binwei Yao, Junjie Hu, Kaiping Chen
Title: AI as a deliberative partner fosters intercultural empathy for Americans but fails for Latin American participants
Abstract:
Despite increasing AI chatbot deployment in public discourse, empirical evidence on their capacity to foster intercultural empathy remains limited. Through a randomized experiment, we assessed how different AI deliberation approaches--cross-cultural deliberation (presenting other-culture perspectives), own-culture deliberation (representing participants' own culture), and non-deliberative control--affect intercultural empathy across American and Latin American participants. Cross-cultural deliberation increased intercultural empathy among American participants through positive emotional engagement, but produced no such effects for Latin American participants, who perceived AI responses as culturally inauthentic despite explicit prompting to represent their cultural perspectives. Our analysis of participant-driven feedback, where users directly flagged and explained culturally inappropriate AI responses, revealed systematic gaps in AI's representation of Latin American contexts that persist despite sophisticated prompt engineering. These findings demonstrate that current approaches to AI cultural alignment--including linguistic adaptation and explicit cultural prompting--cannot fully address deeper representational asymmetries in AI systems. Our work advances both deliberation theory and AI alignment research by revealing how the same AI system can simultaneously promote intercultural understanding for one cultural group while failing for another, with critical implications for designing equitable AI systems for cross-cultural democratic discourse.

Authors:Sonya Falahati, Morteza Alizadeh, Fatemeh Ghazipour, Zhino Safahi, Navid Khaledian, Mohammad R. Salmanpour
Title: An AI-powered Public Health Automated Kiosk System for Personalized Care: An Experimental Pilot Study
Abstract:
Background: The HERMES Kiosk (Healthcare Enhanced Recommendations through Artificial Intelligence & Expertise System) is designed to provide personalized Over-the-Counter (OTC) medication recommendations, addressing the limitations of traditional health kiosks. It integrates an advanced GAMENet model enhanced with Graph Attention Networks (GAT) and Multi-Head Cross-Attention (MHCA) while ensuring user privacy through federated learning. This paper outlines the conceptual design and architecture of HERMES, with a focus on deployment in high-traffic public areas. Methods: HERMES analyzes self-reported symptoms and anonymized medical histories using AI algorithms to generate context-aware OTC medication recommendations. The system was initially trained using Electronic Health Records (EHR) from the MIMIC-III dataset (6,350 patients) and Drug-Drug Interaction (DDI) data from the TWOSIDES database, incorporating the top 90 severity DDI types. Real-time DDI checks and ATC-mapped drug codes further improve safety. The kiosk is designed for accessibility, offering multilingual support, large fonts, voice commands, and Braille compatibility. A built-in health education library promotes preventive care and health literacy. A survey was conducted among 10 medical professionals to evaluate its potential applications in medicine. Results: Preliminary results show that the enhanced GAMENet model achieved a Precision-Recall AUC (PRAUC) of 0.74, outperforming the original model. These findings suggest a strong potential for delivering accurate and secure healthcare recommendations in public settings. Conclusion: HERMES demonstrates how AI-driven, privacy-preserving kiosks can enhance public health access, empower users, and alleviate burdens on healthcare systems. Future work will focus on real-world deployment, usability testing, and scalability for broader adoption.

Authors:Xiangrong Zhu, Yuan Xu, Tianjian Liu, Jingwei Sun, Yu Zhang, Xin Tong
Title: Intelligent Interaction Strategies for Context-Aware Cognitive Augmentation
Abstract:
Human cognition is constrained by processing limitations, leading to cognitive overload and inefficiencies in knowledge synthesis and decision-making. Large Language Models (LLMs) present an opportunity for cognitive augmentation, but their current reactive nature limits their real-world applicability. This position paper explores the potential of context-aware cognitive augmentation, where LLMs dynamically adapt to users' cognitive states and task environments to provide appropriate support. Through a think-aloud study in an exhibition setting, we examine how individuals interact with multi-modal information and identify key cognitive challenges in structuring, retrieving, and applying knowledge. Our findings highlight the need for AI-driven cognitive support systems that integrate real-time contextual awareness, personalized reasoning assistance, and socially adaptive interactions. We propose a framework for AI augmentation that seamlessly transitions between real-time cognitive support and post-experience knowledge organization, contributing to the design of more effective human-centered AI systems.

Authors:Niall McGuire, Yashar Moshfeghi
Title: On Error Classification from Physiological Signals within Airborne Environment
Abstract:
Human error remains a critical concern in aviation safety, contributing to 70-80% of accidents despite technological advancements. While physiological measures show promise for error detection in laboratory settings, their effectiveness in dynamic flight environments remains underexplored. Through live flight trials with nine commercial pilots, we investigated whether established error-detection approaches maintain accuracy during actual flight operations. Participants completed standardized multi-tasking scenarios across conditions ranging from laboratory settings to straight-and-level flight and 2G manoeuvres while we collected synchronized physiological data. Our findings demonstrate that EEG-based classification maintains high accuracy (87.83%) during complex flight manoeuvres, comparable to laboratory performance (89.23%). Eye-tracking showed moderate performance (82.50%), while ECG performed near chance level (51.50%). Classification accuracy remained stable across flight conditions, with minimal degradation during 2G manoeuvres. These results provide the first evidence that physiological error detection can translate effectively to operational aviation environments.

Authors:Siwei Huang, Chenhao Yang, Chuan Hu
Title: Predicting Driver's Perceived Risk: a Model Based on Semi-Supervised Learning Strategy
Abstract:
Drivers' perception of risk determines their acceptance, trust, and use of Automated Driving Systems (ADSs). However, perceived risk is subjective and difficult to evaluate using existing methods. To address this issue, a driver's subjective perceived risk (DSPR) model is proposed, regarding perceived risk as a dynamically triggered mechanism with anisotropy and attenuation. Twenty participants are recruited for a driver-in-the-loop experiment to report their real-time subjective risk ratings (SRRs) while experiencing various automated driving scenarios. A convolutional neural network and bidirectional long short-term memory network with temporal pattern attention (CNN-Bi-LSTM-TPA) is embedded into a semi-supervised learning strategy to predict SRRs, aiming to reduce data noise caused by the subjective randomness of participants. The results illustrate that DSPR achieves the highest accuracy of 87.91% in predicting SRRs, compared to three state-of-the-art risk models. The semi-supervised strategy improves accuracy by 20.12%. In addition, the CNN-Bi-LSTM-TPA network achieves the highest accuracy among four different LSTM structures. This study offers an effective method for assessing drivers' perceived risk, supporting both the safety enhancement of ADSs and the improvement of driver trust.
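The temporal pattern attention (TPA) component named above can be sketched, following its usual formulation, as attention over row-wise temporally filtered hidden states with sigmoid scoring. The shapes and matrices below are illustrative assumptions, not the paper's network.

```python
import numpy as np

def temporal_pattern_attention(H, h_t, filters, W):
    """Minimal TPA sketch: H is (m, t) hidden states over time, h_t the
    (m,) last hidden state, filters a (k, t) bank of temporal filters,
    W a (k, m) scoring matrix. Returns the context vector and weights."""
    HC = H @ filters.T                        # (m, k): rows filtered in time
    scores = HC @ (W @ h_t)                   # (m,) relevance per row pattern
    alpha = 1.0 / (1.0 + np.exp(-scores))     # sigmoid gating, as in TPA
    v = (alpha[:, None] * HC).sum(axis=0)     # (k,) attended context
    return v, alpha

# Hypothetical sizes: 4 hidden units, 6 time steps, 3 temporal filters.
rng = np.random.default_rng(1)
H = rng.normal(size=(4, 6))
filters = rng.normal(size=(3, 6))
W = rng.normal(size=(3, 4))
v, alpha = temporal_pattern_attention(H, H[:, -1], filters, W)
```

In the full model, the context vector would be concatenated with the Bi-LSTM output before the SRR prediction head.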

Authors:Kris Pilcher, Esen K. Tütüncü
Title: Purposefully Induced Psychosis (PIP): Embracing Hallucination as Imagination in Large Language Models
Abstract:
Hallucinations in Large Language Models (LLMs) are widely regarded as errors - outputs that deviate from factual accuracy. However, in creative or exploratory contexts, these "mistakes" may represent unexpected avenues for innovation. We introduce Purposefully Induced Psychosis (PIP), a novel approach that amplifies LLM hallucinations for imaginative tasks such as speculative fiction, interactive storytelling, and mixed-reality simulations. Drawing on Herman Melville's Moby-Dick, where Pip's "madness" reveals profound insight, we reframe hallucinations as a source of computational imagination rather than a flaw. Our method fine-tunes LLMs to encourage speculative, metaphorical, and surreal outputs - hallucinations that are useful when factual accuracy is not the chief objective. Inspired by the consensual illusions of theater and stage magic, PIP situates these creative missteps in contexts where users willingly suspend disbelief, thereby transforming "errors" into catalysts for new ways of thinking. We discuss potential applications, design principles for ensuring user consent, preliminary observations, and implications for broader AI ethics and human-AI collaboration.

Authors:David Black, Septimiu Salcudean
Title: Linearity, Time Invariance, and Passivity of a Novice Person in Human Teleoperation
Abstract:
Low-cost teleguidance of medical procedures is becoming essential to provide healthcare to remote and underserved communities. Human teleoperation is a promising new method for guiding a novice person with relatively high precision and efficiency through a mixed reality (MR) interface. Prior work has shown that the novice, or "follower", can reliably track the MR input with performance not unlike a telerobotic system. As a consequence, it is of interest to understand and control the follower's dynamics to optimize the system performance and permit stable and transparent bilateral teleoperation. To this end, linearity, time-invariance, inter-axis coupling, and passivity are important in teleoperation and controller design. This paper therefore explores these effects with regard to the follower person in human teleoperation. It is demonstrated through modeling and experiments that the follower can indeed be treated as approximately linear and time invariant, with little coupling and a large excess of passivity at practical frequencies. Furthermore, a stochastic model of the follower dynamics is derived. These results will permit controller design and analysis to improve the performance of human teleoperation.
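The passivity finding above connects to a standard numerical check: an LTI system is passive (positive real) when the real part of its frequency response is nonnegative at all frequencies. The second-order follower model below is a hypothetical stand-in, not the paper's identified dynamics.

```python
import numpy as np

def min_real_part(num, den, w):
    """Evaluate H(jw) = num(jw)/den(jw) on a frequency grid and return the
    minimum real part; a passive (positive-real) LTI system keeps it >= 0."""
    jw = 1j * w
    H = np.polyval(num, jw) / np.polyval(den, jw)
    return float(H.real.min())

# Hypothetical follower model x/x_ref = (b s + k) / (m s^2 + b s + k).
m, b, k = 1.0, 8.0, 20.0
w = np.logspace(-2, 2, 500)                 # rad/s grid over practical range
print(min_real_part([b, k], [m, b, k], w))  # positive here: passive on the grid
```

A positive minimum over the practical frequency band is the kind of "excess of passivity" evidence that supports stable bilateral controller design.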

Authors:Siddharth Mehrotra, Ujwal Gadiraju, Eva Bittner, Folkert van Delden, Catholijn M. Jonker, Myrthe L. Tielman
Title: "Even explanations will not help in trusting [this] fundamentally biased system": A Predictive Policing Case-Study
Abstract:
In today's society, where Artificial Intelligence (AI) has gained a vital role, concerns regarding users' trust have garnered significant attention. The use of AI systems in high-risk domains has often led users to either under-trust them, potentially causing inadequate reliance, or over-trust them, resulting in over-compliance. Therefore, users must maintain an appropriate level of trust. Past research has indicated that explanations provided by AI systems can enhance user understanding of when to trust or not trust the system. However, the utility of presenting different explanation forms remains to be explored, especially in high-risk domains. Therefore, this study explores the impact of different explanation types (text, visual, and hybrid) and user expertise (retired police officers and lay users) on establishing appropriate trust in AI-based predictive policing. While we observed that the hybrid form of explanations increased the subjective trust in AI for expert users, it did not lead to better decision-making. Furthermore, no form of explanation helped build appropriate trust. The findings of our study emphasize the importance of re-evaluating the use of explanations to build [appropriate] trust in AI-based systems, especially when the system's use is questionable. Finally, we synthesize potential challenges and policy recommendations based on our results to design for appropriate trust in high-risk AI-based systems.

Authors:Audrey Zhang, Yifei Gao, Wannapon Suraworachet, Tanya Nazaretsky, Mutlu Cukurova
Title: Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students
Abstract:
As generative AI models, particularly large language models (LLMs), transform educational feedback practices in higher education (HE) contexts, understanding students' perceptions of different sources of feedback becomes crucial for their effective implementation and adoption. This study addresses a critical gap by comparing undergraduate students' trust in LLM, human, and human-AI co-produced feedback in their authentic HE context. More specifically, through a within-subject experimental design involving 91 participants, we investigated factors that predict students' ability to distinguish between feedback types, their perceptions of feedback quality, and potential biases related to the source of feedback. Findings revealed that when the source was blinded, students generally preferred AI and co-produced feedback over human feedback regarding perceived usefulness and objectivity. However, they presented a strong bias against AI when the source of feedback was disclosed. In addition, only AI feedback suffered a decline in perceived genuineness when feedback sources were revealed, while co-produced feedback maintained its positive perception. Educational AI experience improved students' ability to identify LLM-generated feedback and increased their trust in all types of feedback. More years of students' experience using AI for general purposes were associated with lower perceived usefulness and credibility of feedback. These insights offer substantial evidence of the importance of source credibility and the need to enhance both feedback literacy and AI literacy to mitigate bias in student perceptions for AI-generated feedback to be adopted and impact education.

Authors:Annabella Sakunkoo, Jonathan Sakunkoo
Title: Name of Thrones: Evaluating How LLMs Rank Student Names, Race, and Gender in Status Hierarchies
Abstract:
Across cultures, names reveal much about their bearers, carrying deep personal and cultural significance. Names also serve as powerful signals of gender, race, and status in the social hierarchy - a pecking order in which individual positions shape others' expectations of their competence and worth. With the widespread adoption of LLMs and as names are often an input for LLMs, it is crucial to evaluate whether LLMs may sort people into status positions based on first and last names and, if so, whether they do so in an unfair, biased fashion. While prior work has primarily investigated biases in first names, little attention has been paid to last names and even less to the combined effects of first and last names. In this study, we conduct a large-scale analysis of name variations across 5 ethnicities to examine how AI exhibits name biases. Our study investigates three key characteristics of inequality and finds that LLMs reflect and reinforce status hierarchies based on names that signal gender and ethnicity, encoding differential expectations of competence, leadership, and economic potential. Contrary to the common assumption that AI tends to favor Whites, we show that East and, in some contexts, South Asian names receive higher rankings. We also disaggregate Asians, a population projected to be the largest immigrant group in the U.S. by 2055. Our results challenge the monolithic Asian model minority assumption, illustrating a more complex and stratified model of bias. Gender moderates biases, with girls facing unfair disadvantages in certain racial groups. Additionally, spanning cultural categories by adopting Western first names improves AI-perceived status for East and Southeast Asian students, particularly for girls. Our findings underscore the importance of intersectional and more nuanced understandings of race, gender, and mixed identities in the evaluation of LLMs.

Authors:HwiJoon Lee, Kashif Imteyaz, Saiph Savage
Title: Playing to Pay: Interplay of Monetization and Retention Strategies in Korean Mobile Gaming
Abstract:
Mobile gaming's global growth has introduced evolving monetization strategies, such as in-app purchases and ads, designed to boost revenue while maintaining player engagement. However, there is limited understanding of the scope and frequency of these strategies, particularly in mature markets like South Korea. To address this research gap, this study examines the monetization strategies used in the top 40 most popular Korean mobile games through direct gameplay observations and targeted video analyses. We identified the prevalence of specific strategies, including time-gated progression, conflict-driven design, and social dynamics, which are systematically categorized in our proposed framework for monetization. Our findings also highlight ethical concerns, including issues with transparency, probability disclosures, and the exploitation of competitive pressures, areas that remain poorly regulated. To address these challenges, we emphasize the need for stricter consumer protections, cross-regional research, and greater focus on protecting vulnerable populations to promote a more equitable and responsible gaming environment.

Authors:Shravika Mittal, Darshi Shah, Shin Won Do, Mai ElSherief, Tanushree Mitra, Munmun De Choudhury
Title: Exposure to Content Written by Large Language Models Can Reduce Stigma Around Opioid Use Disorder in Online Communities
Abstract:
Widespread stigma, both in the offline and online spaces, acts as a barrier to harm reduction efforts in the context of opioid use disorder (OUD). This stigma is prominently directed towards clinically approved medications for addiction treatment (MAT), people with the condition, and the condition itself. Given the potential of artificial intelligence-based technologies in promoting health equity, and facilitating empathic conversations, this work examines whether large language models (LLMs) can help abate OUD-related stigma in online communities. To answer this, we conducted a series of pre-registered randomized controlled experiments, where participants read LLM-generated, human-written, or no responses to help-seeking OUD-related content in online communities. The experiment was conducted under two setups, i.e., participants read the responses either once (N = 2,141), or repeatedly for 14 days (N = 107). We found that participants reported the least stigmatized attitudes toward MAT after consuming LLM-generated responses under both setups. This study offers insights into strategies that can foster inclusive online discourse on OUD, e.g., based on our findings, LLMs can be used as an education-based intervention to promote positive attitudes and increase people's propensity toward MAT.

Authors:Pratham Darrpan Mehta, Rahul Ozhur Narayanan, Vidhi Kulkarni, Timothy Slesnick, Fawwaz Shaw, Duen Horng Chau
Title: HybridCollab: Unifying In-Person and Remote Collaboration for Cardiovascular Surgical Planning in Mobile Augmented Reality
Abstract:
Surgical planning for congenital heart disease traditionally relies on collaborative group examinations of a patient's 3D-printed heart model, a process that lacks flexibility and accessibility. While mobile augmented reality (AR) offers a promising alternative with its portability and familiar interaction gestures, existing solutions limit collaboration to users in the same physical space. We developed HybridCollab, the first iOS AR application that introduces a novel paradigm that enables both in-person and remote medical teams to interact with a shared AR heart model in a single surgical planning session. For example, a team of two doctors in one hospital room can collaborate in real time with another team in a different hospital. Our approach is the first to leverage Apple's GameKit service for surgical planning, ensuring an identical collaborative experience for all participants, regardless of location. Additionally, co-located users can interact with the same anchored heart model in their shared physical space. By bridging the gap between remote and in-person collaboration across medical teams, HybridCollab has the potential for significant real-world impact, streamlining communication and enhancing the effectiveness of surgical planning. Watch the demo: https://youtu.be/hElqJYDuvLM.

Authors:Song Yang, Haotian Fu, Herui Zhang, Peng Zhang, Wei Li, Dongrui Wu
Title: Spiking Neural Network for Intra-cortical Brain Signal Decoding
Abstract:
Decoding brain signals accurately and efficiently is crucial for intra-cortical brain-computer interfaces. Traditional decoding approaches based on neural activity vector features suffer from low accuracy, whereas deep learning-based approaches have high computational cost. To improve both the decoding accuracy and efficiency, this paper proposes a spiking neural network (SNN) for effective and energy-efficient intra-cortical brain signal decoding. We also propose a feature fusion approach, which integrates the manually extracted neural activity vector features with those extracted by a deep neural network, to further improve the decoding accuracy. Experiments in decoding motor-related intra-cortical brain signals of two rhesus macaques demonstrated that our SNN model achieved higher accuracy than traditional artificial neural networks; more importantly, it was tens or hundreds of times more efficient. The SNN model is very suitable for high-precision, low-power applications like intra-cortical brain-computer interfaces.
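The two ingredients named above, spiking neurons and feature fusion, can be sketched in a few lines. This is an illustrative toy, not the paper's model: a leaky integrate-and-fire (LIF) layer converts input currents to spikes, firing rates become learned features, and they are concatenated with handcrafted neural activity vector features. All names and parameters (`tau`, `v_th`, dimensions) are assumptions.

```python
import numpy as np

def lif_forward(inputs, tau=0.9, v_th=1.0):
    """Simulate a layer of leaky integrate-and-fire (LIF) neurons.

    inputs: array of shape (timesteps, n_neurons) of input currents.
    Returns a binary spike train of the same shape.
    """
    v = np.zeros(inputs.shape[1])          # membrane potentials
    spikes = np.zeros_like(inputs)
    for t, x in enumerate(inputs):
        v = tau * v + x                    # leaky integration
        fired = v >= v_th                  # threshold crossing
        spikes[t] = fired
        v[fired] = 0.0                     # reset after spiking
    return spikes

def fuse_features(handcrafted, learned):
    """Concatenate handcrafted neural-activity features with learned ones."""
    return np.concatenate([handcrafted, learned], axis=-1)

rng = np.random.default_rng(0)
currents = rng.uniform(0, 0.5, size=(50, 8))   # 50 timesteps, 8 channels
spike_train = lif_forward(currents)
rate_features = spike_train.mean(axis=0)        # firing-rate features
fused = fuse_features(rate_features, rng.standard_normal(16))
print(fused.shape)  # (24,)
```

In the actual system the fused vector would feed a trained classifier; the energy advantage of SNNs comes from event-driven spike computation rather than dense multiply-accumulates.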

Authors:Hyeonggeun Yun, Jinkyu Jang
Title: UX Remix: Improving Measurement Item Design Process Using Large Language Models and Prior Literature
Abstract:
Researchers often struggle to develop measurement items and lack a standardized process. To support the design process, we present UX Remix, a system to help researchers develop constructs and measurement items using large language models (LLMs). UX Remix leverages a database of constructs and associated measurement items from previous papers. Based on the data, UX Remix recommends constructs relevant to the research context. The researchers then select appropriate constructs based on the recommendations. Afterward, selected constructs are used to generate a custom construct, and UX Remix recommends measurement items. UX Remix streamlines the process of selecting constructs, developing measurement items, and adapting them to research contexts, addressing challenges in the selection and reuse of measurement items. This paper describes the implementation of the system, the potential benefits, and future directions to improve the rigor and efficiency of measurement design in human-computer interaction (HCI) research.

Authors:Alessandro Carcangiu, Marco Manca, Jacopo Mereu, Carmen Santoro, Ludovica Simeoli, Lucio Davide Spano
Title: Tell-XR: Conversational End-User Development of XR Automations
Abstract:
The availability of extended reality (XR) devices has widened their adoption, yet authoring interactive experiences remains complex for non-programmers. We introduce Tell-XR, an intelligent agent leveraging large language models (LLMs) to guide end-users in defining the interaction in XR settings using automations described as Event-Condition-Action (ECA) rules. Through a formative study, we identified the key conversation stages to define and refine automations, which informed the design of the system architecture. The evaluation study in two scenarios (a VR museum and an AR smart home) demonstrates the effectiveness of Tell-XR across different XR interaction settings.
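An Event-Condition-Action (ECA) rule of the kind Tell-XR lets end-users define can be pictured as a small data structure: when a named event fires and its condition holds, the action runs. The class and field names below are hypothetical illustrations, not Tell-XR's actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ECARule:
    """One Event-Condition-Action automation (illustrative sketch)."""
    event: str                         # e.g. "user_enters_zone"
    condition: Callable[[dict], bool]  # predicate over the XR context
    action: Callable[[dict], None]     # effect to execute

    def fire(self, event: str, ctx: dict) -> bool:
        if event == self.event and self.condition(ctx):
            self.action(ctx)
            return True
        return False

# A VR-museum style automation: entering the painting room plays a guide.
log = []
rule = ECARule(
    event="user_enters_zone",
    condition=lambda ctx: ctx.get("zone") == "painting_room",
    action=lambda ctx: log.append("play_audio_guide"),
)
rule.fire("user_enters_zone", {"zone": "painting_room"})
print(log)  # ['play_audio_guide']
```

The conversational agent's job, per the abstract, is to elicit each of these three parts from the user and refine them over several dialogue stages.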

Authors:Michael Yin, Chenxinran Shen, Robert Xiao
Title: Entertainers Between Real and Virtual -- Investigating Viewer Interaction, Engagement, and Relationships with Avatarized Virtual Livestreamers
Abstract:
Virtual YouTubers (VTubers) are avatar-based livestreamers that are voiced and played by human actors. VTubers have been popular in East Asia for years and have more recently seen widespread international growth. Despite their emergent popularity, research into the interactions and relationships between avatarized VTubers and their viewers has been scarce, particularly in contrast to non-avatarized streamers. To address this gap, we performed in-depth interviews with self-reported VTuber viewers (n=21). Our findings first reveal that the avatarized nature of VTubers fosters new forms of theatrical engagement, as factors of the virtual blend with the real to create a mixture of fantasy and realism in possible livestream interactions. Avatarization furthermore results in a unique audience perception regarding the identity of VTubers - an identity which comprises a dynamic, distinct mix of the real human (the voice actor/actress) and the virtual character. Our findings suggest that each of these dual identities both individually and symbiotically affects viewer interactions and relationships with VTubers. Whereas the performer's identity mediates social factors such as intimacy, relatability, and authenticity, the virtual character's identity offers feelings of escapism, novelty in interactions, and a sense of continuity beyond the livestream. We situate our findings within existing livestreaming literature to highlight how avatarization drives unique, character-based interactions as well as reshapes the motivations and relationships that viewers form with livestreamers. Finally, we provide suggestions and recommendations for areas of future exploration to address the challenges involved in present livestreamed avatarized entertainment.

Authors:Michael Yin, Robert Xiao
Title: VIBES: Exploring Viewer Spatial Interactions as Direct Input for Livestreamed Content
Abstract:
Livestreaming has rapidly become a popular online pastime, with real-time interaction between streamer and viewer being a key motivating feature. However, viewers have traditionally had limited opportunity to directly influence the streamed content; even when such interaction is possible, it has relied on text-based chat. We investigate the potential of spatial interaction on the livestreamed video content as a form of direct, real-time input for livestreamed applications. We developed VIBES, a flexible digital system that registers viewers' mouse interactions on the streamed video, i.e., clicks or movements, and transmits them directly into the streamed application. We used VIBES as a technology probe; first designing possible demonstrative interactions and using these interactions to explore streamers' perception of viewer influence and possible challenges and opportunities. We then deployed applications built using VIBES in two livestreams to explore its effects on audience engagement and investigate their relationships with the stream, the streamer, and fellow audience members. The use of spatial interactions enhances engagement and participation and opens up new avenues for both streamer-viewer and viewer-viewer participation. We contextualize our findings around a broader understanding of motivations and engagement in livestreaming, and we propose design guidelines and extensions for future research.

Authors:Joshua Hatherley, Robert Sparrow
Title: Diachronic and synchronic variation in the performance of adaptive machine learning systems: The ethical challenges
Abstract:
Objectives: Machine learning (ML) has the potential to facilitate "continual learning" in medicine, in which an ML system continues to evolve in response to exposure to new data over time, even after being deployed in a clinical setting. In this paper, we provide a tutorial on the range of ethical issues raised by the use of such "adaptive" ML systems in medicine that have, thus far, been neglected in the literature. Target audience: The target audiences for this tutorial are the developers of machine learning AI systems, healthcare regulators, the broader medical informatics community, and practicing clinicians. Scope: Discussions of adaptive ML systems to date have overlooked the distinction between two sorts of variance that such systems may exhibit -- diachronic evolution (change over time) and synchronic variation (difference between cotemporaneous instantiations of the algorithm at different sites) -- and underestimated the significance of the latter. We highlight the challenges that diachronic evolution and synchronic variation present for the quality of patient care, informed consent, and equity, and discuss the complex ethical trade-offs involved in the design of such systems.

Authors:Kenneth C. Arnold, Jiho Kim
Title: Interaction-Required Suggestions for Control, Ownership, and Awareness in Human-AI Co-Writing
Abstract:
This paper explores interaction designs for generative AI interfaces that necessitate human involvement throughout the generation process. We argue that such interfaces can promote cognitive engagement, agency, and thoughtful decision-making. Through a case study in text revision, we present and analyze two interaction techniques: (1) using a predictive-text interaction to type the assistant's response to a revision request, and (2) highlighting potential edit opportunities in a document. Our implementations demonstrate how these approaches reveal the landscape of writing possibilities and enable fine-grained control. We discuss implications for human-AI writing partnerships and future interaction design directions.

Authors:Chenge Tang, Karthikeya Puttur Venkatraj, Hongbo Liu, Christina Schneegass, Gijs Huisman, Abdallah El Ali
Title: Dark Haptics: Exploring Manipulative Haptic Design in Mobile User Interfaces
Abstract:
Mobile user interfaces abundantly feature so-called 'dark patterns'. These deceptive design practices manipulate users' decision making to profit online service providers. While past research on dark patterns mainly focuses on visual design, other sensory modalities such as audio and touch remain largely unexplored. In this early work, we investigate the manipulative side of haptics, which we term 'Dark Haptics', as a strategy to manipulate users. We designed a study to empirically showcase the potential of using a dark haptic pattern in a mobile device to manipulate user actions in a survey. Our findings indicate that our dark haptic design successfully influenced participants to forego their privacy after experiencing alarming haptic feedback for rejecting intrusive requests in the survey. As a first exploration of the manipulative qualities of dark haptic designs, we attempt to lay the groundwork for future research and tools to mitigate the harms and risks of dark haptics.

Authors:Ilhan Aslan, Timothy Merritt, Stine S. Johansen, Niels van Berkel
Title: Speech Command + Speech Emotion: Exploring Emotional Speech Commands as a Compound and Playful Modality
Abstract:
In an era of human-computer interaction with increasingly agentic AI systems capable of connecting with users conversationally, speech is an important modality for commanding agents. By recognizing and using speech emotions (i.e., how a command is spoken), we can provide agents with the ability to emotionally accentuate their responses and socially enrich users' perceptions and experiences. To explore the concept and impact of speech emotion commands on user perceptions, we realized a prototype and conducted a user study (N = 14) where speech commands are used to steer two vehicles in a minimalist and retro game style implementation. While both agents execute user commands, only one of the agents uses speech emotion information to adapt its execution behavior. We report on differences in how users perceived each agent, including significant differences in stimulation and dependability, outline implications for designing interactions with agents using emotional speech commands, and provide insights on how users consciously emote, which we describe as "voice acting".

Authors:Zefan Sramek, Koji Yatani
Title: Research as Resistance: Recognizing and Reconsidering HCI's Role in Technology Hype Cycles
Abstract:
The history of information technology development has been characterized by consecutive waves of boom and bust, as new technologies come to market, fuel surges of investment, and then stabilize towards maturity. However, in recent decades, the acceleration of such technology hype cycles has resulted in the prioritization of massive capital generation at the expense of long-term sustainability, resulting in a cascade of negative social, political, and environmental consequences. Despite the negative impacts of this pattern, academic research, and in particular HCI research, is not immune from such hype cycles, often contributing substantial amounts of literature to the discourse surrounding a wave of hype. In this paper, we discuss the relationship between technology and capital, offer a critique of the technology hype cycle using generative AI as an example, and finally suggest an approach and a set of strategies for how we can counteract such cycles through research as resistance.

Authors:Denielle Oliva, Joshua Knight, Tyler J Becker, Heather Amistani, Monica Nicolescu, David Feil-Seifer
Title: Design Activity for Robot Faces: Evaluating Child Responses To Expressive Faces
Abstract:
Facial expressiveness plays a crucial role in a robot's ability to engage and interact with children. Prior research has shown that expressive robots can enhance child engagement during human-robot interactions. However, many robots used in therapy settings feature non-personalized, static faces designed with traditional facial feature considerations, which can limit the depth of interactions and emotional connections. Digital faces offer opportunities for personalization, yet the current landscape of robot face design lacks a dynamic, user-centered approach. Specifically, there is a significant research gap in designing robot faces based on child preferences. Instead, most robots in child-focused therapy spaces are developed from an adult-centric perspective. We present a novel study investigating the influence of child-drawn digital faces in child-robot interactions. This approach centers on a design activity in which children are instructed to draw their own custom robot faces. We compare the perceptions of social intelligence (PSI) of two implementations: a generic digital face and a robot face personalized using the children's own drawings. The results of this study show that the perceived social intelligence of a child-drawn robot face was significantly higher than that of a generic face.

Authors:Sahar Niknam, Saravanakumar Duraisamy, Jean Botev, Luis A. Leiva
Title: Brain Signatures of Time Perception in Virtual Reality
Abstract:
Achieving a high level of immersion and adaptation in virtual reality (VR) requires precise measurement and representation of user state. While extrinsic physical characteristics such as locomotion and pose can be accurately tracked in real-time, reliably capturing mental states is more challenging. Quantitative psychology allows considering more intrinsic features like emotion, attention, or cognitive load. Time perception, in particular, is strongly tied to users' mental states, including stress, focus, and boredom. However, research on objectively measuring the pace at which we perceive the passage of time is scarce. In this work, we investigate the potential of electroencephalography (EEG) as an objective measure of time perception in VR, exploring neural correlates with oscillatory responses and time-frequency analysis. To this end, we implemented a variety of time perception modulators in VR, collected EEG recordings, and labeled them with overestimation, correct estimation, and underestimation time perception states. We found clear EEG spectral signatures for these three states, that are persistent across individuals, modulators, and modulation duration. These signatures can be integrated and applied to monitor and actively influence time perception in VR, allowing the virtual environment to be purposefully adapted to the individual to increase immersion further and improve user experience. A free copy of this paper and all supplemental materials are available at https://vrarlab.uni.lu/pub/brain-signatures.
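The spectral signatures described above rest on standard band-power estimation from EEG. As a rough illustration (not the paper's actual pipeline; band limits and the synthetic signal are assumptions), one can compute power in canonical frequency bands from a raw trace with an FFT:

```python
import numpy as np

def band_power(signal, fs, band):
    """Total spectral power of `signal` within a frequency band (lo, hi)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    lo, hi = band
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].sum()

fs = 256                      # sampling rate in Hz
t = np.arange(0, 2, 1 / fs)   # 2 seconds of data
# Synthetic trace: strong 10 Hz alpha rhythm plus noise
eeg = np.sin(2 * np.pi * 10 * t) \
      + 0.1 * np.random.default_rng(0).standard_normal(len(t))

alpha = band_power(eeg, fs, (8, 13))    # alpha band
beta = band_power(eeg, fs, (13, 30))    # beta band
print(alpha > beta)  # True: the alpha band dominates this synthetic trace
```

A time-frequency analysis, as used in the study, would slide such a spectral estimate over short windows to track how band power evolves with the perceived passage of time.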

Authors:Daniel Hove Paludan, Julie Fredsgård, Kasper Patrick Bährentz, Ilhan Aslan, Niels van Berkel
Title: Towards Sustainable Creativity Support: An Exploratory Study on Prompt Based Image Generation
Abstract:
Creativity is a valuable human skill that has long been augmented through both analog and digital tools. Recent progress in generative AI, such as image generation, provides a disruptive technological solution to supporting human creativity further and helping humans generate solutions faster. While AI image generators can help to rapidly visualize ideas based on user prompts, the use of such AI systems has also been critiqued due to their considerable energy usage. In this paper, we report on a user study (N = 24) to understand whether energy consumption can be reduced without impairing the tool's perceived creativity support. Our results reveal, for example, a main effect of image-generation condition on energy consumption and on the creativity support index per prompt but not per task, effects that appear mainly attributable to the number of images generated per prompt. We provide details of our analysis of the relation between energy usage, creativity support, and prompting behavior, including attitudes towards designing with AI and its environmental impact.

Authors:Niklas Elmqvist, Clemens Nylandsted Klokmose
Title: Automating the Path: An R&D Agenda for Human-Centered AI and Visualization
Abstract:
The emergence of generative AI, large language models (LLMs), and foundation models is fundamentally reshaping computer science, and visualization and visual analytics are no exception. We present a systematic framework for understanding how human-centered AI (HCAI) can transform the visualization discipline. Our framework maps four key HCAI tool capabilities -- amplify, augment, empower, and enhance -- onto the four phases of visual sensemaking: view, explore, schematize, and report. For each combination, we review existing tools, envision future possibilities, identify challenges and pitfalls, and examine ethical considerations. This design space can serve as an R&D agenda for both visualization researchers and practitioners to integrate AI into their work as well as understanding how visualization can support HCAI research.

Authors:Thomas M. Kwok, Jiaan Li, Yue Hu
Title: Leveraging GCN-based Action Recognition for Teleoperation in Daily Activity Assistance
Abstract:
Caregiving of older adults is an urgent global challenge, with many older adults preferring to age in place rather than enter residential care. However, providing adequate home-based assistance remains difficult, particularly in geographically vast regions. Teleoperated robots offer a promising solution, but conventional motion-mapping teleoperation imposes unnatural movement constraints on operators, leading to muscle fatigue and reduced usability. This paper presents a novel teleoperation framework that leverages action recognition to enable intuitive remote robot control. Using our simplified Spatio-Temporal Graph Convolutional Network (S-ST-GCN), the system recognizes human actions and executes corresponding preset robot trajectories, eliminating the need for direct motion synchronization. A finite-state machine (FSM) is integrated to enhance reliability by filtering out misclassified actions. Our experiments demonstrate that the proposed framework enables effortless operator movement while ensuring accurate robot execution. This proof-of-concept study highlights the potential of teleoperation with action recognition for enabling caregivers to remotely assist older adults during activities of daily living (ADLs). Future work will focus on improving the S-ST-GCN's recognition accuracy and generalization, integrating advanced motion planning techniques to further enhance robotic autonomy in older adult care, and conducting a user study to evaluate the system's telepresence and ease of control.
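The FSM-based filtering step above can be sketched compactly: a newly recognized action is only committed (and its preset robot trajectory triggered) after it has been observed for several consecutive frames, which suppresses one-off misclassifications. This is a hypothetical illustration of the idea, not the paper's implementation; the threshold `k` and the action labels are assumptions.

```python
def fsm_filter(predictions, k=3):
    """Commit an action only after k consecutive identical recognitions."""
    committed = []                 # actions actually sent to the robot
    current = None                 # action the FSM has committed to
    candidate, count = None, 0
    for p in predictions:
        if p == current:           # already committed: nothing to do
            candidate, count = None, 0
            continue
        if p == candidate:
            count += 1
        else:
            candidate, count = p, 1
        if count >= k:             # stable for k frames: commit transition
            current = candidate
            committed.append(current)
            candidate, count = None, 0
    return committed

# A stray "wave" frame amid "reach" frames is filtered out:
stream = ["idle"] * 3 + ["reach"] * 4 + ["wave"] + ["reach"] * 2
print(fsm_filter(stream))  # ['idle', 'reach']
```

In the full system, each committed action would index into a table of preset robot trajectories, decoupling the operator's natural movement from the robot's execution.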

Authors:Zhehui Liao, Hanwen Zhao, Ayush Kulkarni, Shaan Singh Chattrath, Amy X. Zhang
Title: Building Proactive and Instant-Reactive Safety Designs to Address Harassment in Social Virtual Reality
Abstract:
Social Virtual Reality (VR) games offer immersive socialization experiences but pose significant challenges of harassment. Common solutions, such as reporting and moderation, address harassment after it happens but fail to prevent or stop harassment in the moment. In this study, we explore and design proactive and instant-reactive safety designs to mitigate harassment in social VR. Proactive designs prevent harassment from occurring, while instant-reactive designs minimize harm during incidents. We explore three directions for design: user-initiated personal bubbles, clarifying social norms, and encouraging bystander intervention. Through an iterative process, we first conducted a formative interview study to determine design goals for making these features effective, fit user needs, and robust to manipulation. We then implemented Puffer, an integrated safety system that includes a suite of proactive and instant-reactive features, as a social VR prototype. From an evaluation using simulated scenarios with participants, we find evidence that Puffer can help protect players during emergencies, foster prosocial norms, and create more positive social interactions. We conclude by discussing how system safety features can be designed to complement existing proactive and instant-reactive strategies, particularly for people with marginalized identities.

Authors:Rui Qiu, Yamei Tu, Po-Yin Yen, Han-Wei Shen
Title: VADIS: A Visual Analytics Pipeline for Dynamic Document Representation and Information-Seeking
Abstract:
In the biomedical domain, visualizing the document embeddings of an extensive corpus has been widely used in information-seeking tasks. However, three key challenges with existing visualizations make it difficult for clinicians to find information efficiently. First, the document embeddings used in these visualizations are generated statically by pretrained language models, which cannot adapt to the user's evolving interest. Second, existing document visualization techniques cannot effectively display how the documents are relevant to users' interest, making it difficult for users to identify the most pertinent information. Third, existing embedding generation and visualization processes suffer from a lack of interpretability, making it difficult to understand, trust, and use the results for decision-making. In this paper, we present a novel visual analytics pipeline for user-driven document representation and iterative information seeking (VADIS). VADIS introduces a prompt-based attention model (PAM) that generates dynamic document embeddings and document relevance adjusted to the user's query. To effectively visualize these two pieces of information, we design a new document map that leverages a circular grid layout to display documents based on both their relevance to the query and their semantic similarity. Additionally, to improve interpretability, we introduce a corpus-level attention visualization method to improve the user's understanding of the model focus and to enable users to identify potential oversights. This visualization, in turn, empowers users to refine, update, and introduce new queries, thereby facilitating a dynamic and iterative information-seeking experience. We evaluated VADIS quantitatively and qualitatively on a real-world dataset of biomedical research papers to demonstrate its effectiveness.
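The circular layout idea, relevance encoded as distance from the center and semantic position encoded as angle, can be sketched in a few lines. This is only an illustration of the general principle under assumed inputs (relevance and a 1-D semantic coordinate, both in [0, 1]); the function and field names are hypothetical, not VADIS's actual layout algorithm.

```python
import math

def place_documents(docs):
    """docs: list of (doc_id, relevance in [0,1], semantic in [0,1]).

    Returns (x, y) positions: relevance 1.0 maps to the center,
    and the semantic coordinate maps to an angle, so semantically
    similar documents land in the same sector.
    """
    layout = {}
    for doc_id, relevance, semantic in docs:
        radius = 1.0 - relevance          # more relevant -> closer to center
        angle = 2 * math.pi * semantic    # semantic coordinate -> angle
        layout[doc_id] = (radius * math.cos(angle), radius * math.sin(angle))
    return layout

docs = [("d1", 0.9, 0.1), ("d2", 0.2, 0.1), ("d3", 0.9, 0.6)]
pos = place_documents(docs)
# d1 is more relevant than d2, so it lies closer to the center:
print(math.hypot(*pos["d1"]) < math.hypot(*pos["d2"]))  # True
```

VADIS's actual map snaps documents to a circular grid and derives both coordinates from the prompt-based attention model rather than from precomputed scalars.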

Authors:Jinhe Wen, Yingxi Zhao, Wenqian Xu, Yaxing Yao, Haojian Jin
Title: Teaching Data Science Students to Sketch Privacy Designs through Heuristics (Extended Technical Report)
Abstract:
Recent studies reveal that experienced data practitioners often draw sketches to facilitate communication around privacy design concepts. However, there is limited understanding of how we can help novice students develop such communication skills. This paper studies methods for lowering novice data science students' barriers to creating high-quality privacy sketches. We first conducted a need-finding study (N=12) to identify barriers students face when sketching privacy designs. We then used a human-centered design approach to guide the method development, culminating in three simple, text-based heuristics. Our user studies with 24 data science students revealed that simply presenting three heuristics to the participants at the beginning of the study can enhance the coverage of privacy-related design decisions in sketches, reduce the mental effort required for creating sketches, and improve the readability of the final sketches.

Authors:Uri Menkes, Assaf Hallak, Ofra Amir
Title: "Trust me on this" Explaining Agent Behavior to a Human Terminator
Abstract:
Consider a setting where a pre-trained agent is operating in an environment and a human operator can decide to temporarily terminate its operation and take over for some duration of time. These kinds of scenarios are common in human-machine interaction, for example in autonomous driving, factory automation, and healthcare. In these settings, we typically observe a trade-off between two extreme cases -- if no take-overs are allowed, then the agent might employ a sub-optimal, possibly dangerous policy. Alternatively, if there are too many take-overs, then the human has no confidence in the agent, greatly limiting its usefulness. In this paper, we formalize this setup and propose an explainability scheme to help optimize the number of human interventions.

Authors:Shiyan Liu, Rui Qu, Yan Jin
Title: FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency
Abstract:
Generating consecutive images of lip movements that align with a given speech in audio-driven lip synthesis is a challenging task. While previous studies have made strides in synchronization and visual quality, lip intelligibility and video fluency remain persistent challenges. This work proposes FluentLip, a two-stage approach for audio-driven lip synthesis, incorporating three featured strategies. To improve lip synchronization and intelligibility, we integrate a phoneme extractor and encoder to generate a fusion of audio and phoneme information for multimodal learning. Additionally, we employ optical flow consistency loss to ensure natural transitions between image frames. Furthermore, we incorporate a diffusion chain during the training of Generative Adversarial Networks (GANs) to improve both stability and efficiency. We evaluate our proposed FluentLip through extensive experiments, comparing it with five state-of-the-art (SOTA) approaches across five metrics, including a proposed metric called Phoneme Error Rate (PER) that evaluates lip pose intelligibility and video fluency. The experimental results demonstrate that our FluentLip approach is highly competitive, achieving significant improvements in smoothness and naturalness. In particular, it outperforms these SOTA approaches by approximately $\textbf{16.3%}$ in Fréchet Inception Distance (FID) and $\textbf{35.2%}$ in PER.

Authors:Jakob Schoeffer, Maria De-Arteaga, Jonathan Elmer
Title: Perils of Label Indeterminacy: A Case Study on Prediction of Neurological Recovery After Cardiac Arrest
Abstract:
The design of AI systems to assist human decision-making typically requires the availability of labels to train and evaluate supervised models. Frequently, however, these labels are unknown, and different ways of estimating them involve unverifiable assumptions or arbitrary choices. In this work, we introduce the concept of label indeterminacy and derive important implications in high-stakes AI-assisted decision-making. We present an empirical study in a healthcare context, focusing specifically on predicting the recovery of comatose patients after resuscitation from cardiac arrest. Our study shows that label indeterminacy can result in models that perform similarly when evaluated on patients with known labels, but vary drastically in their predictions for patients where labels are unknown. After demonstrating crucial ethical implications of label indeterminacy in this high-stakes context, we discuss takeaways for evaluation, reporting, and design.

Authors:Kazuya Izumi, Akihisa Shitara, Yoichi Ochiai
Title: See-Through Face Display for DHH People: Enhancing Gaze Awareness in Remote Sign Language Conversations with Camera-Behind Displays
Abstract:
This paper presents a sign language conversation system based on the See-Through Face Display to address the challenge of maintaining eye contact in remote sign language interactions. A camera positioned behind a transparent display allows users to look at the face of their conversation partner while appearing to maintain direct eye contact. Unlike conventional methods that rely on software-based gaze correction or large-scale half-mirror setups, this design reduces visual distortions and simplifies installation. We implemented and evaluated a videoconferencing system that integrates the See-Through Face Display, comparing it to traditional videoconferencing methods. We explore its potential applications for Deaf and Hard of Hearing (DHH) people, including multi-party sign language conversations, corpus collection, remote interpretation, and AI-driven sign language avatars. Collaboration with DHH communities will be key to refining the system for real-world use and ensuring its practical deployment.

Authors:Manh Pham Hung, Matthew Yiwen Ho, Yiming Zhang, Dimitris Spathis, Aaqib Saeed, Dong Ma
Title: Reliable Physiological Monitoring on the Wrist Using Generative Deep Learning to Address Poor Skin-Sensor Contact
Abstract:
Photoplethysmography (PPG) is a widely adopted, non-invasive technique for monitoring cardiovascular health and physiological parameters in both consumer and clinical settings. While motion artifacts in dynamic environments have been extensively studied, suboptimal skin-sensor contact in sedentary conditions - a critical yet underexplored issue - can distort PPG waveform morphology, leading to the loss or misalignment of key features and compromising sensing accuracy. In this work, we propose CP-PPG, a novel framework that transforms Contact Pressure-distorted PPG signals into high-fidelity waveforms with ideal morphology. CP-PPG integrates a custom data collection protocol, a carefully designed signal processing pipeline, and a novel deep adversarial model trained with a custom PPG-aware loss function. We validated CP-PPG through comprehensive evaluations, including 1) morphology transformation performance on our self-collected dataset, 2) downstream physiological monitoring performance on public datasets, and 3) in-the-wild study. Extensive experiments demonstrate substantial and consistent improvements in signal fidelity (Mean Absolute Error: 0.09, 40% improvement over the original signal) as well as downstream performance across all evaluations in Heart Rate (HR), Heart Rate Variability (HRV), Respiration Rate (RR), and Blood Pressure (BP) estimation (on average, 21% improvement in HR; 41-46% in HRV; 6% in RR; and 4-5% in BP). These findings highlight the critical importance of addressing skin-sensor contact issues to enhance the reliability and effectiveness of PPG-based physiological monitoring. CP-PPG thus holds significant potential to improve the accuracy of wearable health technologies in clinical and consumer applications.

Authors:Juliett Suárez Ferreira, Marija Slavkovik, Jorge Casillas
Title: Am I Being Treated Fairly? A Conceptual Framework for Individuals to Ascertain Fairness
Abstract:
Current fairness metrics and mitigation techniques provide tools for practitioners to assess how non-discriminatory Automatic Decision Making (ADM) systems are. What if I, as an individual facing a decision taken by an ADM system, would like to know: Am I being treated fairly? We explore how to create the affordance for users to be able to ask this question of ADM. In this paper, we argue for the reification of fairness not only as a property of ADM, but also as an epistemic right of an individual to acquire information about the decisions that affect them and to use that information to contest and seek effective redress against those decisions, in case they are proven to be discriminatory. We examine key concepts from existing research not only in algorithmic fairness but also in explainable artificial intelligence, accountability, and contestability. Integrating notions from these domains, we propose a conceptual framework to ascertain fairness by combining different tools that empower the end-users of ADM systems. Our framework shifts the focus from technical solutions aimed at practitioners to mechanisms that enable individuals to understand, challenge, and verify the fairness of decisions, and also serves as a blueprint for organizations and policymakers, bridging the gap between technical requirements and practical, user-centered accountability.

Authors:Jacy Reese Anthis, Ryan Liu, Sean M. Richardson, Austin C. Kozlowski, Bernard Koch, James Evans, Erik Brynjolfsson, Michael Bernstein
Title: LLM Social Simulations Are a Promising Research Method
Abstract:
Accurate and verifiable large language model (LLM) simulations of human research subjects promise an accessible data source for understanding human behavior and training new AI systems. However, results to date have been limited, and few social scientists have adopted this method. In this position paper, we argue that the promise of LLM social simulations can be achieved by addressing five tractable challenges. We ground our argument in a review of empirical comparisons between LLMs and human research subjects, commentaries on the topic, and related work. We identify promising directions, including context-rich prompting and fine-tuning with social science datasets. We believe that LLM social simulations can already be used for pilot and exploratory studies, and more widespread use may soon be possible with rapidly advancing LLM capabilities. Researchers should prioritize developing conceptual models and iterative evaluations to make the best use of new AI systems.

Authors:Alexandre L. S. Filipowicz, David A. Shamma, Vikram Mohanty, Candice L. Hogan
Title: Demystifying CO2: lessons from nutrition labeling and step counting
Abstract:
There is growing concern about climate change and increased interest in taking action. However, people have difficulty understanding abstract units like CO2 and the relative environmental impact of different behaviors. This position piece explores findings from nutritional labeling and step counting research, two domains aimed at making abstract concepts (i.e., calories and exercise) more familiar to the general public. Research in these two domains suggests that consistent, widespread communication can make people more familiar with abstract units and help them think about those units more precisely, but that better communication and understanding do not guarantee behavior change. These findings suggest that consistent and ubiquitous communication can make CO2 units more familiar to people, which in turn could help interventions aimed at encouraging more sustainable behaviors.

Authors:Alexa Siu, Raymond Fok
Title: Augmenting Expert Cognition in the Age of Generative AI: Insights from Document-Centric Knowledge Work
Abstract:
As Generative AI (GenAI) capabilities expand, understanding how to preserve and develop human expertise while leveraging AI's benefits becomes increasingly critical. Through empirical studies in two contexts -- survey article authoring in scholarly research and business document sensemaking -- we examine how domain expertise shapes patterns of AI delegation and information processing among knowledge workers. Our findings reveal that while experts welcome AI assistance with repetitive information foraging tasks, they prefer to retain control over complex synthesis and interpretation activities that require nuanced domain understanding. We identify implications for designing GenAI systems that support expert cognition. These include enabling selective delegation aligned with expertise levels, preserving expert agency over critical analytical tasks, considering varying levels of domain expertise in system design, and supporting verification mechanisms that help users calibrate their reliance while deepening expertise. We discuss the inherent tension between reducing cognitive load through automation and maintaining the deliberate practice necessary for expertise development. Lastly, we suggest approaches for designing systems that provide metacognitive support, moving beyond simple task automation toward actively supporting expertise development. This work contributes to our understanding of how to design AI systems that augment rather than diminish human expertise in document-centric workflows.

Authors:Kailas Vodrahalli, Wei Wei, James Zou
Title: Learning a Canonical Basis of Human Preferences from Binary Ratings
Abstract:
Recent advances in generative AI have been driven by alignment techniques such as reinforcement learning from human feedback (RLHF). RLHF and related techniques typically involve constructing a dataset of binary or ranked choice human preferences and subsequently fine-tuning models to align with these preferences. This paper shifts the focus to understanding the preferences encoded in such datasets and identifying common human preferences. We find that a small subset of 21 preference categories (selected from a set of nearly 5,000 distinct preferences) captures >89% of preference variation across individuals. This small set of preferences is analogous to a canonical basis of human preferences, similar to established findings that characterize human variation in psychology or facial recognition studies. Through both synthetic and empirical evaluations, we confirm that our low-rank, canonical set of human preferences generalizes across the entire dataset and within specific topics. We further demonstrate our preference basis' utility in model evaluation, where our preference categories offer deeper insights into model alignment, and in model training, where we show that fine-tuning on preference-defined subsets successfully aligns the model accordingly.
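An illustrative way to quantify the kind of low-rank structure this abstract reports (21 categories capturing >89% of preference variation) is the spectral energy of the top-k singular directions of a raters-by-preference-categories matrix. The sketch below is our own, not the paper's exact procedure; the function name and toy data are illustrative:

```python
import numpy as np

def variance_captured(M, k):
    # Fraction of total variation in a (raters x preference-categories)
    # matrix explained by its top-k singular directions, after centering.
    s = np.linalg.svd(M - M.mean(axis=0), compute_uv=False)
    return (s[:k] ** 2).sum() / (s ** 2).sum()

# Toy data with planted low-rank structure: 200 raters whose preference
# profiles are mixtures of just 2 latent "canonical" preference patterns.
rng = np.random.default_rng(0)
M = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 30))
```

On real preference data, the curve of variance_captured(M, k) over k would reveal how few categories behave like a basis.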

Authors:Yuing Sun, Sam Addison Ankenbauer, Zhifan Guo, Yuchen Chen, Xiaojuan Ma, Liang He
Title: Rethinking Technological Solutions for Community-Based Older Adult Care: Insights from 'Older Partners' in China
Abstract:
Aging in place refers to the enabling of individuals to age comfortably and securely within their own homes and communities. Aging in place relies on robust infrastructure, prompting the development and implementation of both human-led care services and information and communication technologies to provide support. Through a long-term ethnographic study that includes semi-structured interviews with 24 stakeholders, we consider these human- and technology-driven care infrastructures for aging in place, examining their origins, deployment, interactions with older adults, and challenges. In doing so, we reconsider the value of these different forms of older adult care, highlighting the various issues associated with using, for instance, health monitoring technology or appointment scheduling systems to care for older adults aging in place. We suggest that technology should take a supportive, not substitutive role in older adult care infrastructure. Furthermore, we note that designing for aging in place should move beyond a narrow focus on independence in one's home to instead encompass the broader community and its dynamics.

Authors:Jiaxin An, Siqi Yi, Yao Lyu, Houjiang Liu, Yan Zhang
Title: Conversational Agents for Older Adults' Health: A Systematic Literature Review
Abstract:
A vast literature studies Conversational Agents (CAs) in facilitating older adults' health. These numerous and diverse studies warrant a comprehensive review that consolidates the main findings and proposes research directions for future work, yet few literature reviews have done so from a human-computer interaction (HCI) perspective. In this study, we present a survey of existing studies on CAs for older adults' health. Through a systematic review of 72 papers, this work reviewed previously studied older adults' characteristics and analyzed participants' experiences and expectations of CAs for health. We found that (1) past research has shown increasing interest in chatbots and voice assistants and has applied CAs in multiple roles in older adults' health; (2) older adults mainly showed low acceptance of CAs for health due to various reasons, such as unstable effects, harm to independence, and privacy concerns; (3) older adults expect CAs to support multiple functions, communicate using natural language, be personalized, and allow users full control. We also discuss the implications based on these findings.

Authors:Sharon Heung, Lucy Jiang, Shiri Azenkot, Aditya Vashistha
Title: "Ignorance is Not Bliss": Designing Personalized Moderation to Address Ableist Hate on Social Media
Abstract:
Disabled people on social media often experience ableist hate and microaggressions. Prior work has shown that platform moderation often fails to remove ableist hate, leaving disabled users exposed to harmful content. This paper examines how personalized moderation can safeguard users from viewing ableist comments. During interviews and focus groups with 23 disabled social media users, we presented design probes to elicit perceptions on configuring their filters of ableist speech (e.g., intensity of ableism and types of ableism) and customizing the presentation of the ableist speech to mitigate the harm (e.g., AI rephrasing the comment and content warnings). We found that participants preferred configuring their filters through types of ableist speech and favored content warnings. We surface participants' distrust in AI-based moderation, skepticism about AI's accuracy, and varied tolerances for viewing ableist hate. Finally, we share design recommendations to support users' agency, mitigate harm from hate, and promote safety.

Authors:Ching Hei Cheng, Jonathan Eden, Denny Oetomo, Ying Tan
Title: Exploring Interference between Concurrent Skin Stretches
Abstract:
Proprioception is essential for coordinating human movements and enhancing the performance of assistive robotic devices. Skin stretch feedback, which closely aligns with natural proprioception mechanisms, presents a promising method for conveying proprioceptive information. To better understand the impact of interference on skin stretch perception, we conducted a user study with 30 participants that evaluated the effect of two simultaneous skin stretches on user perception. We observed that when participants experience simultaneous skin stretch stimuli, a masking effect occurs which deteriorates perception performance in the collocated skin stretch configurations. However, the perceived workload stays the same. These findings show that interference can affect the perception of skin stretch such that multi-channel skin stretch feedback designs should avoid locating modules in close proximity.

Authors:Alexandra Watkins, Ritam Ghosh, Evan Chow, Nilanjan Sarkar
Title: Immersive and Wearable Thermal Rendering for Augmented Reality
Abstract:
In augmented reality (AR), where digital content is overlaid onto the real world, realistic thermal feedback has been shown to enhance immersion. Yet current thermal feedback devices, heavily influenced by the needs of virtual reality, often hinder physical interactions and are ineffective for immersion in AR. To bridge this gap, we have identified three design considerations relevant for AR thermal feedback: indirect feedback to maintain dexterity, thermal passthrough to preserve real-world temperature perception, and spatiotemporal rendering for dynamic sensations. We then created a unique and innovative thermal feedback device that satisfies these criteria. Human subject experiments assessing perceptual sensitivity, object temperature matching, spatial pattern recognition, and moving thermal stimuli demonstrated the impact of our design, enabling realistic temperature discrimination, virtual object perception, and enhanced immersion. These findings demonstrate that carefully designed thermal feedback systems can bridge the sensory gap between physical and virtual interactions, enhancing AR realism and usability.

Authors:Sian Gooding, Lucia Lopez-Rivilla, Edward Grefenstette
Title: Writing as a testbed for open ended agents
Abstract:
Open-ended tasks are particularly challenging for LLMs due to the vast solution space, demanding both expansive exploration and adaptable strategies, especially when success lacks a clear, objective definition. Writing, with its vast solution space and subjective evaluation criteria, provides a compelling testbed for studying such problems. In this paper, we investigate the potential of LLMs to act as collaborative co-writers, capable of suggesting and implementing text improvements autonomously. We analyse three prominent LLMs - Gemini 1.5 Pro, Claude 3.5 Sonnet, and GPT-4o - focusing on how their action diversity, human alignment, and iterative improvement capabilities impact overall performance. This work establishes a framework for benchmarking autonomous writing agents and, more broadly, highlights fundamental challenges and potential solutions for building systems capable of excelling in diverse open-ended domains.

Authors:Amogh Inamdar, Uzay Macar, Michel Vazirani, Michael Tarnow, Zarina Mustapha, Natalia Dittren, Sam Sadeh, Nakul Verma, Ansaf Salleb-Aouissi
Title: LogicLearner: A Tool for the Guided Practice of Propositional Logic Proofs
Abstract:
The study of propositional logic -- fundamental to the theory of computing -- is a cornerstone of the undergraduate computer science curriculum. Learning to solve logical proofs requires repeated guided practice, but undergraduate students often lack access to on-demand tutoring in a judgment-free environment. In this work, we highlight the need for guided practice tools in undergraduate mathematics education and outline the desiderata of an effective practice tool. We accordingly develop LogicLearner, a web application for guided logic proof practice. LogicLearner consists of an interface to attempt logic proofs step-by-step and an automated proof solver to generate solutions on the fly, allowing users to request guidance as needed. We pilot LogicLearner as a practice tool in two semesters of an undergraduate discrete mathematics course and receive strongly positive feedback for usability and pedagogical value in student surveys. To the best of our knowledge, LogicLearner is the only learning tool that provides an end-to-end practice environment for logic proofs with immediate, judgment-free feedback.

Authors:Jungjae Lee, Dongjae Lee, Chihun Choi, Youngmin Im, Jaeyoung Wi, Kihong Heo, Sangeun Oh, Sunjae Lee, Insik Shin
Title: VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification
Abstract:
Large Foundation Models (LFMs) have unlocked new possibilities in human-computer interaction, particularly with the rise of mobile Graphical User Interface (GUI) Agents capable of interacting with mobile GUIs. These agents allow users to automate complex mobile tasks through simple natural language instructions. However, the inherent probabilistic nature of LFMs, coupled with the ambiguity and context-dependence of mobile tasks, makes LFM-based automation unreliable and prone to errors. To address this critical challenge, we introduce VeriSafe Agent (VSA): a formal verification system that serves as a logically grounded safeguard for Mobile GUI Agents. VSA deterministically ensures that an agent's actions strictly align with user intent before executing the action. At its core, VSA introduces a novel autoformalization technique that translates natural language user instructions into a formally verifiable specification. This enables runtime, rule-based verification of an agent's actions, detecting erroneous actions even before they take effect. To the best of our knowledge, VSA is the first attempt to bring the rigor of formal verification to GUI agents, bridging the gap between LFM-driven actions and formal software verification. We implement VSA using off-the-shelf LFM services (GPT-4o) and evaluate its performance on 300 user instructions across 18 widely used mobile apps. The results demonstrate that VSA achieves 94.33%-98.33% accuracy in verifying agent actions, outperforming existing LFM-based verification methods by 16.33%-30.00%, and increases the GUI agent's task completion rate by 90%-130%.

Authors:Calvin Bao, Yow-Ting Shiue, Marine Carpuat, Joel Chan
Title: Words as Bridges: Exploring Computational Support for Cross-Disciplinary Translation Work
Abstract:
Scholars often explore literature outside of their home community of study. This exploration process is frequently hampered by field-specific jargon. Past computational work often focuses on supporting translation work by removing jargon through simplification and summarization; here, we explore a different approach that preserves jargon as useful bridges to new conceptual spaces. Specifically, we cast different scholarly domains as different language-using communities, and explore how to adapt techniques from unsupervised cross-lingual alignment of word embeddings to explore conceptual alignments between domain-specific word embedding spaces. We developed a prototype cross-domain search engine that uses aligned domain-specific embeddings to support conceptual exploration, and tested this prototype in two case studies. We discuss qualitative insights into the promises and pitfalls of this approach to translation work, and suggest design insights for future interfaces that provide computational support for cross-domain information seeking.
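A standard instance of the cross-lingual embedding alignment the abstract adapts is orthogonal Procrustes over embeddings of anchor terms shared between two domain vocabularies. The sketch below is illustrative only (the function name and toy data are ours, not the paper's):

```python
import numpy as np

def procrustes_align(X, Y):
    # Solve min_W ||X @ W - Y||_F over orthogonal W: with X.T @ Y = U S Vt,
    # the optimum is W = U @ Vt (orthogonal Procrustes).
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy check: if Y is a pure rotation of X (anchor embeddings in two
# "domains" that differ only by orientation), alignment recovers it.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
R_true, _ = np.linalg.qr(rng.normal(size=(50, 50)))
Y = X @ R_true
W = procrustes_align(X, Y)
```

After alignment, nearest-neighbor queries across the two spaces can surface conceptually related terms despite differing jargon.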

Authors:Yiwen Xu, Monideep Chakraborti, Tianyi Zhang, Katelyn Eng, Aanchan Mohan, Mirjana Prpa
Title: Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication
Abstract:
In this paper, we present Speak Ease: an augmentative and alternative communication (AAC) system to support users' expressivity by integrating multimodal input, including text, voice, and contextual cues (conversational partner and emotional tone), with large language models (LLMs). Speak Ease combines automatic speech recognition (ASR), context-aware LLM-based outputs, and personalized text-to-speech technologies to enable more personalized, natural-sounding, and expressive communication. Through an exploratory feasibility study and focus group evaluation with speech and language pathologists (SLPs), we assessed Speak Ease's potential to enable expressivity in AAC. The findings highlight the priorities and needs of AAC users and the system's ability to enhance user expressivity by supporting more personalized and contextually relevant communication. This work provides insights into the use of multimodal inputs and LLM-driven features to improve AAC systems and support expressivity.

Authors:Dongsheng Yang, Qianying Liu, Wataru Sato, Takashi Minato, Chaoran Liu, Shin'ya Nishida
Title: HAPI: A Model for Learning Robot Facial Expressions from Human Preferences
Abstract:
Automatic robotic facial expression generation is crucial for human-robot interaction, as handcrafted methods based on fixed joint configurations often yield rigid and unnatural behaviors. Although recent automated techniques reduce the need for manual tuning, they tend to fall short by not adequately bridging the gap between human preferences and model predictions, resulting in a deficiency of nuanced and realistic expressions due to limited degrees of freedom and insufficient perceptual integration. In this work, we propose a novel learning-to-rank framework that leverages human feedback to address this discrepancy and enhance the expressiveness of robotic faces. Specifically, we conduct pairwise comparison annotations to collect human preference data and develop the Human Affective Pairwise Impressions (HAPI) model, a Siamese RankNet-based approach that refines expression evaluation. Results obtained via Bayesian Optimization and an online expression survey on a 35-DOF android platform demonstrate that our approach produces significantly more realistic and socially resonant expressions of Anger, Happiness, and Surprise than those generated by baseline and expert-designed methods. This confirms that our framework effectively bridges the gap between human preferences and model predictions while robustly aligning robotic expression generation with human affective responses.
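The core of the Siamese RankNet approach mentioned above is a pairwise cross-entropy loss on score differences. A minimal sketch of that loss, assuming scalar scores produced by a shared scoring network (function and argument names are illustrative, not the paper's):

```python
import numpy as np

def ranknet_pairwise_loss(s_a, s_b, pref_a_over_b):
    # RankNet models P(A preferred over B) = sigmoid(s_a - s_b), where
    # s_a and s_b come from the same (Siamese) scoring network, then
    # applies binary cross-entropy against the annotated preference
    # (pref_a_over_b is 1.0 if expression A was preferred, else 0.0).
    p = 1.0 / (1.0 + np.exp(-(s_a - s_b)))
    y = pref_a_over_b
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

Training on pairwise comparison annotations pushes the score of the preferred expression above the other; the resulting scalar scores can then serve as the objective for an optimizer such as Bayesian Optimization.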

Authors:Jiin Choi, Yugyeong Jang, Kyung Hoon Hyun
Title: Toward AI-driven Multimodal Interfaces for Industrial CAD Modeling
Abstract:
AI-driven multimodal interfaces have the potential to revolutionize industrial 3D CAD modeling by improving workflow efficiency and user experience. However, the integration of these technologies remains challenging due to software constraints, user adoption barriers, and limitations in AI model adaptability. This paper explores the role of multimodal AI in CAD environments, examining its current applications, key challenges, and future research directions. We analyze Bayesian workflow inference, multimodal input strategies, and collaborative AI-driven interfaces to identify areas where AI can enhance CAD design processes while addressing usability concerns in industrial manufacturing settings.

Authors:Nilay Kushawaha, Radan Pathan, Niccolò Pagliarani, Matteo Cianchetti, Egidio Falotico
Title: Adaptive Drift Compensation for Soft Sensorized Finger Using Continual Learning
Abstract:
Strain sensors are gaining popularity in soft robotics for acquiring tactile data due to their flexibility and ease of integration. Tactile sensing plays a critical role in soft grippers, enabling them to safely interact with unstructured environments and precisely detect object properties. However, a significant challenge with these systems is their high non-linearity, time-varying behavior, and long-term signal drift. In this paper, we introduce a continual learning (CL) approach to model a soft finger equipped with piezoelectric-based strain sensors for proprioception. To tackle the aforementioned challenges, we propose an adaptive CL algorithm that integrates a Long Short-Term Memory (LSTM) network with a memory buffer for rehearsal and includes a regularization term to keep the model's decision boundary close to the base signal while adapting to time-varying drift. We conduct nine different experiments, resetting the entire setup each time to demonstrate signal drift. We also benchmark our algorithm against two other methods and conduct an ablation study to assess the impact of different components on the overall performance.
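The memory buffer for rehearsal described above can be kept representative of the whole signal stream with reservoir sampling. A minimal, framework-free sketch (class and method names are ours, not the paper's):

```python
import random

class RehearsalBuffer:
    # Fixed-capacity memory of past (signal, label) samples; adaptation
    # minibatches mix fresh drifted data with replayed earlier data so
    # the model's decision boundary stays close to the base signal.
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, sample):
        # Reservoir sampling keeps a uniform subsample of the stream.
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = sample

    def replay(self, k):
        # Draw k stored samples to interleave with the current batch.
        return random.sample(self.data, min(k, len(self.data)))
```

Each LSTM update on drifted data would then combine the task loss on new samples, the same loss on replayed samples, and the regularization term toward the base model's outputs.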

Authors:Claudia Flores-Saviaga, Benjamin V. Hanrahan, Kashif Imteyaz, Steven Clarke, Saiph Savage
Title: The Impact of Generative AI Coding Assistants on Developers Who Are Visually Impaired
Abstract:
The rapid adoption of generative AI in software development has impacted the industry, yet its effects on developers with visual impairments remain largely unexplored. To address this gap, we used an Activity Theory framework to examine how developers with visual impairments interact with AI coding assistants. For this purpose, we conducted a study where developers who are visually impaired completed a series of programming tasks using a generative AI coding assistant. We uncovered that, while participants found the AI assistant beneficial and reported significant advantages, they also highlighted accessibility challenges. Specifically, the AI coding assistant often exacerbated existing accessibility barriers and introduced new challenges. For example, it overwhelmed users with an excessive number of suggestions, leading developers who are visually impaired to express a desire for ``AI timeouts.'' Additionally, the generative AI coding assistant made it more difficult for developers to switch contexts between the AI-generated content and their own code. Despite these challenges, participants were optimistic about the potential of AI coding assistants to transform the coding experience for developers with visual impairments. Our findings emphasize the need to apply activity-centered design principles to generative AI assistants, ensuring they better align with user behaviors and address specific accessibility needs. This approach can enable the assistants to provide more intuitive, inclusive, and effective experiences, while also contributing to the broader goal of enhancing accessibility in software development.

Authors:Tsvetomila Mihaylova, Stefan Reitmann, Elin A. Topp, Ville Kyrki
Title: Injecting Conflict Situations in Autonomous Driving Simulation using CARLA
Abstract:
Simulation of conflict situations for autonomous driving research is crucial for understanding and managing interactions between Automated Vehicles (AVs) and human drivers. This paper presents a set of exemplary conflict scenarios in CARLA that arise in shared autonomy settings, where both AVs and human drivers must navigate complex traffic environments. We explore various conflict situations, focusing on the impact of driver behavior and decision-making processes on overall traffic safety and efficiency. We build a simple extendable toolkit for situation awareness research, in which the implemented conflicts can be demonstrated.

Authors:Zhijin Meng, Mohammed Althubyani, Shengyuan Xie, Imran Razzak, Eduardo B. Sandoval, Mahdi Bamdad, Francisco Cruz
Title: PERCY: Personal Emotional Robotic Conversational System
Abstract:
Traditional rule-based conversational robots, constrained by predefined scripts and static response mappings, fundamentally lack adaptability for personalized, long-term human interaction. While Large Language Models (LLMs) like GPT-4 have revolutionized conversational AI through open-domain capabilities, current social robots implementing LLMs still lack emotional awareness and continuous personalization. This dual limitation hinders their ability to sustain engagement across multiple interaction sessions. We bridge this gap with PERCY (Personal Emotional Robotic Conversational sYstem), a system designed to enable open-domain, multi-turn dialogues by dynamically analyzing users' real-time facial expressions and vocabulary to tailor responses based on their emotional state. Built on a ROS-based multimodal framework, PERCY integrates a fine-tuned GPT-4 reasoning engine, combining textual sentiment analysis with visual emotional cues to accurately assess and respond to user emotions. We evaluated PERCY's performance through various dialogue quality metrics, showing strong coherence, relevance, and diversity. Human evaluations revealed PERCY's superior personalization and comparable naturalness to other models. This work highlights the potential for integrating advanced multimodal perception and personalization in social robot dialogue systems.

Authors:Tong Zhang, Mengao Zhang, Wei Yan Low, X. Jessie Yang, Boyang Li
Title: Conversational Explanations: Discussing Explainable AI with Non-AI Experts
Abstract:
Explainable AI (XAI) aims to provide insights into the decisions made by AI models. To date, most XAI approaches provide only one-time, static explanations, which cannot cater to users' diverse knowledge levels and information needs. Conversational explanations have been proposed as an effective method to customize XAI explanations. However, building conversational explanation systems is hindered by the scarcity of training data. Training with synthetic data faces two main challenges: lack of data diversity and hallucination in the generated data. To alleviate these issues, we introduce a repetition penalty to promote data diversity and exploit a hallucination detector to filter out untruthful synthetic conversation turns. We conducted both automatic and human evaluations on the proposed system, fEw-shot Multi-round ConvErsational Explanation (EMCEE). For automatic evaluation, EMCEE achieves relative improvements of 81.6% in BLEU and 80.5% in ROUGE compared to the baselines. EMCEE also mitigates the degeneration of data quality caused by training on synthetic data. In human evaluations (N=60), EMCEE outperforms baseline models and the control group in improving users' comprehension, acceptance, trust, and collaboration with static explanations by large margins. Through a fine-grained analysis of model responses, we further demonstrate that training on self-generated synthetic data improves the model's ability to generate more truthful and understandable answers, leading to better user interactions. To the best of our knowledge, this is the first conversational explanation method that can answer free-form user questions following static explanations.
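The repetition penalty mentioned above is a standard decoding-time technique rather than anything EMCEE-specific; a minimal sketch, assuming the common Keskar-style formulation in which a repeated token's positive logit is divided by the penalty and a negative logit multiplied by it:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Discourage already-generated tokens during sampling.

    Keskar-style penalty: a repeated token's positive logit is divided by
    `penalty` and a negative logit is multiplied by it, so the token always
    becomes less likely. penalty=1.0 leaves logits unchanged.
    """
    adjusted = list(logits)
    for tok in set(generated_ids):
        if adjusted[tok] > 0:
            adjusted[tok] /= penalty
        else:
            adjusted[tok] *= penalty
    return adjusted

# Toy 4-token vocabulary; tokens 2 and 3 have already been generated.
print(apply_repetition_penalty([1.0, 0.5, 2.0, -1.0], [2, 3], penalty=2.0))
# [1.0, 0.5, 1.0, -2.0]
```

Handling the two signs separately is what makes the penalty monotone: dividing a negative logit would instead make the repeated token more probable.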

Authors:Nikolai Bahr, Christoph Zetzsche, Jaime Maldonado, Kerstin Schill
Title: Cause-effect perception in an object place task
Abstract:
Algorithmic causal discovery is based on formal reasoning and provably converges toward the optimal solution. However, since some of the underlying assumptions are often not met in practice, no applications for autonomous everyday-life competence are yet available. Humans, on the other hand, possess full everyday competence and develop cognitive models in a data-efficient manner, with the ability to transfer knowledge between and to new situations. Here we investigate the causal discovery capabilities of humans in an object place task in virtual reality (VR) with haptic feedback and compare the results to the state-of-the-art causal discovery algorithms FGES, PC, and FCI. In addition, we use the algorithms to analyze causal relations between sensory information and the kinematic parameters of human behavior. Our findings show that the majority of participants were able to determine which variables are causally related. This is in line with causal discovery algorithms like PC, which recover causal dependencies in the first step. However, unlike such algorithms, which can identify causes and effects in our test configuration, humans are unsure in determining a causal direction. Regarding the relation between the sensory information provided to the participants and their placing actions (i.e., their kinematic parameters), the data yield a surprising dissociation between the subjects' knowledge and the sensorimotor level. Knowledge of the cause-effect pairs, though undirected, should suffice to improve subjects' movements. Yet a detailed causal analysis provides little evidence for any such influence. This, together with the reports of the participants, implies that instead of exploiting their consciously perceived information, they leave it to the sensorimotor level to control the movement.

Authors:Hatice Gurdil, Hatice Ozlem Anadol, Yesim Beril Soguksu
Title: The Use of Artificial Intelligence Tools in Assessing Content Validity: A Comparative Study with Human Experts
Abstract:
This study investigated whether AI evaluators assess the content validity of B1-level English reading comprehension test items in a manner similar to human evaluators. A 25-item multiple-choice test was developed, and these items were evaluated by four human and four AI evaluators. No statistically significant difference was found between the scores given by human and AI evaluators, and similar evaluation trends were observed. The Content Validity Ratio (CVR) and the Item Content Validity Index (I-CVI) were calculated and compared using the Wilcoxon Signed-Rank Test, again with no statistically significant difference. The findings revealed that in some cases AI evaluators could replace human evaluators. However, differences on specific items appeared to arise from varying interpretations of the evaluation criteria. Ensuring linguistic clarity and clearly defining criteria could contribute to more consistent evaluations. In this regard, the development of hybrid evaluation systems, in which AI technologies are used alongside human experts, is recommended.
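The two indices named in the abstract have standard definitions (Lawshe's CVR and the item-level content validity index); a minimal sketch of both computations, using hypothetical ratings rather than the study's data:

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2); ranges from -1 to 1."""
    half = n_experts / 2
    return (n_essential - half) / half

def item_cvi(ratings, threshold=3):
    """I-CVI: proportion of experts rating the item relevant,
    i.e. >= threshold on a 4-point relevance scale."""
    return sum(1 for r in ratings if r >= threshold) / len(ratings)

# Hypothetical panel of four evaluators judging one test item:
print(content_validity_ratio(3, 4))   # 3 of 4 deem it essential -> 0.5
print(item_cvi([4, 3, 3, 2]))         # 3 of 4 rate it relevant -> 0.75
```

With four evaluators per panel, as in this study, CVR can only take the values -1.0, -0.5, 0.0, 0.5, and 1.0, which is why item-level comparisons between human and AI panels are coarse.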

Authors:Ho Chit Siu, Jaime D. Peña, Yutai Zhou, Ross E. Allen
Title: In Pursuit of Predictive Models of Human Preferences Toward AI Teammates
Abstract:
We seek measurable properties of AI agents that make them better or worse teammates from the subjective perspective of human collaborators. Our experiments use the cooperative card game Hanabi -- a common benchmark for AI-teaming research. We first evaluate AI agents on a set of objective metrics based on task performance, information theory, and game theory, which are measurable without human interaction. Next, we evaluate subjective human preferences toward AI teammates in a large-scale (N=241) human-AI teaming experiment. Finally, we correlate the AI-only objective metrics with the human subjective preferences. Our results refute common assumptions from prior literature on reinforcement learning, revealing new correlations between AI behaviors and human preferences. We find that the final game score a human-AI team achieves is less predictive of human preferences than esoteric measures of AI action diversity, strategic dominance, and ability to team with other AI. In the future, these correlations may help shape reward functions for training human-collaborative AI.

Authors:Sara Beschi, Davide Falessi, Silvia Golia, Angela Locoro
Title: Characterizing Data Visualization Literacy: a Systematic Literature Review
Abstract:
With the advent of the data era, and of new, more intelligent interfaces for supporting decision making, there is a growing need to define, model, and assess human ability and data visualization usability for a better encoding and decoding of data patterns. Data Visualization Literacy (DVL) is the ability to encode and decode data into and from a visual language. Although this ability and its measurement are crucial for advancing human knowledge and decision capacity, they have seldom been investigated, let alone systematically. To address this gap, this paper presents a systematic literature review comprising 43 reports on DVL, analyzed using the PRISMA methodology. Our results include the identification of the purposes of DVL, its satellite aspects, the models proposed, and the assessments designed to evaluate people's degree of DVL. Finally, we identify several research directions, including, among the most challenging, the definition of a (standard) unifying construct of DVL.

Authors:Youssef Abdalla, Elia Gatti, Mine Orlu, Marianna Obrist
Title: Sensory-driven microinterventions for improved health and wellbeing
Abstract:
The five senses are gateways to our wellbeing, and their decline is considered a significant public health challenge linked to multiple conditions that contribute significantly to morbidity and mortality. Modern technology, with its ubiquitous nature and fast data processing, has the ability to leverage the power of the senses to transform our approach to day-to-day healthcare, with positive effects on our quality of life. Here, we introduce the idea of sensory-driven microinterventions for preventative, personalised healthcare. Microinterventions are targeted, timely, minimally invasive strategies that seamlessly integrate into our daily life. This idea harnesses humans' sensory capabilities and leverages technological advances in sensory stimulation and real-time processing for sensing the senses. The collection of sensory data from our continuous interaction with technology -- for example, the tone of voice, gait movement, smart home behaviour -- opens up a shift towards personalised, technology-enabled, sensory-focused healthcare interventions, coupled with the potential of early detection and timely treatment of sensory deficits that can signal critical health insights, especially for neurodegenerative diseases such as Parkinson's disease.

Authors:Jiin Choi, Seung Won Lee, Kyung Hoon Hyun
Title: GenPara: Enhancing the 3D Design Editing Process by Inferring Users' Regions of Interest with Text-Conditional Shape Parameters
Abstract:
In 3D design, specifying design objectives and visualizing complex shapes through text alone proves to be a significant challenge. Although advancements in 3D GenAI have significantly enhanced part assembly and the creation of high-quality 3D designs, many systems still struggle to dynamically generate and edit design elements based on shape parameters. To bridge this gap, we propose GenPara, an interactive 3D design editing system that leverages text-conditional shape parameters of part-aware 3D designs and visualizes the design space within the Exploration Map and Design Versioning Tree. Additionally, among the various shape parameters generated by the LLM, the system extracts and provides design outcomes within the user's regions of interest based on Bayesian inference. A user study (N = 16) revealed that \textit{GenPara} enhanced designers' comprehension and management of text-conditional shape parameters, streamlining design exploration and concretization. This improvement boosted the efficiency and creativity of the 3D design process.

Authors:Matthew Zent, Seraphina Yong, Dhruv Bala, Stevie Chancellor, Joseph A. Konstan, Loren Terveen, Svetlana Yarosh
Title: Beyond the Individual: A Community-Engaged Framework for Ethical Online Community Research
Abstract:
Online community research routinely poses minimal risk to individuals, but does the same hold true for online communities? In response to high-profile breaches of online community trust and increased debate in the social computing research community on the ethics of online community research, this paper investigates community-level harms and benefits of research. Through nine participatory-inspired workshops with four critical online communities (Wikipedia, InTheRooms, CaringBridge, and r/AskHistorians), we found researchers should engage more directly with communities' primary purpose by rationalizing their methods and contributions in the context of community goals to equalize the beneficiaries of community research. To facilitate deeper alignment of these expectations, we present the FACTORS (Functions for Action with Communities: Teaching, Overseeing, Reciprocating, and Sustaining) framework for ethical online community research. Finally, we reflect on our findings by providing implications for researchers and online communities to identify and implement functions for navigating community-level harms and benefits.

Authors:Xinyu Jessica Wang, Christine Lee, Bilge Mutlu
Title: LearnMate: Enhancing Online Education with LLM-Powered Personalized Learning Plans and Support
Abstract:
With the increasing prevalence of online learning, adapting education to diverse learner needs remains a persistent challenge. Recent advancements in artificial intelligence (AI), particularly large language models (LLMs), promise powerful tools and capabilities to enhance personalized learning in online educational environments. In this work, we explore how LLMs can improve personalized learning experiences by catering to individual user needs toward enhancing the overall quality of online education. We designed personalization guidelines based on the growing literature on personalized learning to ground LLMs in generating tailored learning plans. To operationalize these guidelines, we implemented LearnMate, an LLM-based system that generates personalized learning plans and provides users with real-time learning support. We discuss the implications and future directions of this work, aiming to move beyond the traditional one-size-fits-all approach by integrating LLM-based personalized support into online learning environments.

Authors:Thu Tran, Kenny Tsu Wei Choo, Shaohui Foong, Hitesh Bhardwaj, Shane Kyi Hla Win, Wei Jun Ang, Kenneth Goh, Rajesh Krishna Balan
Title: Analyzing Swimming Performance Using Drone Captured Aerial Videos
Abstract:
Monitoring swimmer performance is crucial for improving training and enhancing athletic techniques. Traditional methods for tracking swimmers, such as above-water and underwater cameras, face limitations due to the need for multiple cameras and obstructions from water splashes. This paper presents a novel approach for tracking swimmers using a moving UAV. The proposed system employs a UAV equipped with a high-resolution camera to capture aerial footage of the swimmers. The footage is then processed using computer vision algorithms to extract the swimmers' positions and movements. This approach offers several advantages, including single-camera use and comprehensive coverage. The system's accuracy is evaluated with both training and in-competition videos. The results demonstrate the system's ability to accurately track swimmers' movements, limb angles, stroke duration, and velocity, with maximum errors of 0.3 seconds and 0.35 m/s for stroke duration and velocity, respectively.
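Once per-frame swimmer positions have been extracted from the footage, velocity and stroke duration follow from simple frame arithmetic; a hypothetical sketch (not the paper's pipeline), assuming positions in metres along the lane and detected stroke-onset frame indices:

```python
def swimmer_velocity(positions, fps):
    """Per-frame velocity (m/s) from positions (metres along the lane)."""
    return [(b - a) * fps for a, b in zip(positions, positions[1:])]

def stroke_durations(stroke_frames, fps):
    """Per-stroke durations (s) from frame indices of detected stroke onsets."""
    return [(b - a) / fps for a, b in zip(stroke_frames, stroke_frames[1:])]

# Footage downsampled to 2 fps: the swimmer advances 0.5 m per frame -> 1.0 m/s.
print(swimmer_velocity([0.0, 0.5, 1.0, 1.5], fps=2))    # [1.0, 1.0, 1.0]
# Stroke onsets detected at frames 0, 45, 90 of 30 fps footage -> 1.5 s per stroke.
print(stroke_durations([0, 45, 90], fps=30))            # [1.5, 1.5]
```

In practice the raw finite differences would be smoothed (e.g., with a moving average) before reporting, since per-frame detection jitter is amplified by the fps factor.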

Authors:Wonjung Kim, Kenny Tsu Wei Choo, Youngki Lee, Archan Misra, Rajesh Krishna Balan
Title: Empath-D: VR-based Empathetic App Design for Accessibility
Abstract:
With app-based interaction increasingly permeating all aspects of daily living, it is essential to ensure that apps are designed to be \emph{inclusive} and are usable by a wider audience such as the elderly, with various impairments (e.g., visual, audio and motor). We propose Empath-D, a system that fosters empathetic design by allowing app designers, \emph{in-situ}, to rapidly evaluate the usability of their apps from the perspective of impaired users. To provide a truly authentic experience, Empath-D carefully orchestrates the interaction between a smartphone and a VR device, allowing the user to experience simulated impairments in a virtual world while interacting naturally with the app using a real smartphone. By carefully orchestrating the VR-smartphone interaction, Empath-D tackles challenges such as preserving low-latency app interaction, accurate visualization of hand movement, and low-overhead perturbation of I/O streams. Experimental results show that user interaction with Empath-D is comparable (both in accuracy and user perception) to real-world app usage, and that it can simulate impairment effects as effectively as a custom hardware simulator.

Authors:Kenny Tsu Wei Choo, Rajesh Krishna Balan, Youngki Lee
Title: Examining Augmented Virtuality Impairment Simulation for Mobile App Accessibility Design
Abstract:
With mobile apps rapidly permeating all aspects of daily living with use by all segments of the population, it is crucial to support the evaluation of app usability for specific impaired users to improve app accessibility. In this work, we examine the effects of using our \textit{augmented virtuality} impairment simulation system--\textit{Empath-D}--to support experienced designer-developers in redesigning a mockup of a commonly used mobile application for cataract-impaired users, comparing this with existing tools that aid designing for accessibility. We show that the use of augmented virtuality for assessing usability supports enhanced usability-challenge identification, finding more defects and doing so more accurately than with existing methods. Through our user interviews, we also show that augmented virtuality impairment simulation supports realistic interaction and evaluation to provide a concrete understanding of the usability challenges that impaired users face, and complements the existing guidelines-based approaches meant for general accessibility.

Authors:Christine Lee, Jihye Choi, Bilge Mutlu
Title: MAP: Multi-user Personalization with Collaborative LLM-powered Agents
Abstract:
The widespread adoption of Large Language Models (LLMs) and LLM-powered agents in multi-user settings underscores the need for reliable, usable methods to accommodate diverse preferences and resolve conflicting directives. Drawing on conflict resolution theory, we introduce a user-centered workflow for multi-user personalization comprising three stages: Reflection, Analysis, and Feedback. We then present MAP -- a \textbf{M}ulti-\textbf{A}gent system for multi-user \textbf{P}ersonalization -- to operationalize this workflow. By delegating subtasks to specialized agents, MAP (1) retrieves and reflects on relevant user information, while enhancing reliability through agent-to-agent interactions, (2) provides detailed analysis for improved transparency and usability, and (3) integrates user feedback to iteratively refine results. Our user study findings (n=12) highlight MAP's effectiveness and usability for conflict resolution while emphasizing the importance of user involvement in resolution verification and failure management. This work highlights the potential of multi-agent systems to implement user-centered, multi-user personalization workflows and concludes by offering insights for personalization in multi-user contexts.

Authors:Matt Gottsacker, Nels Numan, Anthony Steed, Gerd Bruder, Gregory F. Welch, Steve Feiner
Title: Decoupled Hands: An Approach for Aligning Perspectives in Collaborative Mixed Reality
Abstract:
When collaborating relative to a shared 3D virtual object in mixed reality (MR), users may experience communication issues arising from differences in perspective. These issues include occlusion (e.g., one user not being able to see what the other is referring to) and inefficient spatial references (e.g., "to the left of this" may be confusing when users are positioned opposite to each other). This paper presents a novel technique for automatic perspective alignment in collaborative MR involving co-located interaction centered around a shared virtual object. To align one user's perspective on the object with a collaborator's, a local copy of the object and any other virtual elements that reference it (e.g., the collaborator's hands) are dynamically transformed. The technique does not require virtual travel and preserves face-to-face interaction. We created a prototype application to demonstrate our technique and present an evaluation methodology for related MR collaboration and perspective alignment scenarios.
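The core geometric idea, dynamically transforming a local copy of object-referencing elements so the two perspectives coincide, can be sketched as a rotation about the shared object's center; the function below is a hypothetical illustration of that transform, not the paper's implementation:

```python
import math

def align_perspective(point, center, user_angle, collaborator_angle):
    """Rotate a collaborator-referenced 3D point about the shared object's
    center (vertical axis) so the local user sees it from the collaborator's
    perspective. Angles are the users' bearings around the object, in
    radians; height (y) is preserved.
    """
    theta = user_angle - collaborator_angle
    x, z = point[0] - center[0], point[2] - center[2]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (center[0] + x * cos_t - z * sin_t,
            point[1],
            center[2] + x * sin_t + z * cos_t)

# Users stand opposite each other (pi radians apart) around an object at the
# origin: the collaborator's hand at (1, 1, 0) maps to (-1, 1, 0) locally.
aligned = align_perspective((1.0, 1.0, 0.0), (0.0, 0.0, 0.0), math.pi, 0.0)
print(tuple(round(c, 6) for c in aligned))  # (-1.0, 1.0, 0.0)
```

Applying the same transform to the collaborator's hand avatar is what keeps spatial references like "to the left of this" consistent without either user virtually travelling.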

Authors:Chi-Lan Yang, Alarith Uhde, Naomi Yamashita, Hideaki Kuzuoka
Title: Understanding and Supporting Peer Review Using AI-reframed Positive Summary
Abstract:
While peer review enhances writing and research quality, harsh feedback can frustrate and demotivate authors. Hence, it is essential to explore how critiques should be delivered to motivate authors and enable them to keep iterating their work. In this study, we explored the impact of appending an automatically generated positive summary to the peer reviews of a writing task, alongside varying levels of overall evaluations (high vs. low), on authors' feedback reception, revision outcomes, and motivation to revise. Through a 2x2 online experiment with 137 participants, we found that adding an AI-reframed positive summary to otherwise harsh feedback increased authors' critique acceptance, whereas low overall evaluations of their work led to increased revision efforts. We discuss the implications of using AI in peer feedback, focusing on how AI-driven critiques can influence critique acceptance and support research communities in fostering productive and friendly peer feedback practices.

Authors:Chen Liang, Yuxuan Liu, Martez Mott, Anhong Guo
Title: HandProxy: Expanding the Affordances of Speech Interfaces in Immersive Environments with a Virtual Proxy Hand
Abstract:
Hand interactions are increasingly used as the primary input modality in immersive environments, but they are not always feasible due to situational impairments, motor limitations, and environmental constraints. Speech interfaces have been explored as an alternative to hand input in research and commercial solutions, but are limited to initiating basic hand gestures and system controls. We introduce HandProxy, a system that expands the affordances of speech interfaces to support expressive hand interactions. Instead of relying on predefined speech commands directly mapped to possible interactions, HandProxy enables users to control the movement of a virtual hand as an interaction proxy, allowing them to describe the intended interactions naturally while the system translates speech into a sequence of hand controls for real-time execution. A user study with 20 participants demonstrated that HandProxy effectively enabled diverse hand interactions in virtual environments, achieving a 100% task completion rate with an average of 1.09 attempts per speech command and 91.8% command execution accuracy, while supporting flexible, natural speech input with varying levels of control and granularity.
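The translation from free-form speech to a sequence of hand-control primitives can be illustrated with a toy keyword matcher; HandProxy itself performs a much richer translation, and the primitive names below are hypothetical:

```python
def parse_hand_command(utterance):
    """Map a free-form spoken instruction to hand-control primitives.

    Toy keyword matcher for illustration only: action phrases map to grasp /
    release / translate / push primitives, and direction words refine them.
    """
    actions = {"grab": "grasp", "pick up": "grasp", "release": "release",
               "drop": "release", "move": "translate", "push": "push"}
    directions = ("left", "right", "up", "down", "forward", "back")
    text = utterance.lower()
    words = text.split()
    controls = [primitive for phrase, primitive in actions.items() if phrase in text]
    controls += [f"translate_{d}" for d in directions if d in words]
    return controls

print(parse_hand_command("Grab the cube and move it to the left"))
# ['grasp', 'translate', 'translate_left']
```

A real system must also resolve referents ("the cube") against the scene and sequence the primitives for continuous execution, which is where the virtual-proxy-hand abstraction does its work.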

Authors:Qirui Sun, Yunyi Ni, Teli Yuan, Jingjing Zhang, Fan Yang, Zhihao Yao, Haipeng Mi
Title: Spiritus: An AI-Assisted Tool for Creating 2D Characters and Animations
Abstract:
This research presents Spiritus, an AI-assisted creation tool designed to streamline 2D character animation creation while enhancing creative flexibility. By integrating natural language processing and diffusion models, users can efficiently transform natural language descriptions into personalized 2D characters and animations. The system employs automated segmentation, layered costume techniques, and dynamic mesh-skeleton binding solutions to support flexible adaptation of complex costumes and additional components. Spiritus further achieves real-time animation generation and efficient animation resource reuse between characters through the integration of BVH data and motion diffusion models. Experimental results demonstrate Spiritus's effectiveness in reducing technical barriers, enhancing creative freedom, and supporting resource universality. Future work will focus on optimizing user experience and further exploring the system's human-computer collaboration potential.

Authors:Qianyu Liu, Xinran Li, Xiaocong Du, Quan Li
Title: TSConnect: An Enhanced MOOC Platform for Bridging Communication Gaps Between Instructors and Students in Light of the Curse of Knowledge
Abstract:
Knowledge dissemination in educational settings is profoundly influenced by the curse of knowledge, a cognitive bias that causes experts to underestimate the challenges faced by learners due to their own in-depth understanding of the subject. This bias can hinder effective knowledge transfer and pedagogical effectiveness, and may be exacerbated by inadequate instructor-student communication. To encourage more effective feedback and promote empathy, we introduce TSConnect, a bias-aware, adaptable interactive MOOC (Massive Open Online Course) learning system, informed by a need-finding survey involving 129 students and 6 instructors. TSConnect integrates instructors, students, and Artificial Intelligence (AI) into a cohesive platform, facilitating diverse and targeted communication channels while addressing previously overlooked information needs. A notable feature is its dynamic knowledge graph, which enhances learning support and fosters a more interconnected educational experience. We conducted a between-subjects user study with 30 students comparing TSConnect to a baseline system. Results indicate that TSConnect significantly encourages students to provide more feedback to instructors. Additionally, interviews with 4 instructors reveal insights into how they interpret and respond to this feedback, potentially leading to improvements in teaching strategies and the development of broader pedagogical skills.

Authors:Ryota Takamido, Jun Ota, Hiroki Nakamoto
Title: PassAI: explainable artificial intelligence algorithm for soccer pass analysis using multimodal information resources
Abstract:
This study developed a new explainable artificial intelligence algorithm called PassAI, which classifies successful and failed passes in a soccer game and explains its rationale using both tracking information and the passer's seasonal stats. The study aimed to address two primary challenges faced by artificial intelligence and machine learning algorithms in the sports domain: how to use data of different modalities for analysis, and how to explain the rationale of the outcome from multimodal perspectives. To address these challenges, PassAI has two processing streams for multimodal information, one for tracking image data and one for the passer's stats, and classifies pass success or failure. After classification, it provides a rationale by either calculating the relative contribution of the different modalities or providing more detailed contribution factors within a modality. Experiments on 6,349 passes from professional soccer games revealed that PassAI achieved classification performance more than 5% higher than state-of-the-art algorithms and could visualize the rationale of pass success/failure for both tracking and stats data. These results highlight the importance of using multimodal data in the sports domain to increase both the performance of artificial intelligence algorithms and the explainability of their outcomes.

Authors:Yongle Zhang, Phuong-Anh Nguyen-Le, Kriti Singh, Ge Gao
Title: The News Says, the Bot Says: How Immigrants and Locals Differ in Chatbot-Facilitated News Reading
Abstract:
News reading helps individuals stay informed about events and developments in society. Local residents and new immigrants often approach the same news differently, prompting the question of how technology, such as LLM-powered chatbots, can best enhance a reader-oriented news experience. The current paper presents an empirical study involving 144 participants from three groups in Virginia, United States: local residents born and raised there (N=48), Chinese immigrants (N=48), and Vietnamese immigrants (N=48). All participants read local housing news with the assistance of the Copilot chatbot. We collected data on each participant's Q&A interactions with the chatbot, along with their takeaways from news reading. While engaging with the news content, participants in both immigrant groups asked the chatbot fewer analytical questions than the local group. They also demonstrated a greater tendency to rely on the chatbot when formulating practical takeaways. These findings offer insights into technology design that aims to serve diverse news readers.

Authors:Lan Gao, Elana B Blinder, Abigail Barnes, Kevin Song, Tamara Clegg, Jessica Vitak, Marshini Chetty
Title: Creating and Evaluating Privacy and Security Micro-Lessons for Elementary School Children
Abstract:
The growing use of technology in K--8 classrooms highlights a parallel need for formal learning opportunities aimed at helping children use technology safely and protect their personal information. Even the youngest students are now using tablets, laptops, and apps to support their learning; however, there are limited curricular materials available for elementary and middle school children on digital privacy and security topics. To bridge this gap, we developed a series of micro-lessons to help K--8 children learn about digital privacy and security at school. We first conducted a formative study by interviewing elementary school teachers to identify the design needs for digital privacy and security lessons. We then developed micro-lessons -- multiple 15-20 minute activities designed to be easily inserted into the existing curriculum -- using a co-design approach with multiple rounds of developing and revising the micro-lessons in collaboration with teachers. Throughout the process, we conducted evaluation sessions where teachers implemented or reviewed the micro-lessons. Our study identifies strengths, challenges, and teachers' tailoring strategies when incorporating micro-lessons for K--8 digital privacy and security topics, providing design implications for facilitating learning about these topics in school classrooms.

Authors:Tianyang Wen, Xucheng Zhang, Zhirong Wan, Jing Zhao, Yicheng Zhu, Ning Su, Xiaolan Peng, Jin Huang, Wei Sun, Feng Tian, Franklin Mingzhe Li
Title: PANDA: Parkinson's Assistance and Notification Driving Aid
Abstract:
Parkinson's Disease (PD) significantly impacts driving abilities, often leading to early driving cessation or accidents due to reduced motor control and increased reaction times. To diminish the impact of these symptoms, we developed PANDA (Parkinson's Assistance and Notification Driving Aid), a multi-modality real-time alert system designed to monitor driving patterns continuously and provide immediate alerts for irregular driving behaviors, enhancing the driving safety of individuals with PD. The system was developed through a participatory design process with 9 people with PD and 13 non-PD individuals using a driving simulator, which allowed us to identify critical design characteristics and collect detailed data on driving behavior. A user study involving individuals with PD evaluated the effectiveness of PANDA, exploring optimal strategies for delivering alerts and ensuring they are timely and helpful. Our findings demonstrate that PANDA has the potential to enhance the driving safety of individuals with PD, offering a valuable tool for maintaining independence and confidence behind the wheel.

Authors:Panagiotis Kourtesis, Andrea Lizarraga, Sarah E. MacPherson
Title: Immersive Virtual Reality Assessments of Working Memory and Psychomotor Skills: A Comparison between Immersive and Non-Immersive Assessments
Abstract:
Objective: Immersive virtual reality (VR) enhances ecological validity and facilitates intuitive and ergonomic hand interactions for performing neuropsychological assessments. However, its comparability to traditional computerized methods remains unclear. This study investigates the convergent validity, user experience, and usability of VR-based versus PC-based assessments of short-term and working memory, and psychomotor skills, while also examining how demographic and IT-related skills influence performance in both modalities. Methods: Sixty-six participants performed the Digit Span Task (DST), Corsi Block Task (CBT), and Deary-Liewald Reaction Time Task (DLRTT) in both VR- and PC-based formats. Participants' experience in using computers and smartphones, and playing video games, was considered. User experience and system usability of the formats were also evaluated. Results: While performance on DST was similar across modalities, PC assessments enabled better performance on CBT and faster reaction times in DLRTT. Moderate-to-strong correlations between VR and PC versions supported convergent validity. Regression analyses revealed that performance on PC versions was influenced by age, computing, and gaming experience, whereas performance on VR versions was largely independent of these factors, except for gaming experience predicting performance on CBT backward recall. Moreover, VR assessments received higher ratings for user experience and usability than PC-based assessments. Conclusion: Immersive VR assessments provide an engaging alternative to traditional computerized methods, with minimal reliance on prior IT experience and demographic factors. This resilience to individual differences suggests that VR may offer a more equitable and accessible platform for cognitive assessment. Future research should explore the long-term reliability of VR-based assessments.

Authors:Kazuya Izumi, Shuhey Koyama, Yoichi Ochiai
Title: AnimeGaze: Real-Time Mutual Gaze Synthesis for Anime-Style Avatars in Physical Environments via Behind-Display Camera
Abstract:
Avatars on displays lack the ability to engage with the physical environment through gaze. To address this limitation, we propose a gaze synthesis method that enables animated avatars to establish gaze communication with the physical environment using a camera-behind-the-display system. The system uses a display that rapidly alternates between visible and transparent states. During the transparent state, a camera positioned behind the display captures the physical environment. This configuration physically aligns the position of the avatar's eyes with the camera, enabling two-way gaze communication with people and objects in the physical environment. Building on this system, we developed a framework for mutual gaze communication between avatars and people. The framework detects the user's gaze and dynamically synthesizes the avatar's gaze towards people or objects in the environment. This capability was integrated into an AI agent system to generate real-time, context-aware gaze behaviors during conversations, enabling more seamless and natural interactions. To evaluate the system, we conducted a user study to assess its effectiveness in supporting physical gaze awareness and generating human-like gaze behaviors. The results show that the behind-display approach significantly enhances the user's perception of being observed and attended to by the avatar. By bridging the gap between virtual avatars and the physical environment through enhanced gaze interactions, our system offers a promising avenue for more immersive and human-like AI-mediated communication in everyday environments.

Authors:Deepika Raman, Nada Madkour, Evan R. Murphy, Krystal Jackson, Jessica Newman
Title: Intolerable Risk Threshold Recommendations for Artificial Intelligence
Abstract:
Frontier AI models -- highly capable foundation models at the cutting edge of AI development -- may pose severe risks to public safety, human rights, economic stability, and societal value in the coming years. These risks could arise from deliberate adversarial misuse, system failures, unintended cascading effects, or simultaneous failures across multiple models. In response to such risks, at the AI Seoul Summit in May 2024, 16 global AI industry organizations signed the Frontier AI Safety Commitments, and 27 nations and the EU issued a declaration on their intent to define these thresholds. To fulfill these commitments, organizations must determine and disclose ``thresholds at which severe risks posed by a model or system, unless adequately mitigated, would be deemed intolerable.'' To assist in setting and operationalizing intolerable risk thresholds, we outline key principles and considerations; for example, to aim for ``good, not perfect'' thresholds in the face of limited data on rapidly advancing AI capabilities and consequently evolving risks. We also propose specific threshold recommendations, including some detailed case studies, for a subset of risks across eight risk categories: (1) Chemical, Biological, Radiological, and Nuclear (CBRN) Weapons, (2) Cyber Attacks, (3) Model Autonomy, (4) Persuasion and Manipulation, (5) Deception, (6) Toxicity, (7) Discrimination, and (8) Socioeconomic Disruption. Our goal is to serve as a starting point or supplementary resource for policymakers and industry leaders, encouraging proactive risk management that prioritizes preventing intolerable risks (ex ante) rather than merely mitigating them after they occur (ex post).

Authors:Julián Méndez, Marc Satkowski
Title: ARbiter: Generating Dialogue Options and Communication Support in Augmented Reality
Abstract:
In this position paper, we propose researching the combination of Augmented Reality (AR) and Artificial Intelligence (AI) to support conversations, inspired by the interfaces of dialogue systems commonly found in videogames. AR-capable devices are becoming more powerful and conventional in looks, as seen in head-mounted displays (HMDs) like the Snapchat Spectacles, the XREAL glasses, or the recently presented Meta Orion. This development reduces possible ergonomic, appearance, and runtime concerns, thus allowing a more straightforward integration and extended use of AR in our everyday lives, both in private and at work. At the same time, we can observe an immense surge in AI development (also at CHI). Recently prominent Large Language Models (LLMs) like OpenAI's o3-mini or DeepSeek-R1 soar over their precursors in their ability to sustain conversations, provide suggestions, and handle complex topics in (almost) real time. In combination with natural language recognition systems, which are nowadays a standard component of smartphones and similar devices (including modern AR-HMDs), it is easy to imagine a combined system that integrates into daily conversations and provides various types of assistance. Such a system would enable many opportunities for research in AR+AI, which, as stated by Hirzle et al., remains scarce. In the following, we describe how the design of a conversational AR+AI system can learn from videogame dialogue systems, and we propose use cases and research questions that can be investigated thanks to this AR+AI combination.

Authors:Chenhao Yang, Siwei Huang, Chuan Hu
Title: Research on a Driver's Perceived Risk Prediction Model Considering Traffic Scene Interaction
Abstract:
In the field of conditional autonomous driving technology, driver perceived risk prediction plays a crucial role in reducing traffic risks and ensuring passenger safety. This study introduces an innovative perceived risk prediction model for human-machine interaction in intelligent driving systems. The model aims to enhance prediction accuracy and, thereby, ensure passenger safety. Through a comprehensive analysis of risk impact mechanisms, we identify three key categories of factors, both subjective and objective, influencing perceived risk: driver's personal characteristics, ego-vehicle motion, and surrounding environment characteristics. We then propose a deep-learning-based risk prediction network that uses the first two categories of factors as inputs. The network captures the interactive relationships among traffic participants in dynamic driving scenarios. Additionally, we design a personalized modeling strategy that incorporates driver-specific traits to improve prediction accuracy. To ensure high-quality training data, we conducted a rigorous video rating experiment. Experimental results show that the proposed network achieves a 10.0% performance improvement over state-of-the-art methods. These findings suggest that the proposed network has significant potential to enhance the safety of conditional autonomous driving systems.

Authors:Yining Cao, Yiyi Huang, Anh Truong, Hijung Valentina Shin, Haijun Xia
Title: Compositional Structures as Substrates for Human-AI Co-creation Environment: A Design Approach and A Case Study
Abstract:
It has been increasingly recognized that effective human-AI co-creation requires more than prompts and results, but an environment with empowering structures that facilitate exploration, planning, iteration, as well as control and inspection of AI generation. Yet, a concrete design approach to such an environment has not been established. Our literature analysis highlights that compositional structures, which organize and visualize individual elements into meaningful wholes, are highly effective in granting creators control over the essential aspects of their content. However, efficiently aggregating and connecting these structures to support the full creation process remains challenging. Therefore, we propose a design approach of leveraging compositional structures as the substrates and infusing AI within and across these structures to enable a controlled and fluid creation process. We evaluate this approach through a case study of developing a video co-creation environment using this approach. User evaluation shows that such an environment allowed users to stay oriented in their creation activity, remain aware and in control of AI's generation, and enable flexible human-AI collaborative workflows.

Authors:Anas Buhayh, Elizabeth McKinnie, Robin Burke
Title: Decoupled Recommender Systems: Exploring Alternative Recommender Ecosystem Designs
Abstract:
Recommender ecosystems are an emerging subject of research. Such research examines how the characteristics of algorithms, recommendation consumers, and item providers influence system dynamics and long-term outcomes. One architectural possibility that has not yet been widely explored in this line of research is the consequences of a configuration in which recommendation algorithms are decoupled from the platforms they serve. This is sometimes called "the friendly neighborhood algorithm store" or "middleware" model. We are particularly interested in how such architectures might offer a range of different distributions of utility across consumers, providers, and recommendation platforms. In this paper, we create a model of a recommendation ecosystem that incorporates algorithm choice and examine the outcomes of such a design.

Authors:Zhuoyue Lyu, Per Ola Kristensson
Title: Objestures: Everyday Objects Meet Mid-Air Gestures for Expressive Interaction
Abstract:
Everyday objects and mid-air gestures have been explored as input modalities, but each has its strengths and limitations - for example, objects offer tangibility but rely on their physical presence; gestures are convenient but lack haptic feedback. We introduce Objestures ("Obj" + "Gestures"), five interaction types that utilize both modalities for a design space of expressive and playful interaction. To evaluate its usefulness, we conducted a user study (N = 12) assessing whether it can effectively support basic 3D tasks such as rotation and scaling and found it has performance comparable to or better than the headset's native freehand manipulation. To understand its user experience, we conducted case studies on three example applications - Sound, Draw, and Shadow - with the same participants, who found it intuitive, engaging, and expressive, and were interested in its everyday use. We further illustrate 30 examples to showcase how Objestures can enrich everyday interactions and discuss its limitations and implications. https://www.zhuoyuelyu.com/objestures

Authors:Luis Marquez-Carpintero, Sergio Suescun-Ferrandiz, Monica Pina-Navarro, Miguel Cazorla, Francisco Gomez-Donoso
Title: CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors
Abstract:
The monitoring and prediction of in-class student activities is of paramount importance for the comprehension of engagement and the enhancement of pedagogical efficacy. The accurate detection of these activities enables educators to modify their lessons in real time, thereby reducing negative emotional states and enhancing the overall learning experience. To this end, the use of non-intrusive devices, such as inertial measurement units (IMUs) embedded in smartwatches, represents a viable solution. The development of reliable predictive systems has been limited by the lack of large, labeled datasets in education. To bridge this gap, we present a novel dataset for in-class activity detection using affordable IMU sensors. The dataset comprises 19 diverse activities, both instantaneous and continuous, performed by 12 participants in typical classroom scenarios. It includes accelerometer, gyroscope, rotation vector data, and synchronized stereo images, offering a comprehensive resource for developing multimodal algorithms using sensor and visual data. This dataset represents a key step toward scalable solutions for activity recognition in educational settings.

Authors:Alexander Karpekov, Sonia Chernova, Thomas Plötz
Title: DISCOVER: Data-driven Identification of Sub-activities via Clustering and Visualization for Enhanced Activity Recognition in Smart Homes
Abstract:
Human Activity Recognition (HAR) using ambient sensors has great potential for practical applications, particularly in elder care and independent living. However, deploying HAR systems in real-world settings remains challenging due to the high cost of labeled data, the need for pre-segmented sensor streams, and the lack of flexibility in activity granularity. To address these limitations, we introduce DISCOVER, a method designed to discover fine-grained human sub-activities from unlabeled sensor data without relying on pre-segmentation. DISCOVER combines unsupervised feature extraction and clustering with a user-friendly visualization tool to streamline the labeling process. DISCOVER enables domain experts to efficiently annotate only a minimal set of representative cluster centroids, reducing the annotation workload to a small number of samples (0.05% of our dataset). We demonstrate DISCOVER's effectiveness through a re-annotation exercise on widely used HAR datasets, showing that it uncovers finer-grained activities and produces more nuanced annotations than traditional coarse labels. DISCOVER represents a step toward practical, deployable HAR systems that adapt to diverse real environments.
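The DISCOVER pipeline described above (unsupervised feature extraction, clustering, and labeling only cluster centroids) can be illustrated with a toy sketch. This is not the authors' implementation; the windowed streams, features, and two-cluster setup below are hypothetical simplifications.

```python
# Toy sketch of a centroid-labeling pipeline: window the unlabeled sensor
# stream, extract simple features per window, cluster the feature vectors,
# and ask an expert to label only the centroids instead of every window.
import statistics

def features(window):
    # Hypothetical feature vector: mean and population std-dev of a window.
    return (statistics.mean(window), statistics.pstdev(window))

def kmeans(points, k=2, iters=10):
    # Seed with the first and last point so the centroids start distinct.
    centroids = [points[0], points[-1]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            groups[i].append(p)
        centroids = [tuple(sum(v) / len(g) for v in zip(*g)) if g
                     else centroids[j] for j, g in enumerate(groups)]
    return centroids

# Two simulated accelerometer streams: near-still vs. vigorous movement.
still = [[0.0, 0.1, 0.0, 0.1]] * 3
moving = [[1.0, -1.0, 1.2, -0.8]] * 3
points = [features(w) for w in still + moving]
centroids = kmeans(points)
# An annotator would now label just these two centroids (e.g., "idle" vs.
# "writing") rather than all windows, mirroring the 0.05% labeling budget.
print(sorted(round(c[1], 2) for c in centroids))  # [0.05, 1.0]
```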

Authors:Zaid Hakami, Ashfaq Ali Shafin, Peter J. Clarke, Niki Pissinou, Bogdan Carbunar
Title: Victim-Centred Abuse Investigations and Defenses for Social Media Platforms
Abstract:
Online abuse, a persistent aspect of social platform interactions, impacts user well-being and exposes flaws in platform designs that include insufficient detection efforts and inadequate victim protection measures. Ensuring safety in platform interactions requires the integration of victim perspectives in the design of abuse detection and response systems. In this paper, we conduct surveys (n = 230) and semi-structured interviews (n = 15) with students at a minority-serving institution in the US, to explore their experiences with abuse on a variety of social platforms, their defense strategies, and their recommendations for social platforms to improve abuse responses. We build on study findings to propose design requirements for abuse defense systems and discuss the role of privacy, anonymity, and abuse attribution requirements in their implementation. We introduce ARI, a blueprint for a unified, transparent, and personalized abuse response system for social platforms that sustainably detects abuse by leveraging the expertise of platform users, incentivized with proceeds obtained from abusers.

Authors:Liang Lyu, James Siderius, Hannah Li, Daron Acemoglu, Daniel Huttenlocher, Asuman Ozdaglar
Title: Wikipedia Contributions in the Wake of ChatGPT
Abstract:
How has Wikipedia activity changed for articles with content similar to ChatGPT following its introduction? We estimate the impact using differences-in-differences models, with dissimilar Wikipedia articles as a baseline for comparison, to examine how changes in voluntary knowledge contributions and information-seeking behavior differ by article content. Our analysis reveals that newly created, popular articles whose content overlaps with ChatGPT 3.5 saw a greater decline in editing and viewership after the November 2022 launch of ChatGPT than dissimilar articles did. These findings indicate heterogeneous substitution effects, where users selectively engage less with existing platforms when AI provides comparable content. This points to potential uneven impacts on the future of human-driven online knowledge contributions.
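The differences-in-differences logic used above can be stated concretely: the effect is the post-launch change for ChatGPT-similar articles minus the change for dissimilar articles over the same period. A minimal sketch, with purely hypothetical numbers:

```python
# Minimal differences-in-differences (DiD) estimate. The edit counts here
# are invented for illustration, not results from the paper.

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """DiD = (treated change) - (control change)."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical mean weekly edits before/after the November 2022 launch:
# ChatGPT-similar articles fall from 10 to 6; dissimilar from 10 to 9.
effect = did_estimate(10.0, 6.0, 10.0, 9.0)
print(effect)  # -3.0: the excess decline attributable to content similarity
```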

Authors:Haoyu Li, Srikanth Kandula, Maria Angels de Luis Balaguer, Aditya Akella, Venkat Arun
Title: Speculative Ad-hoc Querying
Abstract:
Analyzing large datasets requires responsive query execution, but executing SQL queries on massive datasets can be slow. This paper explores whether query execution can begin even before the user has finished typing, allowing results to appear almost instantly. We propose SpeQL, a system that leverages Large Language Models (LLMs) to predict likely queries based on the database schema, the user's past queries, and their incomplete query. Since exact query prediction is infeasible, SpeQL speculates on partial queries in two ways: 1) it predicts the query structure to compile and plan queries in advance, and 2) it precomputes temporary tables that are much smaller than the original database, but are still predicted to contain all information necessary to answer the user's final query. Additionally, SpeQL continuously displays results for speculated queries and subqueries in real time, aiding exploratory analysis. A utility/user study showed that SpeQL improved task completion time, and participants reported that its speculative display of results helped them discover patterns in the data more quickly. In the study, SpeQL improved users' query latency by up to $289\times$ while keeping the overhead reasonable, at $\$4$ per hour.

Authors:Daniil Filienko, Mahek Nizar, Javier Roberti, Denise Galdamez, Haroon Jakher, Sarah Iribarren, Weichao Yuwen, Martine De Cock
Title: Transforming Tuberculosis Care: Optimizing Large Language Models For Enhanced Clinician-Patient Communication
Abstract:
Tuberculosis (TB) is the leading cause of death from an infectious disease globally, with the highest burden in low- and middle-income countries. In these regions, limited healthcare access and high patient-to-provider ratios impede effective patient support, communication, and treatment completion. To bridge this gap, we propose integrating a specialized Large Language Model into an efficacious digital adherence technology to augment interactive communication with treatment supporters. This AI-powered approach, operating within a human-in-the-loop framework, aims to enhance patient engagement and improve TB treatment outcomes.

Authors:Xinyu Shi, Yinghou Wang, Ryan Rossi, Jian Zhao
Title: Brickify: Enabling Expressive Design Intent Specification through Direct Manipulation on Design Tokens
Abstract:
Expressing design intent using natural language prompts requires designers to verbalize the ambiguous visual details concisely, which can be challenging or even impossible. To address this, we introduce Brickify, a visual-centric interaction paradigm -- expressing design intent through direct manipulation on design tokens. Brickify extracts visual elements (e.g., subject, style, and color) from reference images and converts them into interactive and reusable design tokens that can be directly manipulated (e.g., resize, group, link, etc.) to form the visual lexicon. The lexicon reflects users' intent for both what visual elements are desired and how to construct them into a whole. We developed Brickify to demonstrate how AI models can interpret and execute the visual lexicon through an end-to-end pipeline. In a user study, experienced designers found Brickify more efficient and intuitive than text-based prompts, allowing them to describe visual details, explore alternatives, and refine complex designs with greater ease and control.

Authors:Ayano Okoso, Mingzhe Yang, Yukino Baba
Title: Do Expressions Change Decisions? Exploring the Impact of AI's Explanation Tone on Decision-Making
Abstract:
Explanatory information helps users to evaluate the suggestions offered by AI-driven decision support systems. With large language models, adjusting explanation expressions has become much easier. However, how these expressions influence human decision-making remains largely unexplored. This study investigated the effect of explanation tone (e.g., formal or humorous) on decision-making, focusing on AI roles and user attributes. We conducted user experiments across three scenarios depending on AI roles (assistant, second-opinion provider, and expert) using datasets designed with varying tones. The results revealed that tone significantly influenced decision-making regardless of user attributes in the second-opinion scenario, whereas its impact varied by user attributes in the assistant and expert scenarios. In addition, older users were more influenced by tone, and highly extroverted users exhibited discrepancies between their perceptions and decisions. Furthermore, open-ended questionnaires highlighted that users expect tone adjustments to enhance their experience while emphasizing the importance of tone consistency and ethical considerations. Our findings provide crucial insights into the design of explanation expressions.

Authors:Nicolai Hejlesen Jørgensen, Sarmilan Tharmabalan, Ilhan Aslan, Nicolai Brodersen Hansen, Timothy Merritt
Title: Static Vs. Agentic Game Master AI for Facilitating Solo Role-Playing Experiences
Abstract:
This paper presents a game master AI for single-player role-playing games. The AI is designed to deliver interactive text-based narratives and experiences typically associated with multiplayer tabletop games like Dungeons & Dragons. We report on the design process and the series of experiments to improve the functionality and experience design, resulting in two functional versions of the system. While v1 of our system uses simplified prompt engineering, v2 leverages a multi-agent architecture and the ReAct framework to include reasoning and action. A comparative evaluation demonstrates that v2 as an agentic system maintains play while significantly improving modularity and game experience, including immersion and curiosity. Our findings contribute to the evolution of AI-driven interactive fiction, highlighting new avenues for enhancing solo role-playing experiences.

Authors:Xinru Wang, Mengjie Yu, Hannah Nguyen, Michael Iuzzolino, Tianyi Wang, Peiqi Tang, Natasha Lynova, Co Tran, Ting Zhang, Naveen Sendhilnathan, Hrvoje Benko, Haijun Xia, Tanya Jonker
Title: Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices
Abstract:
Large Language Models (LLMs) have shown remarkable potential in recommending everyday actions as personal AI assistants, while Explainable AI (XAI) techniques are being increasingly utilized to help users understand why a recommendation is given. Personal AI assistants today are often located on ultra-small devices such as smartwatches, which have limited screen space. The verbosity of LLM-generated explanations, however, makes it challenging to deliver glanceable LLM explanations on such ultra-small devices. To address this, we explored 1) spatially structuring an LLM's explanation text using defined contextual components during prompting and 2) presenting temporally adaptive explanations to users based on confidence levels. We conducted a user study to understand how these approaches impacted user experiences when interacting with LLM recommendations and explanations on ultra-small devices. The results showed that structured explanations reduced users' time to action and cognitive load when reading an explanation. Always-on structured explanations increased users' acceptance of AI recommendations. However, users were less satisfied with structured explanations compared to unstructured ones due to their lack of sufficient, readable details. Additionally, adaptively presenting structured explanations was less effective at improving user perceptions of the AI compared to the always-on structured explanations. Together with users' interview feedback, the results led to design implications to be mindful of when personalizing the content and timing of LLM explanations that are displayed on ultra-small devices.

Authors:Arnavi Chheda-Kothary, Ritesh Kanchi, Chris Sanders, Kevin Xiao, Aditya Sengupta, Melanie Kneitmix, Jacob O. Wobbrock, Jon E. Froehlich
Title: ArtInsight: Enabling AI-Powered Artwork Engagement for Mixed Visual-Ability Families
Abstract:
We introduce ArtInsight, a novel AI-powered system to facilitate deeper engagement with child-created artwork in mixed visual-ability families. ArtInsight leverages large language models (LLMs) to craft a respectful and thorough initial description of a child's artwork, and provides: creative AI-generated descriptions for a vivid overview, audio recording to capture the child's own description of their artwork, and a set of AI-generated questions to facilitate discussion between blind or low-vision (BLV) family members and their children. Alongside ArtInsight, we also contribute a new rubric to score AI-generated descriptions of child-created artwork and an assessment of state-of-the-art LLMs. We evaluated ArtInsight with five groups of BLV family members and their children, and as a case study with one BLV child therapist. Our findings highlight a preference for ArtInsight's longer, artistically-tailored descriptions over those generated by existing BLV AI tools. Participants highlighted the creative description and audio recording components as most beneficial, with the former helping ``bring a picture to life'' and the latter centering the child's narrative to generate context-aware AI responses. Our findings reveal different ways that AI can be used to support art engagement, including before, during, and after interaction with the child artist, as well as expectations that BLV adults and their sighted children have about AI-powered tools.

Authors:Hayeon Jeon, Suhwoo Yoon, Keyeun Lee, Seo Hyeong Kim, Esther Hehsun Kim, Seonghye Cho, Yena Ko, Soeun Yang, Laura Dabbish, John Zimmerman, Eun-mee Kim, Hajin Lim
Title: Letters from Future Self: Augmenting the Letter-Exchange Exercise with LLM-based Agents to Enhance Young Adults' Career Exploration
Abstract:
Young adults often encounter challenges in career exploration. Self-guided interventions, such as the letter-exchange exercise, where participants envision and adopt the perspective of their future selves by exchanging letters with their envisioned future selves, can support career development. However, the broader adoption of such interventions may be limited without structured guidance. To address this, we integrated Large Language Model (LLM)-based agents that simulate participants' future selves into the letter-exchange exercise and evaluated their effectiveness. A one-week experiment (N=36) compared three conditions: (1) participants manually writing replies to themselves from the perspective of their future selves (baseline), (2) future-self agents generating letters to participants, and (3) future-self agents engaging in chat conversations with participants. Results indicated that exchanging letters with future-self agents enhanced participants' engagement during the exercise, while overall benefits of the intervention on future orientation, career self-concept, and psychological support remained comparable across conditions. We discuss design implications for AI-augmented interventions for supporting young adults' career exploration.

Authors:Bich Ngoc Doan, Joseph Seering
Title: The Design Space for Online Restorative Justice Tools: A Case Study with ApoloBot
Abstract:
Volunteer moderators use various strategies to address online harms within their communities. Although punitive measures like content removal or account bans are common, recent research has explored the potential for restorative justice as an alternative framework to address the distinct needs of victims, offenders, and community members. In this study, we take steps toward identifying a more concrete design space for restorative justice-oriented tools by developing ApoloBot, a Discord bot designed to facilitate apologies when harm occurs in online communities. We present results from two rounds of interviews: first, with moderators giving feedback about the design of ApoloBot, and second, after a subset of these moderators have deployed ApoloBot in their communities. This study builds on prior work to yield more detailed insights regarding the potential of adopting online restorative justice tools, including opportunities, challenges, and implications for future designs.

Authors:Ali Ladak, Matti Wilks, Steve Loughnan, Jacy Reese Anthis
Title: Robots, Chatbots, Self-Driving Cars: Perceptions of Mind and Morality Across Artificial Intelligences
Abstract:
AI systems have rapidly advanced, diversified, and proliferated, but our knowledge of people's perceptions of mind and morality in them is limited, despite its importance for outcomes such as whether people trust AIs and how they assign responsibility for AI-caused harms. In a preregistered online study, 975 participants rated 26 AI and non-AI entities. Overall, AIs were perceived to have low-to-moderate agency (e.g., planning, acting), between inanimate objects and ants, and low experience (e.g., sensing, feeling). For example, ChatGPT was rated only as capable of feeling pleasure and pain as a rock. The analogous moral faculties, moral agency (doing right or wrong) and moral patiency (being treated rightly or wrongly) were higher and more varied, particularly moral agency: The highest-rated AI, a Tesla Full Self-Driving car, was rated as morally responsible for harm as a chimpanzee. We discuss how design choices can help manage perceptions, particularly in high-stakes moral contexts.

Authors:Jingying Wang, Jingjing Zhang, Juana Nicoll Capizzano, Matthew Sigakis, Xu Wang, Vitaliy Popov
Title: eXplainMR: Generating Real-time Textual and Visual eXplanations to Facilitate UltraSonography Learning in MR
Abstract:
eXplainMR is a Mixed Reality tutoring system designed for basic cardiac surface ultrasound training. Trainees wear a head-mounted display (HMD) and hold a controller, mimicking a real ultrasound probe, while treating a desk surface as the patient's body for low-cost and anywhere training. eXplainMR engages trainees with troubleshooting questions and provides automated feedback through four key mechanisms: 1) subgoals that break down tasks into single-movement steps, 2) textual explanations comparing the current incorrect view with the target view, 3) real-time segmentation and annotation of ultrasound images for direct visualization, and 4) 3D visual cues that further explain the intersection between the slicing plane and the anatomy.

Authors:Ziyue Lin, Siqi Shen, Zichen Cheng, Cheok Lam Lai, Siming Chen
Title: Carbon and Silicon, Coexist or Compete? A Survey on Human-AI Interactions in Agent-based Modeling and Simulation
Abstract:
Recent interest in human-AI interactions in agent-based modeling and simulation (ABMS) has grown rapidly due to the widespread utilization of large language models (LLMs). ABMS is an intelligent approach that simulates autonomous agents' behaviors within a defined environment to research emergent phenomena. Integrating LLMs into ABMS enables natural language interaction between humans and models. Meanwhile, it introduces new challenges that rely on human interaction to address. Human involvement can assist ABMS in adapting to flexible and complex research demands. However, systematic reviews of interactions that examine how humans and AI interact in ABMS are lacking. In this paper, we investigate existing works and propose a novel taxonomy to categorize the interactions derived from them. Specifically, human users refer to researchers who utilize ABMS tools to conduct their studies in our survey. We decompose interactions into five dimensions: the goals that users want to achieve (Why), the phases in which users are involved (When), the components of the system (What), the roles of users (Who), and the means of interactions (How). Our analysis summarizes the findings that reveal existing interaction patterns. They provide researchers who develop interactions with comprehensive guidance on how humans and AI interact. We further discuss the unexplored interactions and suggest future research directions.

Authors:Yujin Kim, Suhyun Kim, Yeojin Kim, Soyeon Lee, Uran Oh
Title: I am not thinking anymore, just following the path.: Investigating Task Delegation Trend of Author-AI Co-Creation with Generative AIs
Abstract:
This paper investigates the task delegation trends of digital comic authors to generative AIs during the creation process. We observed 16 digital comic authors using generative AIs during the drafting stage. We categorized authors' delegation levels and examined the extent of delegation, variations in AI usage, and calibration of delegation in co-creation. Our findings show that most authors delegate significant tasks to AI, with higher delegation linked to less time spent on creation and more detailed questions to AI. After co-creation, about 60% of authors adjusted their delegation levels, mostly calibrating toward less delegation due to loss of agency and the AI's unoriginal outputs. We suggest strategies for calibrating delegation to an appropriate level, redefine trust in human-AI co-creation, and propose novel measurements for trust in these contexts. Our study provides insights into how authors can effectively collaborate with generative AIs, balance delegation, and navigate AI's role in the creative process.

Authors:Adit Gupta, Christopher MacLellan
Title: Intelligent Tutors Beyond K-12: An Observational Study of Adult Learner Engagement and Academic Impact
Abstract:
Intelligent tutors have proven to be effective in K-12 education, though their impact on adult learners -- especially as a supplementary resource -- remains underexplored. Understanding how adults voluntarily engage with educational technologies can inform the design of tools that support skill re-learning and enhancement. More critically, it helps determine whether tutoring systems, which are typically built for K-12 learners, can also support adult populations. This study examines the adoption, usage patterns, and effectiveness of a novel tutoring system, Apprentice Tutors, among adult learners at a state technical college. We analyze three types of data, including user demographics, grades, and tutor interactions, to assess whether voluntary tutor usage translates into measurable learning gains. Our findings reveal key temporal patterns in tutor engagement and provide evidence of learning within tutors, as determined through skill improvement in knowledge components across tutors. We also found evidence that this learning transferred outside the tutor, as observed through higher course assessment scores following tutor usage. These results suggest that intelligent tutors are a viable tool for adult learners, warranting further research into their long-term impact on this population.

Authors:Ayae Ide, Tanusree Sharma
Title: Personhood Credentials: Human-Centered Design Recommendation Balancing Security, Usability, and Trust
Abstract:
Building on related concepts such as decentralized identifiers (DIDs), proof of personhood, and anonymous credentials, personhood credentials (PHCs) have emerged as an alternative approach, enabling individuals to verify to digital service providers that they are a person without disclosing additional information. However, new technologies might introduce some friction due to users' misunderstandings and mismatched expectations. Despite their growing importance, limited research has been done on users' perceptions and preferences regarding PHCs. To address this gap, we conducted a competitive analysis and semi-structured online user interviews with 23 participants from the US and EU to provide concrete design recommendations for PHCs that incorporate user needs, adoption rules, and preferences. Our study (a) surfaces how people reason about the unknown privacy and security guarantees of PHCs compared to current verification methods, and (b) presents the impact of several factors on how people would like to onboard and manage PHCs, including trusted issuers (e.g., government), ground truth data to issue PHCs (e.g., biometrics, physical ID), and the issuance system (e.g., centralized vs. decentralized). In a think-aloud conceptual design session, participants recommended conceptual designs such as periodic biometric verification, time-bound credentials, visually interactive human-checks, and government supervision of the issuance system. We propose actionable designs reflecting users' preferences.

Authors:Haoxiang Fan, Changshuang Zhou, Hao Yu, Xueyang Wu, Jiangyu Gu, Zhenhui Peng
Title: LitLinker: Supporting the Ideation of Interdisciplinary Contexts with Large Language Models for Teaching Literature in Elementary Schools
Abstract:
Teaching literature under interdisciplinary contexts (e.g., science, art) that connect reading materials has become popular in elementary schools. However, constructing such contexts is challenging as it requires teachers to explore substantial amounts of interdisciplinary content and link it to the reading materials. In this paper, we develop LitLinker via an iterative design process involving 13 teachers to facilitate the ideation of interdisciplinary contexts for teaching literature. Powered by a large language model (LLM), LitLinker can recommend interdisciplinary topics and contextualize them with the literary elements (e.g., paragraphs, viewpoints) in the reading materials. A within-subjects study (N=16) shows that compared to an LLM chatbot, LitLinker can improve the integration depth of different subjects and reduce workload in this ideation task. Expert interviews (N=9) also demonstrate LitLinker's usefulness for supporting the ideation of interdisciplinary contexts for teaching literature. We conclude with concerns and design considerations for supporting interdisciplinary teaching with LLMs.

Authors:Zahra Aref, Sheng Wei, Narayan B. Mandayam
Title: Human-AI Collaboration in Cloud Security: Cognitive Hierarchy-Driven Deep Reinforcement Learning
Abstract:
Given the complexity of multi-tenant cloud environments and the growing need for real-time threat mitigation, Security Operations Centers (SOCs) must adopt AI-driven adaptive defense mechanisms to counter Advanced Persistent Threats (APTs). However, SOC analysts face challenges in handling adaptive adversarial tactics, requiring intelligent decision-support frameworks. We propose a Cognitive Hierarchy Theory-driven Deep Q-Network (CHT-DQN) framework that models interactive decision-making between SOC analysts and AI-driven APT bots. The SOC analyst (defender) operates at cognitive level-1, anticipating attacker strategies, while the APT bot (attacker) follows a level-0 policy. By incorporating CHT into DQN, our framework enhances adaptive SOC defense using Attack Graph (AG)-based reinforcement learning. Simulation experiments across varying AG complexities show that CHT-DQN consistently achieves higher data protection and lower action discrepancies compared to standard DQN. A theoretical lower bound further confirms its superiority as AG complexity increases. A human-in-the-loop (HITL) evaluation on Amazon Mechanical Turk (MTurk) reveals that SOC analysts using CHT-DQN-derived transition probabilities align more closely with adaptive attackers, leading to better defense outcomes. Moreover, human behavior aligns with Prospect Theory (PT) and Cumulative Prospect Theory (CPT): participants are less likely to reselect failed actions and more likely to persist with successful ones. This asymmetry reflects amplified loss sensitivity and biased probability weighting -- underestimating gains after failure and overestimating continued success. Our findings highlight the potential of integrating cognitive models into deep reinforcement learning to improve real-time SOC decision-making for cloud security.
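The abstract's cognitive-hierarchy setup (a level-1 defender anticipating a level-0 attacker) can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's implementation: the attack graph is collapsed into a single-step game over three nodes, the fixed attacker preference weights are invented, and tabular bandit-style Q-learning stands in for the deep Q-network.

```python
import random

# Toy reduction: attacker picks a node to attack; defender picks a node to protect.
# If the defender protects the attacked node, data is safe (+1), else breached (-1).
NODES = [0, 1, 2]
ATTACKER_PREF = [0.6, 0.3, 0.1]  # level-0 attacker: fixed stochastic policy

def level0_attack(rng):
    """Level-0 attacker: samples a target from a fixed preference distribution."""
    return rng.choices(NODES, weights=ATTACKER_PREF)[0]

def train_level1_defender(episodes=5000, alpha=0.1, eps=0.1, seed=0):
    """Level-1 defender: epsilon-greedy Q-learning against the anticipated
    level-0 policy (stateless, so one Q-value per defense action)."""
    rng = random.Random(seed)
    q = [0.0 for _ in NODES]
    for _ in range(episodes):
        if rng.random() < eps:
            a_def = rng.randrange(len(NODES))          # explore
        else:
            a_def = max(NODES, key=lambda n: q[n])     # exploit
        a_att = level0_attack(rng)
        r = 1.0 if a_def == a_att else -1.0
        q[a_def] += alpha * (r - q[a_def])             # bandit-style update
    return q

q = train_level1_defender()
best = max(NODES, key=lambda n: q[n])  # converges to the most-attacked node
```

Under these assumptions the learned Q-values approach the expected defense rewards (0.2, -0.4, -0.8), so the defender concentrates protection on the node the level-0 attacker favors, which is the qualitative behavior the CHT-DQN framework formalizes at scale.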

Authors:Manvi S, Roshini Deva, Neha Madhiwalla, Azra Ismail
Title: "Who Has the Time?": Understanding Receptivity to Health Chatbots among Underserved Women in India
Abstract:
Access to health information and services among women continues to be a major challenge in many communities globally. In recent years, there has been a growing interest in the potential of chatbots to address this information and access gap. We conducted interviews and focus group discussions with underserved women in urban India to understand their receptivity towards the use of chatbots for maternal and child health, as well as barriers to their adoption. Our findings uncover gaps in digital access and literacies, and perceived conflict with various responsibilities that women are burdened with, which shape their interactions with digital technology. Our paper offers insights into the design of chatbots for community health that can meet the lived realities of women in underserved settings.

Authors:Roshini Deva, Dhruv Ramani, Tanvi Divate, Suhani Jalota, Azra Ismail
Title: "Kya family planning after marriage hoti hai?": Integrating Cultural Sensitivity in an LLM Chatbot for Reproductive Health
Abstract:
Access to sexual and reproductive health information remains a challenge in many communities globally, due to cultural taboos and limited availability of healthcare providers. Public health organizations are increasingly turning to Large Language Models (LLMs) to improve access to timely and personalized information. However, recent HCI scholarship indicates that significant challenges remain in incorporating context awareness and mitigating bias in LLMs. In this paper, we study the development of a culturally-appropriate LLM-based chatbot for reproductive health with underserved women in urban India. Through user interactions, focus groups, and interviews with multiple stakeholders, we examine the chatbot's response to sensitive and highly contextual queries on reproductive health. Our findings reveal strengths and limitations of the system in capturing local context, and complexities around what constitutes "culture". Finally, we discuss how local context might be better integrated, and present a framework to inform the design of culturally-sensitive chatbots for community health.

Authors:Sukrit Kumar, Drishti Goel, Thomas Zimmermann, Brian Houck, B. Ashok, Chetan Bansal
Title: Time Warp: The Gap Between Developers' Ideal vs Actual Workweeks in an AI-Driven Era
Abstract:
Software developers balance a variety of different tasks in a workweek, yet the allocation of time often differs from what they consider ideal. Identifying and addressing these deviations is crucial for organizations aiming to enhance the productivity and well-being of the developers. In this paper, we present the findings from a survey of 484 software developers at Microsoft, which aims to identify the key differences between how developers would like to allocate their time during an ideal workweek versus their actual workweek. Our analysis reveals significant deviations between a developer's ideal workweek and their actual workweek, with a clear correlation: as the gap between these two workweeks widens, we observe a decline in both productivity and satisfaction. By examining these deviations in specific activities, we assess their direct impact on the developers' satisfaction and productivity. Additionally, given the growing adoption of AI tools in software engineering, both in the industry and academia, we identify specific tasks and areas that could be strong candidates for automation. In this paper, we make three key contributions: 1) we quantify the impact of workweek deviations on developer productivity and satisfaction; 2) we identify individual tasks that disproportionately affect satisfaction and productivity; and 3) we provide data-driven insights to guide future AI automation efforts in software engineering, aligning them with developers' requirements and ideal workflows for maximizing their productivity and satisfaction.

Authors:Andrew Shaw, Andre Ye, Ranjay Krishna, Amy X. Zhang
Title: Agonistic Image Generation: Unsettling the Hegemony of Intention
Abstract:
Current image generation paradigms prioritize actualizing user intention - "see what you intend" - but often neglect the sociopolitical dimensions of this process. However, it is increasingly evident that image generation is political, contributing to broader social struggles over visual meaning. This sociopolitical aspect was highlighted by the March 2024 Gemini controversy, where Gemini faced criticism for inappropriately injecting demographic diversity into user prompts. Although the developers sought to redress image generation's sociopolitical dimension by introducing diversity "corrections," their opaque imposition of a standard for "diversity" ultimately proved counterproductive. In this paper, we present an alternative approach: an image generation interface designed to embrace open negotiation along the sociopolitical dimensions of image creation. Grounded in the principles of agonistic pluralism (from the Greek agon, meaning struggle), our interface actively engages users with competing visual interpretations of their prompts. Through a lab study with 29 participants, we evaluate our agonistic interface on its ability to facilitate reflection - engagement with other perspectives and challenging dominant assumptions - a core principle that underpins agonistic contestation. We compare it to three existing paradigms: a standard interface, a Gemini-style interface that produces "diverse" images, and an intention-centric interface suggesting prompt refinements. Our findings demonstrate that the agonistic interface enhances reflection across multiple measures, but also that reflection depends on users perceiving the interface as both appropriate and empowering; introducing diversity without grounding it in relevant political contexts was perceived as inauthentic. Our results suggest that diversity and user intention should not be treated as opposing values to be balanced.

Authors:Jiawei Fang, Ruonan Zheng, Yuanyao, Xiaoxia Gao, Chengxu Zuo, Shihui Guo, Yiyue Luo
Title: FIP: Endowing Robust Motion Capture on Daily Garment by Fusing Flex and Inertial Sensors
Abstract:
What if our clothes could capture our body motion accurately? This paper introduces Flexible Inertial Poser (FIP), a novel motion-capturing system using daily garments with two elbow-attached flex sensors and four Inertial Measurement Units (IMUs). To address the inevitable sensor displacements in loose wearables which degrade joint tracking accuracy significantly, we identify the distinct characteristics of the flex and inertial sensor displacements and develop a Displacement Latent Diffusion Model and a Physics-informed Calibrator to compensate for sensor displacements based on such observations, resulting in a substantial improvement in motion capture accuracy. We also introduce a Pose Fusion Predictor to enhance multimodal sensor fusion. Extensive experiments demonstrate that our method achieves robust performance across varying body shapes and motions, significantly outperforming SOTA IMU approaches with a 19.5% improvement in angular error, a 26.4% improvement in elbow angular error, and a 30.1% improvement in positional error. FIP opens up opportunities for ubiquitous human-computer interactions and diverse interactive applications such as Metaverse, rehabilitation, and fitness analysis.

Authors:Wen-Fan Wang, Chien-Ting Lu, Nil Ponsa Campanyà, Bing-Yu Chen, Mike Y. Chen
Title: AIdeation: Designing a Human-AI Collaborative Ideation System for Concept Designers
Abstract:
Concept designers in the entertainment industry create highly detailed, often imaginary environments for movies, games, and TV shows. Their early ideation phase requires intensive research, brainstorming, visual exploration, and combination of various design elements to form cohesive designs. However, existing AI tools focus on image generation from user specifications, lacking support for the unique needs and complexity of concept designers' workflows. Through a formative study with 12 professional designers, we captured their workflows and identified key requirements for AI-assisted ideation tools. Leveraging these insights, we developed AIdeation to support early ideation by brainstorming design concepts with flexible searching and recombination of reference images. A user study with 16 professional designers showed that AIdeation significantly enhanced creativity, ideation efficiency, and satisfaction (all p<.01) compared to current tools and workflows. A field study with 4 studios for 1 week provided insights into AIdeation's benefits and limitations in real-world projects. After the completion of the field study, two studios, covering films, television, and games, have continued to use AIdeation in their commercial projects to date, further validating AIdeation's improvement in ideation quality and efficiency.

Authors:Paula Akemi Aoyagui, Kelsey Stemmler, Sharon Ferguson, Young-ho Kim, Anastasia Kuzminykh
Title: A Matter of Perspective(s): Contrasting Human and LLM Argumentation in Subjective Decision-Making on Subtle Sexism
Abstract:
In subjective decision-making, where decisions are based on contextual interpretation, Large Language Models (LLMs) can be integrated to present users with additional rationales to consider. The diversity of these rationales is mediated by the ability to consider the perspectives of different social actors. However, it remains unclear whether and how models differ in the distribution of perspectives they provide. We compare the perspectives taken by humans and different LLMs when assessing subtle sexism scenarios. We show that these perspectives can be classified within a finite set (perpetrator, victim, decision-maker), consistently present in argumentations produced by humans and LLMs, but in different distributions and combinations, demonstrating differences and similarities with human responses, and between models. We argue for the need to systematically evaluate LLMs' perspective-taking to identify the most suitable models for a given decision-making task. We discuss the implications for model evaluation.

Authors:Zixin Zhao, Damien Masson, Young-Ho Kim, Gerald Penn, Fanny Chevalier
Title: Making the Write Connections: Linking Writing Support Tools with Writer's Needs
Abstract:
This work sheds light on whether and how creative writers' needs are met by existing research and commercial writing support tools (WST). We conducted a need finding study to gain insight into the writers' process during creative writing through a qualitative analysis of the response from an online questionnaire and Reddit discussions on r/Writing. Using a systematic analysis of 115 tools and 67 research papers, we map out the landscape of how digital tools facilitate the writing process. Our triangulation of data reveals that research predominantly focuses on the writing activity and overlooks pre-writing activities and the importance of visualization. We distill 10 key takeaways to inform future research on WST and point to opportunities surrounding underexplored areas. Our work offers a holistic and up-to-date account of how tools have transformed the writing process, guiding the design of future tools that address writers' evolving and unmet needs.

Authors:Gali Noti, Kate Donahue, Jon Kleinberg, Sigal Oren
Title: AI-Assisted Decision Making with Human Learning
Abstract:
AI systems increasingly support human decision-making. In many cases, despite the algorithm's superior performance, the final decision remains in human hands. For example, an AI may assist doctors in determining which diagnostic tests to run, but the doctor ultimately makes the diagnosis. This paper studies such AI-assisted decision-making settings, where the human learns through repeated interactions with the algorithm. In our framework, the algorithm -- designed to maximize decision accuracy according to its own model -- determines which features the human can consider. The human then makes a prediction based on their own less accurate model. We observe that the discrepancy between the algorithm's model and the human's model creates a fundamental tradeoff. Should the algorithm prioritize recommending more informative features, encouraging the human to recognize their importance, even if it results in less accurate predictions in the short term until learning occurs? Or is it preferable to forgo educating the human and instead select features that align more closely with their existing understanding, minimizing the immediate cost of learning? This tradeoff is shaped by the algorithm's time-discounted objective and the human's learning ability. Our results show that optimal feature selection has a surprisingly clean combinatorial characterization, reducible to a stationary sequence of feature subsets that is tractable to compute. As the algorithm becomes more "patient" or the human's learning improves, the algorithm increasingly selects more informative features, enhancing both prediction accuracy and the human's understanding. Notably, early investment in learning leads to the selection of more informative features than a later investment. We complement our analysis by showing that the impact of errors in the algorithm's knowledge is limited as it does not make the prediction directly.
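The feature-selection tradeoff the abstract describes (informative-but-unfamiliar features versus familiar-but-weaker ones, weighted by the algorithm's patience) can be sketched in a toy planner. All specifics here are illustrative assumptions rather than the paper's model: the one-feature-per-round simplification, the exponential learning update, the invented `learn_rate`, and the heuristic learning-adjusted score.

```python
# Toy model: the algorithm picks one feature per round for the human to use.
# values[i]: prediction accuracy feature i yields once mastered.
# competence[i]: the human's current skill with feature i (in [0, 1]), which
# grows each time the feature is shown.
def plan(values, competence, learn_rate, discount, horizon):
    """Greedy planner: each round, pick the feature maximizing immediate
    accuracy plus a discounted estimate of future gains from the human
    learning it. Requires discount < 1."""
    comp = list(competence)
    picks, total, d = [], 0.0, 1.0
    for _ in range(horizon):
        def score(i):
            immediate = values[i] * comp[i]
            # A patient algorithm (discount near 1) heavily weights the
            # competence gain that showing this feature will produce.
            future = values[i] * learn_rate * (1 - comp[i]) * (discount / (1 - discount))
            return immediate + future
        i = max(range(len(values)), key=score)
        picks.append(i)
        total += d * values[i] * comp[i]
        comp[i] += learn_rate * (1 - comp[i])  # human learns the shown feature
        d *= discount
    return picks, total
```

For example, with `values=[1.0, 0.5]` and `competence=[0.2, 0.9]`, a patient planner (`discount=0.95`) starts with the informative but unfamiliar feature 0, accepting short-term error while the human learns, whereas an impatient one (`discount=0.1`) starts with the familiar feature 1, mirroring the tradeoff the paper characterizes.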

Authors:Marie Muehlhaus, Alexander Liggesmeyer, Jürgen Steimle
Title: ExoKit: A Toolkit for Rapid Prototyping of Interactions for Arm-based Exoskeletons
Abstract:
Exoskeletons open up a unique interaction space that seamlessly integrates users' body movements with robotic actuation. Despite its potential, human-exoskeleton interaction remains an underexplored area in HCI, largely due to the lack of accessible prototyping tools that enable designers to easily develop exoskeleton designs and customized interactive behaviors. We present ExoKit, a do-it-yourself toolkit for rapid prototyping of low-fidelity, functional exoskeletons targeted at novice roboticists. ExoKit includes modular hardware components for sensing and actuating shoulder and elbow joints, which are easy to fabricate and (re)configure for customized functionality and wearability. To simplify the programming of interactive behaviors, we propose functional abstractions that encapsulate high-level human-exoskeleton interactions. These can be readily accessed either through ExoKit's command-line or graphical user interface, a Processing library, or microcontroller firmware, each targeted at different experience levels. Findings from implemented application cases and two usage studies demonstrate the versatility and accessibility of ExoKit for early-stage interaction design.

Authors:Shiya Tsang, Ruiyao Miao, Junren Xiao, Hui Xiong
Title: AnimAlte: Designing AI-Infused Cartoon Videos to Improve Preschoolers' Language Learning with Family Engagement at Home
Abstract:
Cartoon videos have proven to be effective in teaching vocabulary to preschool children. However, we have little knowledge about integrating AI into cartoon videos to provide systematic, multimodal vocabulary learning support. This late-breaking work presents AnimAlte, an AI-powered cartoon video system that enables real-time Q&A, vocabulary review, and contextual learning. Preliminary findings contextualized how families interact with AnimAlte to support vocabulary learning. Parents appreciated the system for its personalized, engaging experiences, fostering collaboration, and encouraging self-reflection on parenting. This study offers valuable design implications for informing future video systems to support vocabulary learning.

Authors:Yuan Sun, Ting Wang
Title: Be Friendly, Not Friends: How LLM Sycophancy Shapes User Trust
Abstract:
Recent studies have revealed that large language model (LLM)-powered conversational agents often exhibit "sycophancy", a tendency to adapt their responses to align with user perspectives, even at the expense of factual accuracy. However, users' perceptions of LLM sycophancy and its interplay with other anthropomorphic features (e.g., friendliness) in shaping user trust remain understudied. To bridge this gap, we conducted a 2 (Sycophancy: presence vs. absence) x 2 (Friendliness: high vs. low) between-subjects experiment (N = 224). Our study uncovered, for the first time, the intricate dynamics between LLM sycophancy and friendliness: when an LLM agent already exhibits a friendly demeanor, being sycophantic reduces perceived authenticity, thereby lowering user trust; conversely, when the agent is less friendly, aligning its responses with user opinions makes it appear more genuine, leading to higher user trust. Our findings entail profound implications for AI persuasion through exploiting human psychological tendencies and highlight the imperative for responsible designs in user-LLM agent interactions.

Authors:Karan Taneja, Ashok K. Goel
Title: MuDoC: An Interactive Multimodal Document-grounded Conversational AI System
Abstract:
Multimodal AI is an important step towards building effective tools to leverage multiple modalities in human-AI communication. Building a multimodal document-grounded AI system to interact with long documents remains a challenge. Our work aims to fill the research gap of directly leveraging grounded visuals from documents alongside textual content in documents for response generation. We present an interactive conversational AI agent 'MuDoC' based on GPT-4o to generate document-grounded responses with interleaved text and figures. MuDoC's intelligent textbook interface promotes trustworthiness and enables verification of system responses by allowing instant navigation to source text and figures in the documents. We also discuss qualitative observations based on MuDoC responses highlighting its strengths and limitations.

Authors:Wiktoria Mieleszczenko-Kowszewicz, Beata Bajcar, Jolanta Babiak, Berenika Dyczek, Jakub Świstak, Przemysław Biecek
Title: Mind What You Ask For: Emotional and Rational Faces of Persuasion by Large Language Models
Abstract:
Be careful what you ask for, you just might get it. This saying fits the way large language models (LLMs) are trained: instead of being rewarded for correctness, they are increasingly rewarded for pleasing the recipient. So, they are increasingly effective at persuading us that their answers are valuable. But what tricks do they use in this persuasion? In this study, we examine the psycholinguistic features of the responses produced by twelve different language models. By grouping response content according to rational or emotional prompts and exploring the social influence principles employed by LLMs, we ask whether and how we can mitigate the risks of LLM-driven mass misinformation. We position this study within the broader discourse on human-centred AI, emphasizing the need for interdisciplinary approaches to mitigate cognitive and societal risks posed by persuasive AI responses.

Authors:Nels Numan, Gabriel Brostow, Suhyun Park, Simon Julier, Anthony Steed, Jessica Van Brummelen
Title: CoCreatAR: Enhancing Authoring of Outdoor Augmented Reality Experiences Through Asymmetric Collaboration
Abstract:
Authoring site-specific outdoor augmented reality (AR) experiences requires a nuanced understanding of real-world context to create immersive and relevant content. Existing ex-situ authoring tools typically rely on static 3D models to represent spatial information. However, in our formative study (n=25), we identified key limitations of this approach: models are often outdated, incomplete, or insufficient for capturing critical factors such as safety considerations, user flow, and dynamic environmental changes. These issues necessitate frequent on-site visits and additional iterations, making the authoring process more time-consuming and resource-intensive. To mitigate these challenges, we introduce CoCreatAR, an asymmetric collaborative mixed reality authoring system that integrates the flexibility of ex-situ workflows with the immediate contextual awareness of in-situ authoring. We conducted an exploratory study (n=32) comparing CoCreatAR to an asynchronous workflow baseline, finding that it enhances engagement, creativity, and confidence in the authored output while also providing preliminary insights into its impact on task load. We conclude by discussing the implications of our findings for integrating real-world context into site-specific AR authoring systems.

Authors:Masaki Kuribayashi, Kohei Uehara, Allan Wang, Shigeo Morishima, Chieko Asakawa
Title: WanderGuide: Indoor Map-less Robotic Guide for Exploration by Blind People
Abstract:
Blind people have limited opportunities to explore an environment based on their interests. While existing navigation systems could provide them with surrounding information while navigating, they have limited scalability as they require preparing prebuilt maps. Thus, to develop a map-less robot that assists blind people in exploring, we first conducted a study with ten blind participants at a shopping mall and science museum to investigate the requirements of the system, which revealed the need for three levels of detail to describe the surroundings based on users' preferences. Then, we developed WanderGuide, with functionalities that allow users to adjust the level of detail in descriptions and verbally interact with the system to ask questions about the environment or to go to points of interest. The study with five blind participants revealed that WanderGuide could provide blind people with the enjoyable experience of wandering around without a specific destination in their minds.

Authors:Shikha Soneji, Sourav Panda, Sameer Neve, Jonathan Dodge
Title: Signed, Sealed,... Confused: Exploring the Understandability and Severity of Policy Documents
Abstract:
In general, Terms of Service (ToS) and other policy documents are verbose and full of legal jargon, which poses challenges for users to understand. To improve user accessibility and transparency, the "Terms of Service; Didn't Read" (ToS;DR) project condenses intricate legal terminology into summaries and overall grades for the website's policy documents. Nevertheless, uncertainties remain about whether users could truly grasp the implications of simplified presentations. We conducted an online survey to assess the perceived understandability and severity of randomly chosen cases from the ToS;DR taxonomy. Preliminary results indicate that, although most users report understanding the cases, they find a bias towards service providers in about two-thirds of the cases. The findings of our study emphasize the necessity of prioritizing user-centric policy formulation. This study has the potential to reveal the extent of information imbalance in digital services and promote more well-informed user consent.

Authors:Qifu Wen, Prishita Kochhar, Sherif Zeyada, Tahereh Javaheri, Reza Rawassizadeh
Title: From Clicks to Conversations: Evaluating the Effectiveness of Conversational Agents in Statistical Analysis
Abstract:
The rapid proliferation of data science has forced groups of individuals with different backgrounds to adapt to statistical analysis. We hypothesize that conversational agents are better suited for statistical analysis than traditional graphical user interfaces (GUI). In this work, we propose a novel conversational agent, StatZ, for statistical analysis. We evaluate the efficacy of StatZ relative to established statistical software: SPSS, SAS, Stata, and JMP, in terms of accuracy, task completion time, user experience, and user satisfaction. We combined analysis questions proposed by state-of-the-art language models with suggestions from statistical analysis experts and tested them with 51 participants from diverse backgrounds. Our experimental design assessed each participant's ability to perform statistical analysis tasks using traditional GUI-based statistical tools and our conversational agent. Results indicate that the proposed conversational agent significantly outperforms GUI statistical software on all assessed metrics, both quantitative (task completion time, accuracy, and user experience) and qualitative (user satisfaction). Our findings underscore the potential of using conversational agents to enhance statistical analysis processes, reducing cognitive load and learning curves and thereby extending data analysis capabilities to individuals with limited knowledge of statistics.

Authors:David Black, Maria Tirindelli, Septimiu Salcudean, Wolfgang Wein, Marco Esposito
Title: Visual-Haptic Model Mediated Teleoperation for Remote Ultrasound
Abstract:
Tele-ultrasound has the potential to greatly improve health equity for countless remote communities. However, practical scenarios involve potentially large time delays which cause current implementations of telerobotic ultrasound (US) to fail. Using a local model of the remote environment to provide haptics to the expert operator can decrease teleoperation instability, but the delayed visual feedback remains problematic. This paper introduces a robotic tele-US system in which the local model is not only haptic but also visual, re-slicing and rendering a pre-acquired US sweep in real time to give the operator a preview of what the delayed image will resemble. A prototype system is presented and tested with 15 volunteer operators. It is found that visual-haptic model-mediated teleoperation (MMT) compensates completely for round-trip time delays of up to 1000 ms in terms of operator effort and completion time, while conventional MMT does not. Visual-haptic MMT also significantly outperforms conventional MMT at longer time delays in terms of motion accuracy and force control. This proof-of-concept study suggests that visual-haptic MMT may facilitate remote robotic tele-US.

Authors:Sebastin Santy, Prasanta Bhattacharya, Manoel Horta Ribeiro, Kelsey Allen, Sewoong Oh
Title: When Incentives Backfire, Data Stops Being Human
Abstract:
Progress in AI has relied on human-generated data, from annotator marketplaces to the wider Internet. However, the widespread use of large language models now threatens the quality and integrity of human-generated data on these very platforms. We argue that this issue goes beyond the immediate challenge of filtering AI-generated content -- it reveals deeper flaws in how data collection systems are designed. Existing systems often prioritize speed, scale, and efficiency at the cost of intrinsic human motivation, leading to declining engagement and data quality. We propose that rethinking data collection systems to align with contributors' intrinsic motivations -- rather than relying solely on external incentives -- can help sustain high-quality data sourcing at scale while maintaining contributor trust and long-term participation.

Authors:Arjun Srinivasan, Vidya Setlur, Arvind Satyanarayan
Title: Pluto: Authoring Semantically Aligned Text and Charts for Data-Driven Communication
Abstract:
Textual content (including titles, annotations, and captions) plays a central role in helping readers understand a visualization by emphasizing, contextualizing, or summarizing the depicted data. Yet, existing visualization tools provide limited support for jointly authoring the two modalities of text and visuals such that both convey semantically-rich information and are cohesively integrated. In response, we introduce Pluto, a mixed-initiative authoring system that uses features of a chart's construction (e.g., visual encodings) as well as any textual descriptions a user may have drafted to make suggestions about the content and presentation of the two modalities. For instance, a user can begin to type out a description and interactively brush a region of interest in the chart, and Pluto will generate a relevant auto-completion of the sentence. Similarly, based on a written description, Pluto may suggest lifting a sentence out as an annotation or the visualization's title, or may suggest applying a data transformation (e.g., sort) to better align the two modalities. A preliminary user study revealed that Pluto's recommendations were particularly useful for bootstrapping the authoring process and helped identify different strategies participants adopt when jointly authoring text and charts. Based on study feedback, we discuss design implications for integrating interactive verification features between charts and text, offering control over text verbosity and tone, and enhancing the bidirectional flow in unified text and chart authoring tools.

Authors:Johnny Chan, Yuming Li
Title: Enhancing Higher Education with Generative AI: A Multimodal Approach for Personalised Learning
Abstract:
This research explores the opportunities of Generative AI (GenAI) in the realm of higher education through the design and development of a multimodal chatbot for an undergraduate course. Leveraging the ChatGPT API for nuanced text-based interactions and Google Bard for advanced image analysis and diagram-to-code conversions, we showcase the potential of GenAI in addressing a broad spectrum of educational queries. Additionally, the chatbot presents a file-based analyser designed for educators, offering deep insights into student feedback via sentiment and emotion analysis, and summarising course evaluations with key metrics. These combinations highlight the crucial role of multimodal conversational AI in enhancing teaching and learning processes, promising significant advancements in educational adaptability, engagement, and feedback analysis. By demonstrating a practical web application, this research underlines the imperative for integrating GenAI technologies to foster more dynamic and responsive educational environments, ultimately contributing to improved educational outcomes and pedagogical strategies.

Authors:Alice Williams, Boris Kovalerchuk
Title: Boosting of Classification Models with Human-in-the-Loop Computational Visual Knowledge Discovery
Abstract:
High-risk artificial intelligence and machine learning classification tasks, such as healthcare diagnosis, require accurate and interpretable prediction models. However, classifier algorithms typically sacrifice individual case accuracy for overall model accuracy, limiting analysis of class overlap areas regardless of task significance. The Adaptive Boosting meta-algorithm, which won the 2003 Gödel Prize, analytically assigns higher weights to misclassified cases so they can be reclassified. However, it relies on weak base classifiers that are iteratively strengthened, limiting the improvement attainable from those base classifiers. Combining visual and computational approaches enables selecting stronger base classifiers before boosting. This paper proposes moving boosting methodology from focusing only on misclassified cases to all cases in the class overlap areas, using Computational and Interactive Visual Learning (CIVL) with a Human-in-the-Loop. It builds classifiers in lossless visualizations, integrating human domain expertise and visual insights. A Divide and Classify process splits cases into simple and complex, classifying each group individually through computational analysis and data visualization in lossless visualization spaces such as Parallel Coordinates or other General Line Coordinates. After identifying pure and overlap class areas, simple cases in pure areas are classified, generating interpretable sub-models such as decision rules in Propositional and First-order Logics. Only multidimensional cases in the overlap areas are losslessly visualized, simplifying end-user cognitive tasks to identifying difficult case patterns, including engineering features to form new classifiable patterns. A demonstration shows a perfectly accurate and losslessly interpretable model of the Iris dataset, and simulated data shows generalized benefits to the accuracy and interpretability of models, increasing end-user confidence in discovered models.

Authors:Xuyu Yang, Wengxi Li, Matthew G. Lee, Zhuoyang Li, J. D. Zamfirescu-Pereira, Can Liu
Title: Rambler in the Wild: A Diary Study of LLM-Assisted Writing With Speech
Abstract:
Speech-to-text technologies have been shown to improve text input efficiency and potentially lower the barriers to writing. Recent LLM-assisted dictation tools aim to support writing with speech by bridging the gaps between speaking and traditional writing. This case study reports on the real-world writing experiences of twelve academic or creative writers using one such tool, Rambler, to write various pieces such as blog posts, diaries, screenplays, notes, and fictional stories. Through a ten-day diary study, we identified the participants' in-context writing strategies using Rambler, such as how they expanded from an outline or organized their loose thoughts for different writing goals. The interviews uncovered the psychological and productivity affordances of writing with speech, pointing to future directions for designing for this writing modality and the utilization of AI support.

Authors:Ilhan Aslan, Carla F. Griggio, Henning Pohl, Timothy Merritt, Niels van Berkel
Title: Speejis: Enhancing User Experience of Mobile Voice Messaging with Automatic Visual Speech Emotion Cues
Abstract:
Mobile messaging apps offer an increasing range of emotional expressions, such as emojis, to help users manually augment their texting experiences. The accessibility of such augmentations is limited in voice messaging. With the term "speejis" we refer to accessible emojis and other visual speech emotion cues that are created automatically from speech input alone. The paper presents an implementation of speejis and reports on a user study (N=12) comparing the UX of voice messaging with and without speejis. Results show significant differences in measures such as attractiveness and stimulation, and a clear preference among all participants for messaging with speejis. We highlight the benefits of using paralinguistic speech processing and continuous emotion models to enable finer-grained augmentations of emotion changes and transitions within a single message, in addition to augmentations of the overall tone of the message.

Authors:Grace Li, Yuanyang Teng, Juna Kawai-Yue, Unaisah Ahmed, Anatta S. Tantiwongse, Jessica Y. Liang, Dorothy Zhang, Kynnedy Simone Smith, Tao Long, Mina Lee, Lydia B Chilton
Title: Audience Impressions of Narrative Structures and Personal Language Style in Science Communication on Social Media
Abstract:
Science communication increases public interest in science by educating, engaging, and encouraging everyday people to participate in the sciences. But traditional science communication is often too formal and inaccessible for general audiences. However, there is a growing trend on social media to make it more approachable using three techniques: relatable examples to make explanations concrete, step-by-step walkthroughs to improve understanding, and personal language to drive engagement. These techniques are flashy and often garner more engagement from social media users, but their effectiveness in actually explaining the science is unknown. Furthermore, many scientists struggle with adopting these science communication strategies for social media, fearing they might undermine their authority. We conduct a reader study to understand how these science communication techniques on social media affect readers' understanding of and engagement with the science. We found that while most readers prefer these techniques, they had diverse preferences for when and where these techniques are used. With these findings, we conducted a writer study to understand how scientists' varying comfort levels with these strategies can be supported by presenting different structure and style options. We found that the side-by-side comparison of options helped writers make editorial decisions. Instead of adhering to one direction of science communication, writers explored a continuum of options which helped them identify which communication strategies they wanted to implement.

Authors:Xingyu Lan, Yifan Wang, Lingyu Peng, Xiaofan Ma
Title: More Than Beautiful: Exploring Design Features, Practical Perspectives, and Implications of Artistic Data Visualization
Abstract:
Standing at the intersection of science and art, artistic data visualization has gained popularity in recent years and emerged as a significant domain. Despite more than a decade since the field's conceptualization, a noticeable gap remains in research concerning the design features of artistic data visualizations, the aesthetic goals they pursue, and their potential to inspire our community. To address these gaps, we analyzed 220 data artworks to understand their design paradigms and intents, and construct a design taxonomy to characterize their design techniques (e.g., sensation, interaction, narrative, physicality). We also conducted in-depth interviews with twelve data artists to explore their practical perspectives, such as their understanding of artistic data visualization and the challenges they encounter. In brief, we found that artistic data visualization is deeply rooted in art discourse, with its own distinctive characteristics in both inner pursuits and outer presentations. Based on our research, we outline seven prospective paths for future work.

Authors:Shivani Guptasarma, Allison M. Okamura, Monroe Kennedy
Title: Localization of Vibrotactile Stimuli on the Face
Abstract:
The face remains relatively unexplored as a target region for haptic feedback, despite providing a considerable surface area consisting of highly sensitive skin. There are promising applications for facial haptic feedback, especially in cases of severe upper limb loss or spinal cord injury, where the face is typically less impacted than other body parts. Moreover, the neural representation of the face is adjacent to that of the hand, and phantom maps have been discovered between the fingertips and the cheeks. However, there is a dearth of compact devices for facial haptic feedback, and vibrotactile stimulation, a common modality of haptic feedback, has not been characterized for localization acuity on the face. We performed a localization experiment on the cheek, with an arrangement of off-the-shelf coin vibration motors. The study follows the methods of prior work studying other skin regions, in which participants attempt to identify the sites of discrete vibrotactile stimuli. We intend for our results to inform the future development of systems using vibrotactile feedback to convey information via the face.

Authors:Zitong Shen, Sineng Yan, Youqian Zhang, Xiapu Luo, Grace Ngai, Eugene Yujun Fu
Title: "It Warned Me Just at the Right Moment": Exploring LLM-based Real-time Detection of Phone Scams
Abstract:
Despite living in the era of the internet, phone-based scams remain one of the most prevalent forms of scams. These scams aim to exploit victims for financial gain, causing both monetary losses and psychological distress. While governments, industries, and academia have actively introduced various countermeasures, scammers also continue to evolve their tactics, making phone scams a persistent threat. To combat these increasingly sophisticated scams, detection technologies must also advance. In this work, we propose a framework for modeling scam calls and introduce an LLM-based real-time detection approach, which assesses fraudulent intent in conversations, further providing immediate warnings to users to mitigate harm. Through experiments, we evaluate the method's performance and analyze key factors influencing its effectiveness. This analysis enables us to refine the method to improve precision while exploring the trade-off between recall and timeliness, paving the way for future directions in this critical area of research.
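The real-time detection loop described above can be sketched as a sliding window over conversation turns, with a scoring function standing in for the LLM's fraudulent-intent assessment. Everything below (the keyword scorer, the threshold, the window size, and the function names) is an illustrative placeholder, not the authors' method:

```python
from collections import deque

def detect_stream(turns, score_fn, window=5, threshold=0.8):
    """Sliding-window real-time scam detection loop (illustrative sketch;
    score_fn stands in for the paper's LLM-based intent assessment)."""
    history = deque(maxlen=window)
    warnings = []
    for i, turn in enumerate(turns):
        history.append(turn)
        risk = score_fn(list(history))  # fraudulent-intent score in [0, 1]
        if risk >= threshold:
            warnings.append((i, risk))  # warn the user immediately
    return warnings

# Toy scorer: flags windows mentioning typical scam keywords.
KEYWORDS = {"gift card", "wire transfer", "verification code"}
def toy_score(window_turns):
    text = " ".join(window_turns).lower()
    hits = sum(k in text for k in KEYWORDS)
    return min(1.0, hits / 2)

calls = ["hello", "this is your bank", "read me the verification code",
         "and buy a gift card now"]
print(detect_stream(calls, toy_score))  # [(3, 1.0)]
```

The window-size and threshold parameters make the paper's recall-versus-timeliness trade-off concrete: a smaller window reacts faster but sees less context.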

Authors:ShunYi Yeo, Zhuoqun Jiang, Anthony Tang, Simon Tangi Perrault
Title: Enhancing Deliberativeness: Evaluating the Impact of Multimodal Reflection Nudges
Abstract:
Nudging participants with text-based reflective nudges enhances deliberation quality on online deliberation platforms. The effectiveness of multimodal reflective nudges, however, remains largely unexplored. Given the multi-sensory nature of human perception, incorporating diverse modalities into self-reflection mechanisms has the potential to better support various reflective styles. This paper explores how presenting reflective nudges of different types (direct: persona and indirect: storytelling) in different modalities (text, image, video and audio) affects deliberation quality. We conducted two user studies with 20 and 200 participants respectively. The first study identifies the preferred modality for each type of reflective nudges, revealing that text is most preferred for persona and video is most preferred for storytelling. The second study assesses the impact of these modalities on deliberation quality. Our findings reveal distinct effects associated with each modality, providing valuable insights for developing more inclusive and effective online deliberation platforms.

Authors:Yusuke Miura, Chi-Lan Yang, Masaki Kuribayashi, Keigo Matsumoto, Hideaki Kuzuoka, Shigeo Morishima
Title: Understanding and Supporting Formal Email Exchange by Answering AI-Generated Questions
Abstract:
Replying to formal emails is time-consuming and cognitively demanding, as it requires crafting polite phrasing and providing an adequate response to the sender's demands. Although systems with Large Language Models (LLMs) were designed to simplify the email replying process, users still need to provide detailed prompts to obtain the expected output. Therefore, we proposed and evaluated an LLM-powered question-and-answer (QA)-based approach for users to reply to emails by answering a set of simple and short questions generated from the incoming email. We developed a prototype system, ResQ, and conducted controlled and field experiments with 12 and 8 participants. Our results demonstrated that the QA-based approach improves the efficiency of replying to emails and reduces workload while maintaining email quality, compared to a conventional prompt-based approach that requires users to craft appropriate prompts to obtain email drafts. We discuss how the QA-based approach influences the email reply process and interpersonal relationship dynamics, as well as the opportunities and challenges associated with using a QA-based approach in AI-mediated communication.

Authors:Zijian Ding, Qinshi Zhang, Mohan Chi, Ziyi Wang
Title: Frontend Diffusion: Empowering Self-Representation of Junior Researchers and Designers Through Multi-agent System
Abstract:
With the continuous development of generative AI's logical reasoning abilities, AI's growing code-generation potential poses challenges for both technical and creative professionals. But how can these advances be directed toward empowering junior researchers and designers who often require additional help to build and express their professional and personal identities? We introduce Frontend Diffusion, a multi-agent coding system transforming user-drawn layouts and textual prompts into refined website code, thereby supporting self-representation goals. A user study with 13 junior researchers and designers shows AI as a human capability enhancer rather than a replacement, and highlights the importance of bidirectional human-AI alignment. We then discuss future work such as leveraging AI for career development and fostering bidirectional human-AI alignment of multi-agent systems.

Authors:Roshini Deva, Manvi S, Jasmine Zhou, Elizabeth Britton Chahine, Agena Davenport-Nicholson, Nadi Nina Kaonga, Selen Bozkurt, Azra Ismail
Title: A Mixed-Methods Evaluation of LLM-Based Chatbots for Menopause
Abstract:
The integration of Large Language Models (LLMs) into healthcare settings has gained significant attention, particularly for question-answering tasks. Given the high-stakes nature of healthcare, it is essential to ensure that LLM-generated content is accurate and reliable to prevent adverse outcomes. However, the development of robust evaluation metrics and methodologies remains a matter of much debate. We examine the performance of publicly available LLM-based chatbots for menopause-related queries, using a mixed-methods approach to evaluate safety, consensus, objectivity, reproducibility, and explainability. Our findings highlight the promise and limitations of traditional evaluation metrics for sensitive health topics. We propose the need for customized and ethically grounded evaluation frameworks to assess LLMs to advance safe and effective use in healthcare.

Authors:Yancheng Cao, Yangyang HE, Yonglin Chen, Menghan Chen, Shanhe You, Yulin Qiu, Min Liu, Chuan Luo, Chen Zheng, Xin Tong, Jing Liang, Jiangtao Gong
Title: Designing LLM-simulated Immersive Spaces to Enhance Autistic Children's Social Affordances Understanding
Abstract:
One of the key challenges faced by autistic children is understanding social affordances in complex environments, which further impacts their ability to respond appropriately to social signals. In traffic scenarios, this impairment can even lead to safety concerns. In this paper, we introduce an LLM-simulated immersive projection environment designed to improve this ability in autistic children while ensuring their safety. We first propose 17 design considerations across four major categories, derived from a comprehensive review of previous research. Next, we developed a system called AIroad, which leverages LLMs to simulate drivers with varying social intents, expressed through explicit multimodal social signals. AIroad helps autistic children bridge the gap in recognizing the intentions behind behaviors and learning appropriate responses through various stimuli. A user study involving 14 participants demonstrated that this technology effectively engages autistic children and leads to significant improvements in their comprehension of social affordances in traffic scenarios. Additionally, parents reported high perceived usability of the system. These findings highlight the potential of combining LLM technology with immersive environments for the functional rehabilitation of autistic children in the future.

Authors:Brandon Woodard, Margarita Geleta, Joseph J. LaViola, Andrea Fanelli, Rhonda Wilson
Title: AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Abstract:
We present AudioMiXR, an augmented reality (AR) interface intended to assess how users manipulate virtual audio objects situated in their physical space using six degrees of freedom (6DoF) deployed on a head-mounted display (Apple Vision Pro) for 3D sound design. Existing tools for 3D sound design are typically constrained to desktop displays, which may limit spatial awareness of mixing within the execution environment. Utilizing an XR HMD to create soundscapes may provide a real-time test environment for 3D sound design, as modern HMDs can provide precise spatial localization assisted by cross-modal interactions. However, there is no research on design guidelines specific to sound design with 6DoF in XR. To provide a first step toward identifying design-related research directions in this space, we conducted an exploratory study where we recruited 27 participants, consisting of expert and non-expert sound designers. The goal was to assess design lessons that can be used to inform future research venues in 3D sound design. We ran a within-subjects study where users designed both a music and cinematic soundscapes. After thematically analyzing participant data, we constructed two design lessons: (1) Proprioception for AR Sound Design, and (2) Balancing Audio-Visual Modalities in AR GUIs. Additionally, we provide application domains that can benefit most from 6DoF sound design based on our results. To expand on these insights, we conducted a second within-subjects study comparing AudioMiXR to a 2D panner baseline. Results show that AudioMiXR significantly improved usability (SUS), reduced frustration and mental workload (NASA-TLX), and enhanced creativity across all subscales. These findings demonstrate that 6DoF AR interaction yields measurable gains in user experience and creative output, positioning AudioMiXR as a promising foundation for future AR-based sound design tools.

Authors:Benjamin Lira, Todd Rogers, Daniel G. Goldstein, Lyle Ungar, Angela L. Duckworth
Title: Learning not cheating: AI assistance can enhance rather than hinder skill development
Abstract:
It is widely believed that outsourcing cognitive work to AI boosts immediate productivity at the expense of long-term human capital development. An overlooked possibility is that AI tools can support skill development by providing just-in-time, high-quality, personalized examples. In this investigation, lay forecasters predicted that practicing writing cover letters with an AI tool would impair learning compared to practicing writing letters without the tool. However, in a highly-powered pre-registered experiment, participants randomly assigned to practice writing with AI improved more on a writing test one day later compared to writers assigned to practice without AI. Notably, writers given access to the AI tool improved more despite exerting less effort, whether measured by time on task, keystrokes, or subjective ratings. We replicated and extended these results in a second pre-registered experiment, showing that writers given access to the AI tool again outperformed those who practiced on their own -- but performed no better than writers merely shown an AI-generated cover letter that they could not edit. Collectively, these findings constitute an existence proof that by providing personalized examples of high-quality work, AI tools can improve, rather than undermine, learning.

Authors:Ravi Tejwani, Karl Velazquez, John Payne, Paolo Bonato, Harry Asada
Title: Cross-modality Force and Language Embeddings for Natural Human-Robot Communication
Abstract:
A method for cross-modality embedding of force profile and words is presented for synergistic coordination of verbal and haptic communication. When two people carry a large, heavy object together, they coordinate through verbal communication about the intended movements and physical forces applied to the object. This natural integration of verbal and physical cues enables effective coordination. Similarly, human-robot interaction could achieve this level of coordination by integrating verbal and haptic communication modalities. This paper presents a framework for embedding words and force profiles in a unified manner, so that the two communication modalities can be integrated and coordinated in a way that is effective and synergistic. Here, it will be shown that, although language and physical force profiles are deemed completely different, the two can be embedded in a unified latent space and proximity between the two can be quantified. In this latent space, a force profile and words can a) supplement each other, b) integrate the individual effects, and c) substitute in an exchangeable manner. First, the need for cross-modality embedding is addressed, and the basic architecture and key building block technologies are presented. Methods for data collection and implementation challenges will be addressed, followed by experimental results and discussions.
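The shared latent space idea can be sketched minimally: project each modality into a common space and quantify proximity there. The fixed random linear maps below stand in for the paper's learned encoders, and the dimensions and `proximity` name are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy encoders: in the paper's framework these would be trained networks;
# here they are fixed random linear maps for illustration only.
D_LATENT = 16
W_force = rng.normal(size=(D_LATENT, 100))   # force profile: 100 samples
W_word = rng.normal(size=(D_LATENT, 8))      # word: 8-dim feature vector

def embed_force(profile):
    z = W_force @ profile
    return z / np.linalg.norm(z)             # unit vector in latent space

def embed_word(features):
    z = W_word @ features
    return z / np.linalg.norm(z)

def proximity(z1, z2):
    """Cosine similarity in the shared latent space."""
    return float(z1 @ z2)

force = np.sin(np.linspace(0, np.pi, 100))   # a push-like force ramp
word = rng.normal(size=8)                    # placeholder word features
sim = proximity(embed_force(force), embed_word(word))
print(-1.0 <= sim <= 1.0)  # True: proximity is bounded like a cosine
```

Once both modalities live in one space, supplementing, integrating, or substituting them reduces to vector operations (e.g., averaging or nearest-neighbor lookup) in that space.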

Authors:Nick Le Large, David Brecht, Willi Poh, Jan-Hendrik Pauls, Martin Lauer, Frank Diermeyer
Title: Human-Aided Trajectory Planning for Automated Vehicles through Teleoperation and Arbitration Graphs
Abstract:
Teleoperation enables remote human support of automated vehicles in scenarios where the automation is not able to find an appropriate solution. Remote assistance concepts, where operators provide discrete inputs to aid specific automation modules like planning, are gaining interest due to their reduced workload on the human remote operator and improved safety. However, these concepts are challenging to implement and maintain due to their deep integration and interaction with the automated driving system. In this paper, we propose a solution to facilitate the implementation of remote assistance concepts that intervene at the planning level and extend the operational design domain of the vehicle at runtime. Using arbitration graphs, a modular decision-making framework, we integrate remote assistance into an existing automated driving system without modifying the original software components. Our simulation-based implementation demonstrates this approach in two use cases, allowing operators to adjust planner constraints and enabling trajectory generation beyond nominal operational design domains.
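The integration idea, adding remote assistance as one more option in an arbitration graph without touching the nominal planner, can be sketched with a toy priority-based arbitrator. The behavior names and the `decide` interface are illustrative assumptions, not the authors' API:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Behavior:
    name: str
    applicable: Callable[[dict], bool]   # invocation condition
    command: Callable[[dict], str]       # produces a trajectory/command

class PriorityArbitrator:
    """Minimal priority arbitrator: the first applicable option wins.
    Illustrative only; real arbitration graphs also support nesting,
    cost-based arbitration, and verification of commands."""
    def __init__(self, options: List[Behavior]):
        self.options = options

    def decide(self, state: dict) -> Optional[str]:
        for b in self.options:
            if b.applicable(state):
                return b.command(state)
        return None

# Remote assistance is added as a higher-priority option without modifying
# the nominal behavior, mirroring the modular integration idea.
remote = Behavior("remote_assist",
                  lambda s: s.get("operator_input") is not None,
                  lambda s: f"follow operator corridor {s['operator_input']}")
nominal = Behavior("nominal_planner",
                   lambda s: True,
                   lambda s: "follow nominal trajectory")

arb = PriorityArbitrator([remote, nominal])
print(arb.decide({"operator_input": None}))  # follow nominal trajectory
print(arb.decide({"operator_input": "A"}))   # follow operator corridor A
```

Because the remote option only becomes applicable when an operator input is present, the nominal stack runs unchanged the rest of the time.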

Authors:Anna Leschanowsky, Farnaz Salamatjoo, Zahra Kolagar, Birgit Popp
Title: Expert-Generated Privacy Q&A Dataset for Conversational AI and User Study Insights
Abstract:
Conversational assistants process personal data and must comply with data protection regulations that require providers to be transparent with users about how their data is handled. Transparency, in a legal sense, demands preciseness, comprehensibility and accessibility, yet existing solutions fail to meet these requirements. To address this, we introduce a new human-expert-generated dataset for Privacy Question-Answering (Q&A), developed through an iterative process involving legal professionals and conversational designers. We evaluate this dataset through linguistic analysis and a user study, comparing it to privacy policy excerpts and state-of-the-art responses from Amazon Alexa. Our findings show that the proposed answers improve usability and clarity compared to existing solutions while achieving legal preciseness, thereby enhancing the accessibility of data processing information for Conversational AI and Natural Language Processing applications.

Authors:Chungman Lim, Gyeongdeok Kim, Su-Yeon Kang, Hasti Seifi, Gunhyuk Park
Title: Can a Machine Feel Vibrations?: A Framework for Vibrotactile Sensation and Emotion Prediction via a Neural Network
Abstract:
Vibrotactile signals offer new possibilities for conveying sensations and emotions in various applications. Yet, designing vibrotactile icons (i.e., Tactons) to evoke specific feelings often requires a trial-and-error process and user studies. To support haptic design, we propose a framework for predicting sensory and emotional ratings from vibration signals. We created 154 Tactons and conducted a study to collect acceleration data from smartphones and roughness, valence, and arousal user ratings (n=36). We converted the Tacton signals into two-channel spectrograms reflecting the spectral sensitivities of mechanoreceptors, then input them into VibNet, our dual-stream neural network. The first stream captures sequential features using recurrent networks, while the second captures temporal-spectral features using 2D convolutional networks. VibNet outperformed baseline models, with 82% of its predictions falling within the standard deviations of ground truth user ratings for two new Tacton sets. We discuss the efficacy of our mechanoreceptive processing and dual-stream neural network and present future research directions.
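The preprocessing step, converting a vibration signal into a two-channel spectrogram weighted by mechanoreceptor-like sensitivity curves, can be sketched as follows. The Gaussian log-frequency weights are illustrative stand-ins for real sensitivity curves, and the sampling rate and frame sizes are arbitrary choices:

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    # Split a 1-D signal into overlapping frames.
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def two_channel_spectrogram(x, fs=8000, frame_len=256, hop=128):
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames, axis=1))        # (frames, bins)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    # Illustrative sensitivity curves (Gaussians in log-frequency), one
    # peaking near 40 Hz and one near 250 Hz; real weights differ.
    ra = np.exp(-((np.log10(freqs + 1) - np.log10(40)) ** 2) / 0.5)
    pc = np.exp(-((np.log10(freqs + 1) - np.log10(250)) ** 2) / 0.5)
    return np.stack([spec * ra, spec * pc])           # (2, frames, bins)

# Example: a pure 250 Hz vibration excites the 250 Hz-tuned channel more.
fs = 8000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 250 * t)
s = two_channel_spectrogram(sig, fs)
print(s.shape)                       # (2, 61, 129)
print(s[1].sum() > s[0].sum())       # True
```

Each channel can then feed one stream of a dual-stream network, recurrent over frames in one stream and 2D-convolutional over the time-frequency plane in the other, as the abstract describes.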

Authors:Eirini Schoinas, Adyah Rastogi, Anissa Carter, Jacob Granley, Michael Beyeler
Title: Evaluating Deep Human-in-the-Loop Optimization for Retinal Implants Using Sighted Participants
Abstract:
Human-in-the-loop optimization (HILO) is a promising approach for personalizing visual prostheses by iteratively refining stimulus parameters based on user feedback. Previous work demonstrated HILO's efficacy in simulation, but its performance with human participants remains untested. Here we evaluate HILO using sighted participants viewing simulated prosthetic vision to assess its ability to optimize stimulation strategies under realistic conditions. Participants selected between phosphenes generated by competing encoders to iteratively refine a deep stimulus encoder (DSE). We tested HILO in three conditions: standard optimization, threshold misspecifications, and out-of-distribution parameter sampling. Participants consistently preferred HILO-generated stimuli over both a naive encoder and the DSE alone, with log odds favoring HILO across all conditions. We also observed key differences between human and simulated decision-making, highlighting the importance of validating optimization strategies with human participants. These findings support HILO as a viable approach for adapting visual prostheses to individuals. Clinical relevance: Validating HILO with sighted participants viewing simulated prosthetic vision is an important step toward personalized calibration of future visual prostheses.

Authors:Yoonha Cha, Victoria Jackson, Karina Kohl, Rafael Prikladnicki, André van der Hoek, Stacy M. Branham
Title: The Dilemma of Building Do-It-Yourself (DIY) Solutions for Workplace Accessibility
Abstract:
Existing commercial and in-house software development tools are often inaccessible to Blind and Low Vision Software Professionals (BLVSPs), hindering their participation and career growth at work. Building on existing research on Do-It-Yourself (DIY) Assistive Technologies and customized tools made by programmers, we shed light on the currently unexplored intersection of how DIY tools built and used by BLVSPs support accessible software development. Through semi-structured interviews with 30 BLVSPs, we found that such tools serve many different purposes and are driven by motivations such as desiring to maintain a professional image and a sense of dignity at work. These tools had significant impacts on workplace accessibility and revealed a need for a more centralized community for sharing tools, tips, and tricks. Based on our findings, we introduce the "Double Hacker Dilemma" and highlight a need for developing more effective peer and organizational platforms that support DIY tool sharing.

Authors:Min Hun Lee, Daniel P. Siewiorek, Alexandre Bernardino
Title: Investigating an Intelligent System to Monitor & Explain Abnormal Activity Patterns of Older Adults
Abstract:
Despite the growing potential of older adult care technologies, the adoption of these technologies remains challenging. In this work, we conducted a focus-group session with family caregivers to scope designs of older adult care technology. We then developed a high-fidelity prototype and conducted a qualitative study with professional caregivers and older adults to understand their perspectives on the system functionalities. This system monitors abnormal activity patterns of older adults using wireless motion sensors and machine learning models, supports interactive dialogue responses that explain abnormal activity patterns to caregivers, and allows older adults to proactively share their status with caregivers for adequate intervention. Both older adults and professional caregivers appreciated that our system can provide a faster, personalized service while proactively controlling what information is shared through interactive dialogue responses. We further discuss other considerations for realizing older adult care technology in practice.

Authors:Jocelyn Shen, Jennifer King Chen, Leah Findlater, Griffin Dietz Smith
Title: eaSEL: Promoting Social-Emotional Learning and Parent-Child Interaction through AI-Mediated Content Consumption
Abstract:
As children increasingly consume media on devices, parents look for ways this usage can support learning and growth, especially in domains like social-emotional learning. We introduce eaSEL, a system that (a) integrates social-emotional learning (SEL) curricula into children's video consumption by generating reflection activities and (b) facilitates parent-child discussions around digital media without requiring co-consumption of videos. We present a technical evaluation of our system's ability to detect social-emotional moments within a transcript and to generate high-quality SEL-based activities for both children and parents. Through a user study with N=20 parent-child dyads, we find that after completing an eaSEL activity, children reflect more on the emotional content of videos. Furthermore, parents find that the tool promotes meaningful active engagement and could scaffold deeper conversations around content. Our work paves directions in how AI can support children's social-emotional reflection of media and family connections in the digital age.

Authors:Christian Eichenmüller, Lisa Kuhn, Zinaida Benenson
Title: "My Whereabouts, my Location, it's Directly Linked to my Physical Security": An Exploratory Qualitative Study of Location-Dependent Security and Privacy Perceptions among Activist Tech Users
Abstract:
Digital-safety research with at-risk users is particularly urgent. At-risk users are more likely to be digitally attacked or targeted by surveillance and could be disproportionately harmed by attacks that facilitate physical assaults. One group of such at-risk users are activists and politically active individuals. For them, as for other at-risk users, the rise of smart environments harbors new risks. Since digitization and datafication are no longer limited to a series of personal devices that can be switched on and off, but increasingly and continuously surround users, granular geolocation poses new safety challenges. Drawing on eight exploratory qualitative interviews of an ongoing research project, this contribution highlights what activists with powerful adversaries think about ever more data traces, including location data, and how they intend to deal with emerging risks. Responses of activists include attempts to control one's immediate technological surroundings and to more carefully manage device-related location data. For some activists, threat modeling has also shaped provider choices based on geopolitical considerations. Since many activists do not have enough digital-safety knowledge for effective protection, feelings of insecurity and paranoia are widespread. Channeling the concerns and fears of our interlocutors, we call for more research on how activists can protect themselves against ever more fine-grained location data tracking.

Authors:Ivan Kayongo, Leonardo Malcotti, Haonan Zhao, Fausto Giunchiglia
Title: A methodology and a platform for high-quality rich personal data
Abstract:
In recent years, the pervasive use of sensors, as they exist in smart devices, e.g., phones, watches, medical devices, has dramatically increased the availability of personal data. However, existing research on data collection primarily focuses on the objective view of reality, as provided, for instance, by sensors, often neglecting the integration of subjective human input, as provided, for instance, by user answers to questionnaires. This substantially limits the exploitability of the collected data. In this paper we present a methodology and a platform specifically designed for the collection of a combination of large-scale sensor data and qualitative human feedback. The methodology has been designed to be deployed on top of, and to enrich the functionalities of, an existing data collection app, called iLog, which has been used in large-scale, worldwide data collection experiments. The main goal is to put the key actors involved in an experiment, i.e., the researcher in charge, the participant, and iLog, in better control of the experiment itself, thus enabling a much improved quality and richness of the data collected. The novel functionalities of the resulting platform are: (i) a time-wise representation of the situational context within which the data collection is performed, (ii) an explicit representation of the temporal context within which the data collection is performed, (iii) a calendar-based dashboard for the real-time monitoring of the data collection context(s), and, finally, (iv) a mechanism for the run-time revision of the data collection plan. The practicality and utility of the proposed functionalities are demonstrated by showing how they apply to a case study involving 350 university students.

Authors:Peng-Kai Hung, Janet Yi-Ching Huang, Rung-Huei Liang, Stephan Wensveen
Title: Generative AI as a Playful yet Offensive Tourist: Exploring Tensions Between Playful Features and Citizen Concerns in Designing Urban Play
Abstract:
Play is pivotal in fostering the emotional, social, and cultural dimensions of urban spaces. While generative AI (GAI) potentially supports playful urban interaction, a balanced and critical approach to the design opportunities and challenges is needed. This work develops iWonder, an image-to-image GAI tool engaging fourteen designers in urban explorations to identify GAI's playful features and create design ideas. Fourteen citizens then evaluated these ideas, providing expectations and critical concerns from a bottom-up perspective. Our findings reveal the dynamic interplay between users, GAI, and urban contexts, highlighting GAI's potential to facilitate playful urban experiences through generative agency, meaningful unpredictability, social performativity, and the associated offensive qualities. We propose design considerations to address citizen concerns and the "tourist metaphor" to deepen our understanding of GAI's impact, offering insights to enhance cities' socio-cultural fabric. Overall, this research contributes to the effort to harness GAI's capabilities for urban enrichment.

Authors:Ashwin Ram, Yue Gu, Bowen Wang, Sneha Jaikumar, Youqi Wu, Benjamin Tan Kuan Wei, Qingyang Xu, Haiming Liu, Shengdong Zhao
Title: SimulataR: Rapid Assisted Reality Prototyping using Design-Blended Videos
Abstract:
Assisted Reality (aR) is a subfield of Augmented Reality (AR) that overlays information onto a user's immediate view via optical see-through head-mounted displays (OST-HMDs). This technology has proven effective and energy-efficient in supporting user-information interaction for everyday wearable intelligent systems. The aR viewing experience, however, is affected by varying real-world backgrounds, lighting, and user movements, which makes designing for aR challenging. Designers have to test their designs in-situ across multiple real-world settings, which can be time-consuming and labor-intensive. We propose SimulataR, a cost-effective desktop-based approach for rapid aR prototyping using first-person-view context videos blended with design prototypes to simulate an aR experience. A field study involving 12 AR users comparing SimulataR to real OST-HMDs found that SimulataR can approximate the aR experience, particularly for indoor and low-to-moderately lit outdoor environments. Case studies with two designers who used SimulataR in their design process demonstrate the potential of design-blended videos for rapid aR prototyping.

Authors:Maha Sajid, Syed Ibrahim Mustafa Shah Bukhari, Bo Ji, Brendan David-John
Title: Just stop doing everything for now!: Understanding security attacks in remote collaborative mixed reality
Abstract:
Mixed Reality (MR) devices are being increasingly adopted across a wide range of real-world applications, from education and healthcare to remote work and entertainment. However, the unique immersive features of MR devices, such as 3D spatial interactions and the encapsulation of virtual objects by invisible elements, introduce new vulnerabilities leading to interaction obstruction and misdirection. We implemented latency, click redirection, object occlusion, and spatial occlusion attacks within a remote collaborative MR platform using the Microsoft HoloLens 2 and evaluated user behavior and mitigations through a user study. We compared responses to MR-specific attacks, which exploit the unique characteristics of remote collaborative immersive environments, and traditional security attacks implemented in MR. Our findings indicate that users generally exhibit lower recognition rates for immersive attacks (e.g., spatial occlusion) compared to attacks inspired by traditional ones (e.g., click redirection). Our results demonstrate a clear gap in user awareness and responses when collaborating remotely in MR environments. Our findings emphasize the importance of training users to recognize potential threats and of enhanced security measures to maintain trust in remote collaborative MR systems.

Authors:Maria Luce Lupetti, Elena Cavallin, Dave Murray-Rust
Title: The Unbearable Lightness of Prompting: A Critical Reflection on the Environmental Impact of genAI use in Design Education
Abstract:
Design educators are finding ways to support students in skillfully using GenAI tools in their practices while encouraging the critical scrutiny of the ethical and social issues around these technologies. However, the issue of environmental sustainability remains unaddressed. There is a lack of both resources to grasp the environmental costs of genAI in education and a lack of shared practices for engaging with the issue. This paper critically reflects on the energy costs of using genAI in design education, using a workshop held in 2023 with 49 students as a motivating example. Through this reflection, we develop a set of five alternative stances, with related actions, that support the conscious use of genAI in design education. The work contributes to the field of design and HCI by bringing together ways for educators to reflect on their practices, informing the future development of educational programs around genAI.

Authors:Leping Qiu, Erin Seongyoon Kim, Sangho Suh, Ludwig Sidenmark, Tovi Grossman
Title: MaRginalia: Enabling In-person Lecture Capturing and Note-taking Through Mixed Reality
Abstract:
Students often take digital notes during live lectures, but current methods can be slow when capturing information from lecture slides or the instructor's speech, and require them to focus on their devices, leading to distractions and missing important details. This paper explores supporting live lecture note-taking with mixed reality (MR) to quickly capture lecture information and take notes while staying engaged with the lecture. A survey and interviews with university students revealed common note-taking behaviors and challenges to inform the design. We present MaRginalia to provide digital note-taking with a stylus tablet and MR headset. Students can take notes with an MR representation of the tablet, lecture slides, and audio transcript without looking down at their device. When preferred, students can also perform detailed interactions by looking at the physical tablet. We demonstrate the feasibility and usefulness of MaRginalia and MR-based note-taking in a user study with 12 students.

Authors:Aku Visuri, Heli Koskimäki, Niels van Berkel, Andy Alorwu, Ella Peltonen, Saeed Abdullah, Simo Hosio
Title: Cognitive Performance Measurements and the Impact of Sleep Quality Using Wearable and Mobile Sensors
Abstract:
Human cognitive performance is an underlying factor in most of our daily lives, and numerous factors influence cognitive performance. In this work, we investigate how changes in sleep quality influence cognitive performance, measured from a dataset collected during a 2-month field study. We collected cognitive performance data (alertness) with the Psychomotor Vigilance Task (PVT), mobile keyboard typing metrics from participants' smartphones, and sleep quality metrics through a wearable sleep tracking ring. Our findings highlight that specific sleep metrics like night-time heart rate, sleep latency, sleep timing, sleep restfulness, and overall sleep quantity significantly influence cognitive performance. To strengthen the current research on cognitive measurements, we introduce smartphone typing metrics as a proxy or a complementary method for continuous passive measurement of cognitive performance. Together, our findings contribute to ubiquitous computing via a longitudinal case study with a novel wearable device, the resulting findings on the association between sleep and cognitive function, and the introduction of smartphone keyboard typing as a proxy of cognitive function.

Authors:Ming Xuan Chua, Shuhua Peng, Thanh Nho Do, Chun Hui Wang, Liao Wu
Title: A Wearable Strain-Sensor-Based Shoulder Patch for Fatigue Detection in Bicep Curls
Abstract:
A common challenge in home-based rehabilitation is muscle compensation induced by pain or fatigue, where patients with weakened primary muscles recruit secondary muscle groups to assist their movement, causing issues such as delayed rehabilitation progress or risk of further injury. In a home-based setting, these subtle compensatory actions may go unnoticed since physiotherapists cannot directly observe patients. To address this problem, this study develops a novel wearable strain-sensor-based shoulder patch to detect fatigue-induced muscle compensation during bicep curl exercises. Built on the observation that the amplitude of a strain sensor's resistance is correlated with the motion of the joint the sensor is attached to, we develop an algorithm that can robustly detect when significant changes appear in the shoulder joint motion, which indicates fatigue-induced muscle compensation in bicep curls. The developed shoulder patch is tested on 13 subjects who perform bicep curl exercises with a 5 kg dumbbell until reaching fatigue. During the experiment, the performance of the shoulder patch is also benchmarked against optical tracking sensors and surface electromyography (sEMG) sensors. Results reveal that the proposed wearable sensor and detection methods effectively monitor fatigue-induced muscle compensation during bicep curl exercises in both real-time and post hoc modes. This development marks a significant step toward enhancing the effectiveness of home-based rehabilitation by providing physiotherapists with a tool to monitor and adjust treatment plans remotely.
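The amplitude-based detection idea can be sketched as a simple per-repetition rule: compute the peak-to-peak resistance amplitude of each rep and flag reps that deviate markedly from the early-rep baseline. This is an illustrative sketch, not the paper's algorithm; the equal-length rep segmentation, baseline window, and threshold constants are all assumptions.

```python
import numpy as np

def rep_amplitudes(r, rep_len):
    """Peak-to-peak strain-sensor resistance amplitude per repetition.

    For illustration, the signal is assumed pre-segmented into
    equal-length repetitions of rep_len samples.
    """
    reps = r[: len(r) // rep_len * rep_len].reshape(-1, rep_len)
    return reps.max(axis=1) - reps.min(axis=1)

def detect_compensation(amps, n_baseline=3, k=1.5):
    """Flag reps whose amplitude departs from the early-rep baseline.

    A rep is flagged when its amplitude differs from the baseline mean
    by more than k baseline standard deviations plus a 20% margin
    (threshold rule is illustrative, not the paper's exact method).
    """
    base = amps[:n_baseline]
    mu, sd = base.mean(), base.std() + 1e-9
    return np.abs(amps - mu) > k * sd + 0.2 * mu  # boolean flag per rep

amps = np.array([1.0, 1.05, 0.95, 1.0, 1.6, 1.8])  # amplitude drift late in set
print(detect_compensation(amps))  # only the last two reps are flagged
```

In a real deployment the rep boundaries would come from the motion signal itself rather than fixed-length windows, but the flag logic stays the same.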

Authors:Mika Setälä, Ville Heilala, Pieta Sikström, Tommi Kärkkäinen
Title: The Use of Generative Artificial Intelligence for Upper Secondary Mathematics Education Through the Lens of Technology Acceptance
Abstract:
This study investigated the students' perceptions of using Generative Artificial Intelligence (GenAI) in upper-secondary mathematics education. Data was collected from Finnish high school students to represent how key constructs of the Technology Acceptance Model (Perceived Usefulness, Perceived Ease of Use, Perceived Enjoyment, and Intention to Use) influence the adoption of AI tools. First, a structural equation model for a comparative study with a prior study was constructed and analyzed. Then, an extended model with the additional construct of Compatibility, which represents the alignment of AI tools with students' educational experiences and needs, was proposed and analyzed. The results demonstrated a strong influence of perceived usefulness on the intention to use GenAI, emphasizing the statistically significant role of perceived enjoyment in determining perceived usefulness and ease of use. The inclusion of compatibility improved the model's explanatory power, particularly in predicting perceived usefulness. This study contributes to a deeper understanding of how AI tools can be integrated into mathematics education and highlights key differences between the Finnish educational context and previous studies based on structural equation modeling.

Authors:Ipek Baris Schlicht, Zhixue Zhao, Burcu Sayin, Lucie Flek, Paolo Rosso
Title: Do LLMs Provide Consistent Answers to Health-Related Questions across Languages?
Abstract:
Equitable access to reliable health information is vital for public health, but the quality of online health resources varies by language, raising concerns about inconsistencies in Large Language Models (LLMs) for healthcare. In this study, we examine the consistency of responses provided by LLMs to health-related questions across English, German, Turkish, and Chinese. We largely expand the HealthFC dataset by categorizing health-related questions by disease type and broadening its multilingual scope with Turkish and Chinese translations. We reveal significant inconsistencies in responses that could spread healthcare misinformation. Our main contributions are 1) a multilingual health-related inquiry dataset with meta-information on disease categories, and 2) a novel prompt-based evaluation workflow that enables sub-dimensional comparisons between two languages through parsing. Our findings highlight key challenges in deploying LLM-based tools in multilingual contexts and emphasize the need for improved cross-lingual alignment to ensure accurate and equitable healthcare information.

Authors:Yongquan 'Owen' Hu, Jingyu Tang, Xinya Gong, Zhongyi Zhou, Shuning Zhang, Don Samitha Elvitigala, Florian 'Floyd' Mueller, Wen Hu, Aaron J. Quigley
Title: Vision-Based Multimodal Interfaces: A Survey and Taxonomy for Enhanced Context-Aware System Design
Abstract:
The recent surge in artificial intelligence, particularly in multimodal processing technology, has advanced human-computer interaction by altering how intelligent systems perceive, understand, and respond to contextual information (i.e., context awareness). Despite such advancements, there is a significant gap in comprehensive reviews examining these advances, especially from a multimodal data perspective, which is crucial for refining system design. This paper addresses a key aspect of this gap by conducting a systematic survey of data modality-driven Vision-based Multimodal Interfaces (VMIs). VMIs are essential for integrating multimodal data, enabling more precise interpretation of user intentions and complex interactions across physical and digital environments. Unlike previous task- or scenario-driven surveys, this study highlights the critical role of the visual modality in processing contextual information and facilitating multimodal interaction. Adopting a design framework moving from the whole to the details and back, it classifies VMIs across dimensions, providing insights for developing effective, context-aware systems.

Authors:Anoop Mishra, Deepak Khazanchi
Title: Perceived Fairness of the Machine Learning Development Process: Concept Scale Development
Abstract:
In machine learning (ML) applications, unfairness is triggered by bias in the data, the data curation process, erroneous assumptions, and implicit bias rendered during the development process. It is also well-accepted by researchers that fairness in ML application development is highly subjective, with a lack of clarity about what it means from an ML development and implementation perspective. Thus, in this research, we investigate and formalize the notion of the perceived fairness of ML development from a sociotechnical lens. Our goal in this research is to understand the characteristics of perceived fairness in ML applications. We address this research goal using a three-pronged strategy: 1) conducting virtual focus groups with ML developers, 2) reviewing existing literature on fairness in ML, and 3) incorporating aspects of justice theory relating to procedural and distributive justice. Based on our theoretical exposition, we propose the operational attributes of perceived fairness to be transparency, accountability, and representativeness. These are described in terms of multiple concepts that comprise each dimension of perceived fairness. We use this operationalization to empirically validate the notion of perceived fairness of machine learning (ML) applications from both ML practitioners' and users' perspectives. The multidimensional framework for perceived fairness offers a comprehensive understanding of perceived fairness, which can guide the creation of fair ML systems with positive implications for society and businesses.

Authors:Shikhar Kumar, Yael Edan
Title: Improving robot understanding using conversational AI: demonstration and feasibility study
Abstract:
Explanations constitute an important aspect of successful human-robot interactions and can enhance robot understanding. To improve the understanding of the robot, we have developed four levels of explanation (LOE) based on two questions: what needs to be explained, and why the robot has made a particular decision. The understandable robot requires a communicative action when there is disparity between the human's mental model of the robot and the robot's state of mind. This communicative action was generated by utilizing a conversational AI platform to generate explanations. An adaptive dialog was implemented for transition from one LOE to another. Here, we demonstrate the adaptive dialog in a collaborative task with errors and provide results of a feasibility study with users.

Authors:Aayush Kumar, Daniel Prol, Amin Alipour, Sruti Srinivasa Ragavan
Title: To Google or To ChatGPT? A Comparison of CS2 Students' Information Gathering Approaches and Outcomes
Abstract:
LLMs such as ChatGPT have been widely adopted by students in higher education as tools for learning programming and related concepts. However, it remains unclear how effective students are and what strategies students use while learning with LLMs. Since the majority of students' experiences in online self-learning have come through using search engines such as Google, evaluating AI tools in this context can help us address these gaps. In this mixed methods research, we conducted an exploratory within-subjects study to understand how CS2 students learn programming concepts using both LLMs as well as traditional online methods such as educational websites and videos to examine how students approach learning within and across both scenarios. We discovered that students found it easier to learn a more difficult concept using traditional methods than using ChatGPT. We also found that students ask fewer follow-ups and use more keyword-based queries for search engines while their prompts to LLMs tend to explicitly ask for information.

Authors:Ting-Han Lin, Hannah Dinner, Tsz Long Leung, Bilge Mutlu, J. Gregory Trafton, Sarah Sebo
Title: Connection-Coordination Rapport (CCR) Scale: A Dual-Factor Scale to Measure Human-Robot Rapport
Abstract:
Robots, particularly in service and companionship roles, must develop positive relationships with people they interact with regularly to be successful. These positive human-robot relationships can be characterized as establishing "rapport," which indicates mutual understanding and interpersonal connection that form the groundwork for successful long-term human-robot interaction. However, the human-robot interaction research literature lacks scale instruments to assess human-robot rapport in a variety of situations. In this work, we developed the 18-item Connection-Coordination Rapport (CCR) Scale to measure human-robot rapport. We first ran Study 1 (N = 288) where online participants rated videos of human-robot interactions using a set of candidate items. Our Study 1 results showed the discovery of two factors in our scale, which we named "Connection" and "Coordination." We then evaluated this scale by running Study 2 (N = 201) where online participants rated a new set of human-robot interaction videos with our scale and an existing rapport scale from virtual agents research for comparison. We also validated our scale by replicating a prior in-person human-robot interaction study, Study 3 (N = 44), and found that rapport is rated significantly greater when participants interacted with a responsive robot (responsive condition) as opposed to an unresponsive robot (unresponsive condition). Results from these studies demonstrate high reliability and validity for the CCR scale, which can be used to measure rapport in both first-person and third-person perspectives. We encourage the adoption of this scale in future studies to measure rapport in a variety of human-robot interactions.

Authors:Leonardo Pavanatto, Jens Grubert, Doug Bowman
Title: Spatial Bar: Exploring Window Switching Techniques for Large Virtual Displays
Abstract:
Virtual displays provided through head-worn displays (HWDs) offer users large screen space for productivity, but managing this space effectively presents challenges. This paper explores how to enhance window-switching strategies for virtual displays by leveraging eye tracking provided by HWDs and underutilized spaces around the main display area. We investigate the efficiency and usability of different cursor behaviors and selection modes in a Spatial Bar interface for window-switching tasks in augmented reality environments. Results show gaze coupled with teleport led to the quickest window-switching times, particularly in tasks where the original cursor position or the target window was far from the Spatial Bar.

Authors:Dimitra Dritsa, Steven Houben
Title: The Data-Expectation Gap: A Vocabulary Describing Experiential Qualities of Data Inaccuracies in Smartwatches
Abstract:
Many users of wrist-worn wearable fitness trackers encounter the data-expectation gap - mismatches between data and expectations. While we know such discrepancies exist, we are no closer to designing technologies that can address their negative effects. This is largely because encounters with mismatches are typically treated unidimensionally, while they may differ in context and implications. This treatment does not allow the design of human-data interaction (HDI) mechanisms accounting for temporal, social, emotional, and other factors potentially influencing the perception of mismatches. To address this problem, we present a vocabulary that describes the breadth and context-bound character of encounters with the data-expectation gap, drawing on findings from two studies. Our work contributes to Personal Informatics research by providing knowledge on how encounters with the data-expectation gap are embedded in people's daily lives, and a vocabulary encapsulating this knowledge, which can be used when designing HDI experiences in wearable fitness trackers.

Authors:Lin Kyi, Amruta Mahuli, M. Six Silberman, Reuben Binns, Jun Zhao, Asia J. Biega
Title: Governance of Generative AI in Creative Work: Consent, Credit, Compensation, and Beyond
Abstract:
Since the emergence of generative AI, creative workers have spoken up about the career-based harms they have experienced arising from this new technology. A common theme in these accounts of harm is that generative AI models are trained on workers' creative output without their consent and without giving credit or compensation to the original creators. This paper reports findings from 20 interviews with creative workers in three domains: visual art and design, writing, and programming. We investigate the gaps between current AI governance strategies, what creative workers want out of generative AI governance, and the nuanced role of creative workers' consent, compensation and credit for training AI models on their work. Finally, we make recommendations for how generative AI can be governed and how operators of generative AI systems might more ethically train models on creative output in the future.

Authors:Sobhan Teymouri, Fatemeh Alizadehziri, Mobina Zibandehpoor, Mehdi Delrobaei
Title: Algorithmic Derivation of Human Spatial Navigation Indices From Eye Movement Data
Abstract:
Spatial navigation is a complex cognitive function involving sensory inputs, such as visual, auditory, and proprioceptive information, to understand and move within space. This ability allows humans to create mental maps, navigate through environments, and process directional cues, crucial for exploring new places and finding one's way in unfamiliar surroundings. This study takes an algorithmic approach to extract indices relevant to human spatial navigation using eye movement data. Leveraging electrooculography signals, we analyzed statistical features and applied feature engineering techniques to study eye movements during navigation tasks. The proposed work combines signal processing and machine learning approaches to develop indices for navigation and orientation, spatial anxiety, landmark recognition, path survey, and path route. The analysis yielded five subscore indices with notable accuracy. Among these, the navigation and orientation subscore achieved an R2 score of 0.72, while the landmark recognition subscore attained an R2 score of 0.50. Additionally, statistical features highly correlated with eye movement metrics, including blinks, saccades, and fixations, were identified. The findings of this study can inform cognitive assessments and enable early detection of spatial navigation impairments, particularly among individuals at risk of cognitive decline.
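The feature-to-subscore pipeline described above can be sketched in two steps: extract simple statistics from an eye-movement trace, then fit a linear model and report its R2. This is a minimal stand-in for the paper's method; the velocity thresholds for fixations and saccades and the feature set are assumptions, and the paper's actual models may differ.

```python
import numpy as np

def gaze_features(vel, blink, fix_thresh=30.0, sacc_thresh=100.0):
    """Per-trial features from an eye-velocity trace (deg/s) and a binary
    blink indicator; thresholds are illustrative, not the paper's values."""
    fixating = vel < fix_thresh
    saccading = vel > sacc_thresh
    return np.array([
        blink.sum(),        # blink count
        saccading.mean(),   # fraction of samples in saccades
        fixating.mean(),    # fraction of samples fixating
        vel.mean(),         # mean eye velocity
        vel.std(),          # velocity variability
    ])

def fit_r2(X, y):
    """Least-squares linear fit with an intercept; returns R^2 on the
    training data (a stand-in for the paper's subscore regression)."""
    A = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ w
    return 1 - resid.var() / y.var()
```

With trial-wise feature matrices like these, each navigation subscore (e.g., navigation and orientation, landmark recognition) would be fit and evaluated separately, yielding one R2 per subscore as reported in the abstract.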

Authors:Jessie J. Smith, Wesley Hanwen Deng, William H. Smith, Maarten Sap, Nicole DeCario, Jesse Dodge
Title: The Generative AI Ethics Playbook
Abstract:
The Generative AI Ethics Playbook provides guidance for identifying and mitigating risks of machine learning systems across various domains, including natural language processing, computer vision, and generative AI. This playbook aims to assist practitioners in diagnosing potential harms that may arise during the design, development, and deployment of datasets and models. It offers concrete strategies and resources for mitigating these risks, to help minimize negative impacts on users and society. Drawing on current best practices in both research and ethical considerations, this playbook aims to serve as a comprehensive resource for AI/ML practitioners. The intended audience of this playbook includes machine learning researchers, engineers, and practitioners who are involved in the creation and implementation of generative and multimodal models (e.g., text-to-text, image-to-image, text-to-image, text-to-video). Specifically, we provide transparency/documentation checklists, topics of interest, common questions, examples of harms through case studies, and resources and strategies to mitigate harms throughout the Generative AI lifecycle. This playbook was made collaboratively over the course of 16 months through extensive literature review of over 100 resources and peer-reviewed articles, as well as through an initial group brainstorming session with 18 interdisciplinary AI ethics experts from industry and academia, and with additional feedback from 8 experts (5 of whom were in the initial brainstorming session). We note that while this playbook provides examples, discussion, and harm mitigation strategies, research in this area is ongoing. Our playbook aims to be a practically useful survey, taking a high-level view rather than attempting to cover the entire existing body of research.

Authors:Takato Mizuho, Takuji Narumi, Hideaki Kuzuoka
Title: Effects of Social Contextual Variation Using Partner Avatars on Memory Acquisition and Retention
Abstract:
This study investigates how partner avatar design affects learning and memory when an avatar serves as a lecturer. Based on earlier research on the environmental context dependency of memory, we hypothesize that the use of diverse partner avatars results in a slower learning rate but better memory retention than that of a constant partner avatar. Accordingly, participants were tasked with memorizing Tagalog--Japanese word pairs. On the first day of the experiment, they repeatedly learned the pairs over six sessions from a partner avatar in an immersive virtual environment. One week later, on the second day of the experiment, they underwent a recall test in a real environment. We employed a between-participants design to compare the following conditions: the varied avatar condition, in which each repetition used a different avatar, and the constant avatar condition, in which the same avatar was used throughout the experiment. Results showed that, compared to the constant avatar condition, the varied avatar condition resulted in significantly lower recall performance in the repeated learning trials conducted on the first day. However, the avatar conditions showed no significant differences in the final recall test on the second day. We discuss these effects in relation to the social presence of the partner avatar. This study opens up a novel approach to optimizing the effectiveness of instructor avatars in immersive virtual environments.

Authors:Zihan Zhang, Black Sun, Pengcheng An
Title: Breaking Barriers or Building Dependency? Exploring Team-LLM Collaboration in AI-infused Classroom Debate
Abstract:
Classroom debates are a unique form of collaborative learning characterized by fast-paced, high-intensity interactions that foster critical thinking and teamwork. Despite the recognized importance of debates, the role of AI tools, particularly LLM-based systems, in supporting this dynamic learning environment has been under-explored in HCI. This study addresses this opportunity by investigating the integration of LLM-based AI into real-time classroom debates. Over four weeks, 22 students in a Design History course participated in three rounds of debates with support from ChatGPT. The findings reveal how learners prompted the AI to offer insights, collaboratively processed its outputs, and divided labor in team-AI interactions. The study also surfaces key advantages of AI usage, reducing social anxiety, breaking communication barriers, and providing scaffolding for novices, alongside risks, such as information overload and cognitive dependency, which could limit learners' autonomy. We thereby discuss a set of nuanced implications for future HCI exploration.

Authors:Han Qiao, Siyi Wu, Christoph Becker
Title: "Near Data" and "Far Data" for Urban Sustainability: How Do Community Advocates Envision Data Intermediaries?
Abstract:
In the densifying data ecosystem of today's cities, data intermediaries are crucial stakeholders in facilitating data access and use. Community advocates live in these sites of social injustices and opportunities for change. Highly experienced in working with data to enact change, they offer distinctive insights on data practices and tools. This paper examines the unique perspectives that community advocates offer on data intermediaries. Based on interviews with 17 advocates working with 23 grassroots and nonprofit organizations, we propose the quality of "near" and "far" to be seriously considered in data intermediaries' work and articulate advocates' vision of connecting "near data" and "far data." To pursue this vision, we identified three pathways for data intermediaries: align data exploration with ways of storytelling, communicate context and uncertainties, and decenter artifacts for relationship building. These pathways help data intermediaries to put data feminism into practice, surface design opportunities and tensions, and raise key questions for supporting the pursuit of the Right to the City.

Authors:Shiang Hu, Xiao Gong, Xiaolong Huang, Jie Ruan, Pedro Antonio Valdes-Sosa
Title: Exploring the distribution of connectivity weights in resting-state EEG networks
Abstract:
Resting-state brain networks (RSNs) reflect the functional connectivity patterns between brain modules, providing essential foundations for decoding intrinsic neural information within the brain. They serve as one of the primary tools for describing the spatial dynamics of the brain using various neuroimaging techniques, such as electroencephalography (EEG) and magnetoencephalography (MEG). However, the distribution rules or potential modes of functional connectivity weights in the resting state remain unclear. In this context, we first start from simulation, using a forward model to generate scalp EEG at four channel densities (19, 32, 64, and 128 channels). Subsequently, we construct scalp brain networks using five coupling measures, aiming to explore whether channel density or the choice of coupling measure affects the distribution pattern of functional connectivity weights. Next, we quantify the distribution pattern by calculating the skewness, kurtosis, and Shannon entropy of the functional connectivity network weights. Finally, the simulation results were validated in a normative database. We observed that: 1) the functional connection weights exhibit a right-skewed distribution and are not influenced by channel density or coupling measure; 2) the functional connection weights exhibit a relatively uniform distribution, with volume conduction potentially affecting the degree of uniformity in the distribution; 3) networks constructed using coupling measures influenced by volume conduction exhibit significant correlations between the average connection weight and measures of skewness, kurtosis, and Shannon entropy. This study contributes to a deeper understanding of RSNs, provides valuable insights for research in the field of neuroscience, and holds promise for being associated with brain cognition and disease diagnosis.
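The three distribution summaries used in this abstract (skewness, excess kurtosis, and Shannon entropy of the connectivity weights) can be sketched in a few lines. The toy network below uses absolute correlations of simulated channels purely for illustration; it is not the paper's forward model or any of its five coupling measures.

```python
import numpy as np

def weight_distribution_stats(conn, n_bins=32):
    """Skewness, excess kurtosis, and Shannon entropy (bits) of the
    upper-triangular weights of a symmetric connectivity matrix."""
    w = conn[np.triu_indices_from(conn, k=1)]  # off-diagonal weights only
    z = (w - w.mean()) / w.std()
    hist, _ = np.histogram(w, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]                               # drop empty bins for log
    return {
        "skewness": float(np.mean(z**3)),
        "kurtosis": float(np.mean(z**4) - 3.0),   # excess kurtosis
        "entropy": float(-np.sum(p * np.log2(p))),
    }

# Toy example: |correlation| network over 19 simulated channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 19))
conn = np.abs(np.corrcoef(x, rowvar=False))
stats_out = weight_distribution_stats(conn)
print(stats_out)
```

For independent channels, the absolute correlations follow a roughly half-normal distribution, so the skewness comes out positive, consistent with the right-skew the study reports.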

Authors:Maria Micaela Fonseca, Nuno Fachada, Micael Sousa, Jorge Oliveira, Pedro Rodrigues, Sara Sousa, Claudia Quaresma, Phil Lopes
Title: Games! What are they good for? The Struggle of Serious Game Adoption for Rehabilitation
Abstract:
The field of serious games for health has grown significantly, demonstrating effectiveness in various clinical contexts such as stroke, spinal cord injury, and degenerative neurological diseases. Despite their potential benefits, therapists face barriers to adopting serious games in rehabilitation, including limited training and game literacy, concerns about cost and equipment availability, and a lack of evidence-based research on game effectiveness. Serious games for rehabilitation often involve repetitive exercises, which can be tedious and reduce motivation for continued rehabilitation, treating clients as passive recipients of clinical outcomes rather than players. This study identifies gaps and provides essential insights for advancing serious games in rehabilitation, aiming to enhance their engagement for clients and effectiveness as a therapeutic tool. Addressing these challenges requires a paradigm shift towards developing and co-creating serious games for rehabilitation with therapists, researchers, and stakeholders. Furthermore, future research is crucial to advance the development of serious games, ensuring they adhere to evidence-based principles and engage both clients and therapists. This endeavor will identify gaps in the field, inspire new directions, and support the creation of practical guidelines for serious games research.

Authors:Björn Rene Severitt, Yannick Sauer, Alexander Neugebauer, Rajat Agarwala, Nora Castner, Siegfried Wahl
Title: The interplay of user preference and precision in different gaze-based interaction methods
Abstract:
In this study, we investigated gaze-based interaction methods in a virtual reality game with a visual search task, with 52 participants. We compared four different interaction techniques: selection by dwell time, or confirmation of selection by head orientation, nodding, or smooth pursuit eye movements. We evaluated both subjective and objective performance metrics, including NASA-TLX for subjective task load as well as time to find the correct targets and points achieved for objective analysis. The results showed significant differences between the interaction methods in terms of NASA-TLX dimensions, time to find the correct targets, and overall performance scores, suggesting differential effectiveness of gaze-based approaches in improving intuitive system communication. Interestingly, the results revealed gender-specific differences, suggesting interesting implications for the design of gaze-based interaction paradigms that are optimized for different user needs and preferences. These findings could help to develop more customized and effective gaze interaction systems that can improve accessibility and user satisfaction.

Authors:Miriam Doh, Caroline Mazini Rodrigues, N. Boutry, L. Najman, Matei Mancas, Bernard Gosselin
Title: Found in Translation: semantic approaches for enhancing AI interpretability in face verification
Abstract:
The increasing complexity of machine learning models in computer vision, particularly in face verification, requires the development of explainable artificial intelligence (XAI) to enhance interpretability and transparency. This study extends previous work by integrating semantic concepts derived from human cognitive processes into XAI frameworks to bridge the comprehension gap between model outputs and human understanding. We propose a novel approach combining global and local explanations, using semantic features defined by user-selected facial landmarks to generate similarity maps and textual explanations via large language models (LLMs). The methodology was validated through quantitative experiments and user feedback, demonstrating improved interpretability. Results indicate that our semantic-based approach, particularly the most detailed set, offers a more nuanced understanding of model decisions than traditional methods. User studies highlight a preference for our semantic explanations over traditional pixel-based heatmaps, emphasizing the benefits of human-centric interpretability in AI. This work contributes to the ongoing efforts to create XAI frameworks that align AI models' behaviour with human cognitive processes, fostering trust and acceptance in critical applications.

Authors:Nilesh Kumar Sahu, Nandigramam Sai Harshit, Rishabh Uikey, Haroon R. Lone
Title: Beyond Questionnaires: Video Analysis for Social Anxiety Detection
Abstract:
Social Anxiety Disorder (SAD) significantly impacts individuals' daily lives and relationships. The conventional methods for SAD detection involve physical consultations and self-reported questionnaires, but they have limitations such as time consumption and bias. This paper introduces video analysis as a promising method for early SAD detection. Specifically, we present a new approach for detecting SAD in individuals from various bodily features extracted from the video data. We conducted a study to collect video data of 92 participants performing impromptu speech in a controlled environment. Using the video data, we studied the behavioral change in participants' head, body, eye gaze, and action units. By applying a range of machine learning and deep learning algorithms, we achieved an accuracy rate of up to 74% in classifying participants as SAD or non-SAD. Video-based SAD detection offers a non-intrusive and scalable approach that can be deployed in real-time, potentially enhancing early detection and intervention capabilities.

Authors:Tangyao Li, Yuyang Wang
Title: Balancing Exploration and Cybersickness: Investigating Curiosity-Driven Behavior in Virtual Environments
Abstract:
During virtual navigation, users exhibit varied interaction and navigation behaviors influenced by several factors. Existing theories and models have been developed to explain and predict these diverse patterns. While users often experience uncomfortable sensations, such as cybersickness, during virtual reality (VR) use, they do not always make optimal decisions to mitigate these effects. Although methods like reinforcement learning have been used to model decision-making processes, they typically rely on random selection to simulate actions, failing to capture the complexities of real navigation behavior. In this study, we propose curiosity as a key factor driving irrational decision-making, suggesting that users continuously balance exploration and cybersickness according to the free energy principle during virtual navigation. Our findings show that VR users generally adopt conservative strategies when navigating, with most participants displaying negative curiosity across trials. However, curiosity levels tend to rise when the virtual environment changes, illustrating the dynamic interplay between exploration and discomfort. This study provides a quantitative approach to decoding curiosity-driven behavior during virtual navigation, offering insights into how users balance exploration and the avoidance of cybersickness. Future research will further refine this model by incorporating additional psychological and environmental factors to improve the accuracy of navigation pattern predictions.

Authors:Nicolas Rothbacher, Kit T. Rodolfa, Mihir Bhaskar, Erin Maneri, Christine Tsang, Daniel E. Ho
Title: Artificial Intelligence in Environmental Protection: The Importance of Organizational Context from a Field Study in Wisconsin
Abstract:
Advances in Artificial Intelligence (AI) have generated widespread enthusiasm for the potential of AI to support our understanding and protection of the environment. As such tools move from basic research to more consequential settings, such as regulatory enforcement, the human context of how AI is utilized, interpreted, and deployed becomes increasingly critical. Yet little work has systematically examined the role of such organizational goals and incentives in deploying AI systems. We report results from a unique case study of a satellite imagery-based AI tool to detect dumping of agricultural waste, with concurrent field trials with the Wisconsin Department of Natural Resources (WDNR) and a non-governmental environmental interest group, in which the tool was utilized for field investigations when dumping was presumptively illegal in February-March 2023. Our results are threefold: First, both organizations confirmed a similar level of ground-truth accuracy for the model's detections. Second, they differed, however, in their overall assessment of its usefulness, as WDNR was interested in clear violations of existing law, while the interest group sought to document environmental risk beyond the scope of existing regulation. Dumping by an unpermitted entity or just before February 1, for instance, were deemed irrelevant by WDNR. Third, while AI tools promise to prioritize allocation of environmental protection resources, they may expose important gaps in existing law.

Authors:Karim Benharrak, Amy Pavel
Title: HistoryPalette: Supporting Exploration and Reuse of Past Alternatives in Image Generation and Editing
Abstract:
All creative tasks require creators to iteratively produce, select, and discard potentially useful ideas. Now, creativity tools include generative AI features (e.g., Photoshop Generative Fill) that increase the number of alternatives creators consider due to rapid experiments with text prompts and random generations. Creators use tedious manual systems for organizing their prior ideas by saving file versions or hiding layers, but they lack the support they want for reusing prior alternatives in personal work or in communication with others. We present HistoryPalette, a system that supports exploration and reuse of prior designs in generative image creation and editing. Using HistoryPalette, creators and their collaborators explore a "palette" of prior design alternatives organized by spatial position, topic category, and creation time. HistoryPalette enables creators to quickly preview and reuse their prior work. In creative professional and client collaborator user studies, participants generated and edited images by exploring and reusing past design alternatives with HistoryPalette.

Authors:Pranav Pandey, Ramviyas Parasuraman, Prashant Doshi
Title: FRESHR-GSI: A Generalized Safety Model and Evaluation Framework for Mobile Robots in Multi-Human Environments
Abstract:
Human safety is critical in applications involving close human-robot interactions (HRI) and is a key aspect of physical compatibility between humans and robots. While measures of human safety in HRI exist, these mainly target industrial settings involving robotic manipulators. Less attention has been paid to settings where mobile robots and humans share the space. This paper introduces a new robot-centered directional framework of human safety. It is particularly useful for evaluating mobile robots as they operate in environments populated by multiple humans. The framework integrates several key metrics, such as each human's relative distance, speed, and orientation. The core novelty lies in the framework's flexibility to accommodate different application requirements while allowing for both the robot-centered and external observer points of view. We instantiate the framework by using RGB-D based vision integrated with a deep learning-based human detection pipeline to yield a generalized safety index (GSI) that instantaneously assesses human safety. We evaluate GSI's capability of producing appropriate, robust, and fine-grained safety measures in real-world experimental scenarios and compare its performance with extant safety models.
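A per-human directional safety score of the kind described can be sketched as below. The normalization bounds, the cosine-based orientation term, and the equal weighting are all hypothetical assumptions for illustration; the paper's actual GSI formulation is not specified in the abstract.

```python
import math

def safety_index(distance, rel_speed, approach_angle,
                 d_max=5.0, v_max=2.0):
    """Illustrative per-human safety score in [0, 1] (1 = safest).

    distance:       robot-human distance in metres
    rel_speed:      closing speed in m/s (positive = approaching)
    approach_angle: radians between the human's heading and the
                    direction toward the robot (0 = head-on)
    d_max and v_max are hypothetical normalization bounds.
    """
    d_term = min(distance / d_max, 1.0)                   # farther is safer
    v_term = 1.0 - min(max(rel_speed, 0.0) / v_max, 1.0)  # slower is safer
    a_term = 0.5 * (1.0 - math.cos(approach_angle))       # head-on is riskiest
    return (d_term + v_term + a_term) / 3.0               # equal weights (assumed)

def gsi(humans):
    """Aggregate over all detected humans; the worst case governs."""
    return min(safety_index(*h) for h in humans)

# One distant human walking away, one nearby human approaching head-on.
print(gsi([(4.0, 0.5, math.pi), (1.0, 1.5, 0.0)]))
```

Taking the minimum over humans reflects the common conservative choice that the least-safe interaction dominates the instantaneous assessment.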

Authors:Andrew Chang, Viswadruth Akkaraju, Ray McFadden Cogliano, David Poeppel, Dustin Freeman
Title: Multimodal Machine Learning Can Predict Videoconference Fluidity and Enjoyment
Abstract:
Videoconferencing is now a frequent mode of communication in both professional and informal settings, yet it often lacks the fluidity and enjoyment of in-person conversation. This study leverages multimodal machine learning to predict moments of negative experience in videoconferencing. We sampled thousands of short clips from the RoomReader corpus, extracting audio embeddings, facial actions, and body motion features to train models for identifying low conversational fluidity, low enjoyment, and classifying conversational events (backchanneling, interruption, or gap). Our best models achieved an ROC-AUC of up to 0.87 on hold-out videoconference sessions, with domain-general audio features proving most critical. This work demonstrates that multimodal audio-video signals can effectively predict high-level subjective conversational outcomes. In addition, this is a contribution to research on videoconferencing user experience by showing that multimodal machine learning can be used to identify rare moments of negative user experience for further study or mitigation.
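The headline metric here, ROC-AUC on held-out sessions, can be computed with a generic rank-sum (Mann-Whitney U) implementation; this is standard methodology, not the authors' code.

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC-AUC via the rank-sum formulation, averaging ranks over ties."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):          # tied scores get their mean rank
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    u = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Note: "hold-out sessions" means the train/test split is by session id,
# not by random clip, so no session leaks into both sides.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # -> 0.75
```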

Authors:Rasmus Lunding, Sebastian Hubenschmid, Tiare Feuchtner, Kaj Grønbæk
Title: ARTHUR: Authoring Human-Robot Collaboration Processes with Augmented Reality using Hybrid User Interfaces
Abstract:
While augmented reality shows promise for supporting human-robot collaboration, creating such interactive systems still poses great challenges. Addressing this, we introduce ARTHUR, an open-source authoring tool for augmented reality-supported human-robot collaboration. ARTHUR supports 20 types of multi-modal feedback to convey robot, task, and system state, 10 actions that enable the user to control the robot and system, and 18 conditions for feedback customization and triggering of actions. By combining these elements, users can create interaction spaces, controls, and information visualizations in augmented reality for collaboration with robot arms. With ARTHUR, we propose to combine desktop interfaces and touchscreen devices for effective authoring, with head-mounted displays for testing and in-situ refinements. To demonstrate the general applicability of ARTHUR for human-robot collaboration scenarios, we replicate representative examples from prior work. Further, in an evaluation with five participants, we reflect on the usefulness of our hybrid user interface approach and the provided functionality, highlighting directions for future work.

Authors:Justin M. Kasowski, Apurv Varshney, Roksana Sadeghi, Michael Beyeler
Title: Simulated prosthetic vision confirms checkerboard as an effective raster pattern for epiretinal implants
Abstract:
Spatial scheduling of electrode activation ("rastering") is essential for safely operating high-density retinal implants, yet its perceptual consequences remain poorly understood. This study systematically evaluates the impact of raster patterns, or spatial arrangements of sequential electrode activation, on performance and perceived difficulty in simulated prosthetic vision (SPV). By addressing this gap, we aimed to identify patterns that optimize functional vision in retinal implants. Sighted participants completed letter recognition and motion discrimination tasks under four raster patterns (horizontal, vertical, checkerboard, and random) using an immersive SPV system. The simulations emulated epiretinal implant perception and employed psychophysically validated models of electrode activation, phosphene appearance, nonlinear spatial summation, and temporal dynamics, ensuring realistic representation of prosthetic vision. Performance accuracy and self-reported difficulty were analyzed to assess the effects of raster patterning. The checkerboard pattern consistently outperformed other raster patterns, yielding significantly higher accuracy and lower difficulty ratings across both tasks. The horizontal and vertical patterns introduced biases aligned with apparent motion artifacts, while the checkerboard minimized such effects. Random patterns resulted in the lowest performance, underscoring the importance of structured activation. Notably, checkerboard matched performance in the "No Raster" condition, despite conforming to groupwise safety constraints. This is the first quantitative, task-based evaluation of raster patterns in SPV. Checkerboard-style scheduling enhances perceptual clarity without increasing computational load, offering a low-overhead, clinically relevant strategy for improving usability in next-generation retinal prostheses.

Authors:Alex Binh Vinh Duc Nguyen, Jan Leusmann, Sven Mayer, Andrew Vande Moere
Title: Eliciting Understandable Architectonic Gestures for Robotic Furniture through Co-Design Improvisation
Abstract:
The vision of adaptive architecture proposes that robotic technologies could enable interior spaces to physically transform in a bidirectional interaction with occupants. Yet, it is still unknown how this interaction could unfold in an understandable way. Inspired by HRI studies where robotic furniture gestured intents to occupants by deliberately positioning or moving in space, we hypothesise that adaptive architecture could also convey intents through gestures performed by a mobile robotic partition. To explore this design space, we invited 15 multidisciplinary experts to join co-design improvisation sessions, where they manually manoeuvred a deactivated robotic partition to design gestures conveying six architectural intents that varied in purpose and urgency. Using a gesture elicitation method alongside motion-tracking data, a Laban-based questionnaire, and thematic analysis, we identified 20 unique gestural strategies. Through categorisation, we introduced architectonic gestures as a novel strategy for robotic furniture to convey intent by indexically leveraging its spatial impact, complementing the established deictic and emblematic gestures. Our study thus represents an exploratory step toward making the autonomous gestures of adaptive architecture more legible. By understanding how robotic gestures are interpreted based not only on their motion but also on their spatial impact, we contribute to bridging HRI with Human-Building Interaction research.

Authors:Tamim Ahmed, Zhaoyi Guo, Mohammod Shaikh Sadid Khan, Thanassis Rikakis, Aisling Kelliher
Title: Data Acquisition Through Participatory Design for Automated Rehabilitation Assessment
Abstract:
Through participatory design, we are developing a computational system for the semi-automated assessment of the Action Research Arm Test (ARAT) for stroke rehabilitation. During rehabilitation assessment, clinicians rate movement segments and components in the context of overall task performance. Clinicians change viewing angles to assess particular components. Through studies with clinicians, we develop a system that includes: a) unobtrusive multi-camera capture, b) a segmentation interface for non-expert segmentors, and c) a rating interface for expert clinicians. Five clinicians independently captured 1800 stroke survivor videos with less than 5% errors. Three segmentors have segmented 760 of these videos, averaging 20 seconds per segment. They favor the recommended camera view in more than 90% of segments. Multiple clinicians have rated the segmented videos while reporting minimal problems. The complete data will be used for training an automated segmentation and rating system that empowers the clinicians, as the ratings will be compatible with clinical practice and intuition.

Authors:Mariia Ershova, Graziano Blasilli
Title: Bridging Service Design, Visualizations, and Visual Analytics in Healthcare Digital Twins: Challenges, Gaps, and Research Opportunities
Abstract:
Digital twins (DT) are increasingly used in healthcare to model patients, processes, and physiological systems. While recent solutions leverage visualization, visual analytics, and user interaction, these systems rarely incorporate structured service design methodologies. Bridging service design with visual analytics and visualization can be valuable for the healthcare DT community. This paper aims to introduce the service design discipline to visualization researchers by framing this integration gap and suggesting research directions to enhance the real-world applicability of DT solutions.

Authors:Shubhabrata Mukherjee, Jack Lang, Obeen Kwon, Iryna Zenyuk, Valerie Brogden, Adam Weber, Daniela Ushizima
Title: Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data
Abstract:
Zero-shot and prompt-based models have excelled at visual reasoning tasks by leveraging large-scale natural image corpora, but they often fail on sparse and domain-specific scientific image data. We introduce Zenesis, a no-code interactive computer vision platform designed to reduce data readiness bottlenecks in scientific imaging workflows. Zenesis integrates lightweight multimodal adaptation for zero-shot inference on raw scientific data, human-in-the-loop refinement, and heuristic-based temporal enhancement. We validate our approach on Focused Ion Beam Scanning Electron Microscopy (FIB-SEM) datasets of catalyst-loaded membranes. Zenesis outperforms baselines, achieving an average accuracy of 0.947, Intersection over Union (IoU) of 0.858, and Dice score of 0.923 on amorphous catalyst samples; and 0.987 accuracy, 0.857 IoU, and 0.923 Dice on crystalline samples. These results represent a significant performance gain over conventional methods such as Otsu thresholding and standalone models like the Segment Anything Model (SAM). Zenesis enables effective image segmentation in domains where annotated datasets are limited, offering a scalable solution for scientific discovery.
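The segmentation metrics reported here, Intersection over Union (IoU) and Dice score, are standard quantities that can be computed for binary masks as follows; this is an illustrative implementation, not Zenesis code.

```python
import numpy as np

def iou_dice(pred, gt):
    """IoU and Dice score for a pair of binary segmentation masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    # Empty-mask convention: two empty masks agree perfectly.
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)

print(iou_dice([1, 1, 0, 0], [0, 1, 1, 0]))  # IoU = 1/3, Dice = 0.5
```

Dice weights the intersection twice, so Dice >= IoU always holds; the paper's near-equal 0.858 IoU and 0.923 Dice are consistent with that relationship.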

Authors:Andres Navarro, Carlos de Quinto, José Alberto Hernández
Title: Email as the Interface to Generative AI Models: Seamless Administrative Automation
Abstract:
This paper introduces a novel architectural framework that integrates Large Language Models (LLMs) with email interfaces to automate administrative tasks, specifically targeting accessibility barriers in enterprise environments. The system connects email communication channels with Optical Character Recognition (OCR) and intelligent automation, enabling non-technical administrative staff to delegate complex form-filling and document processing tasks using familiar email interfaces. By treating the email body as a natural language prompt and attachments as contextual information, the workflow bridges the gap between advanced AI capabilities and practical usability. Empirical evaluation shows that the system can complete complex administrative forms in under 8 seconds of automated processing, with human supervision reducing total staff time by a factor of three to four compared to manual workflows. The top-performing LLM accurately filled 16 out of 29 form fields and reduced the total cost per processed form by 64% relative to manual completion. These findings demonstrate that email-based LLM integration is a viable and cost-effective approach for democratizing advanced automation in organizational settings, supporting widespread adoption without requiring specialized technical knowledge or major workflow changes. This aligns with broader trends in leveraging LLMs to enhance accessibility and automate complex tasks for non-technical users, making technology more inclusive and efficient.
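The core workflow described, treating the email body as the natural-language prompt and attachments as context, can be sketched with Python's standard email library. The OCR and LLM stages are deliberately left out, and all names and the sample message below are hypothetical.

```python
import email
from email import policy

def email_to_prompt(raw_message: bytes):
    """Parse an email into (prompt, context): body text becomes the LLM
    prompt, attachments become contextual inputs (OCR would run on them
    in the full system described by the paper)."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    prompt = body.get_content().strip() if body else ""
    context = [(part.get_filename(), part.get_content())
               for part in msg.iter_attachments()]
    return prompt, context

# Hypothetical incoming request (no attachments in this minimal sample).
raw = (b"From: staff@example.org\r\nTo: bot@example.org\r\n"
       b"Subject: Form request\r\nContent-Type: text/plain\r\n\r\n"
       b"Please fill in the attached travel form.\r\n")
prompt, context = email_to_prompt(raw)
print(prompt)  # -> Please fill in the attached travel form.
```

In the full pipeline, `prompt` and the OCR output of `context` would be assembled into a single LLM request, with the model's reply sent back over the same email channel.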

Authors:Lluís C. Coll, Martin W. Lauer-Schmaltz, Philip Cash, John P. Hansen, Anja Maier
Title: Towards the "Digital Me": A vision of authentic Conversational Agents powered by personal Human Digital Twins
Abstract:
Human Digital Twins (HDTs) have traditionally been conceptualized as data-driven models designed to support decision-making across various domains. However, recent advancements in conversational AI open new possibilities for HDTs to function as authentic, interactive digital counterparts of individuals. This paper introduces a novel HDT system architecture that integrates large language models with dynamically updated personal data, enabling it to mirror an individual's conversational style, memories, and behaviors. To achieve this, our approach implements context-aware memory retrieval, neural plasticity-inspired consolidation, and adaptive learning mechanisms, creating a more natural and evolving digital persona. The resulting system not only replicates an individual's unique conversational style depending on who they are speaking with, but also enriches responses with dynamically captured personal experiences, opinions, and memories. While this marks a significant step toward developing authentic virtual counterparts, it also raises critical ethical concerns regarding privacy, accountability, and the long-term implications of persistent digital identities. This study contributes to the field of HDTs by describing our novel system architecture, demonstrating its capabilities, and discussing future directions and emerging challenges to ensure the responsible and ethical development of HDTs.

Authors:Ewelina Gajewska, Michal Wawer, Katarzyna Budzynska, Jarosław A. Chudziak
Title: Leveraging a Multi-Agent LLM-Based System to Educate Teachers in Hate Incidents Management
Abstract:
Computer-aided teacher training is a state-of-the-art method designed to enhance teachers' professional skills effectively while minimising concerns related to costs, time constraints, and geographical limitations. We investigate the potential of large language models (LLMs) in teacher education, using the case of teaching hate incident management in schools. To this end, we create a multi-agent LLM-based system that mimics realistic situations of hate, using a combination of retrieval-augmented prompting and persona modelling. It is designed to identify and analyse hate speech patterns, predict potential escalation, and propose effective intervention strategies. By integrating persona modelling with agentic LLMs, we create contextually diverse simulations of hate incidents, mimicking real-life situations. The system allows teachers to analyse and understand the dynamics of hate incidents in a safe and controlled environment, providing valuable insights and practical knowledge to manage such situations confidently in real life. Our pilot evaluation demonstrates teachers' enhanced understanding of the nature of annotator disagreements and the role of context in hate speech interpretation, leading to the development of more informed and effective strategies for addressing hate in classroom settings.

Authors:Lisa Marie Otto, Michael Kaiser, Daniel Seebacher, Steffen Müller
Title: Validation of AI-Based 3D Human Pose Estimation in a Cyber-Physical Environment
Abstract:
Ensuring safe and realistic interactions between automated driving systems and vulnerable road users (VRUs) in urban environments requires advanced testing methodologies. This paper presents a test environment that combines a Vehicle-in-the-Loop (ViL) test bench with a motion laboratory, demonstrating the feasibility of cyber-physical (CP) testing of vehicle-pedestrian and vehicle-cyclist interactions. Building upon previous work focused on pedestrian localization, we further validate a human pose estimation (HPE) approach through a comparative analysis of real-world (RW) and virtual representations of VRUs. The study examines the perception of full-body motion using a commercial monocular camera-based 3D skeletal detection AI. The virtual scene is generated in Unreal Engine 5, where VRUs are animated in real time and projected onto a screen to stimulate the camera. The proposed stimulation technique ensures the correct perspective, enabling realistic vehicle perception. To assess the accuracy and consistency of HPE across RW and CP domains, we analyze the reliability of detections as well as variations in movement trajectories and joint estimation stability. The validation includes dynamic test scenarios where human avatars, both walking and cycling, are monitored under controlled conditions. Our results show a strong alignment in HPE between RW and CP test conditions for stable motion patterns, while notable inaccuracies persist under dynamic movements and occlusions, particularly for complex cyclist postures. These findings contribute to refining CP testing approaches for evaluating next-generation AI-based vehicle perception and to enhancing interaction models of automated vehicles and VRUs in CP environments.

Authors:Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun
Title: CooT: Learning to Coordinate In-Context with Coordination Transformers
Abstract:
Effective coordination among artificial agents in dynamic and uncertain environments remains a significant challenge in multi-agent systems. Existing approaches, such as self-play and population-based methods, either generalize poorly to unseen partners or require extensive training. To overcome these limitations, we propose Coordination Transformers (CooT), a novel in-context coordination framework that uses recent interaction histories to adapt to unseen partners rapidly. Unlike previous approaches that primarily aim to increase the diversity of training partners, CooT explicitly focuses on adapting to new partner behaviors by predicting actions aligned with observed partner interactions. Trained on interaction trajectories collected from diverse pairs of agents with complementary behaviors, CooT quickly learns effective coordination strategies without explicit supervision or fine-tuning. Evaluations on the Overcooked benchmark demonstrate that CooT significantly outperforms baseline methods in coordination tasks involving previously unseen partners. Human evaluations further confirm CooT as the most effective collaborative partner, while extensive ablations highlight its robustness, flexibility, and sensitivity to context in multi-agent scenarios.
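The key idea above, adapting to an unseen partner by predicting actions aligned with the partner's observed interaction history, can be illustrated with a toy stand-in for the transformer. This sketch replaces CooT's learned in-context model with a simple frequency heuristic purely for illustration; all names here are hypothetical.

```python
from collections import Counter


def coordinate(history: list[tuple[str, str]], policy: dict[str, str]) -> str:
    """Pick our next action conditioned on the partner's recent behaviour.

    history: (partner_action, our_action) pairs observed so far this episode.
    policy:  maps the partner's most frequent action to a complementary response
             (in CooT this mapping is produced by a transformer, not a table).
    """
    if not history:
        return "wait"  # no context yet, so no basis for adaptation
    partner_actions = Counter(partner for partner, _ in history)
    most_common = partner_actions.most_common(1)[0][0]
    return policy.get(most_common, "wait")
```

The point of the sketch is the interface, not the heuristic: coordination is conditioned on the interaction history itself, so adapting to a new partner requires no fine-tuning, only fresh context.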

Authors:Ruthvik Bokkasam, Shankar Gangisetty, A. H. Abdul Hafez, C. V. Jawahar
Title: Pedestrian Intention and Trajectory Prediction in Unstructured Traffic Using IDD-PeD
Abstract:
With the rapid advancements in autonomous driving, accurately predicting pedestrian behavior has become essential for ensuring safety in complex and unpredictable traffic conditions. The growing interest in this challenge highlights the need for comprehensive datasets that capture unstructured environments, enabling the development of more robust prediction models to enhance pedestrian safety and vehicle navigation. In this paper, we introduce an Indian driving pedestrian dataset designed to address the complexities of modeling pedestrian behavior in unstructured environments, such as illumination changes, occlusion of pedestrians, unsignalized scene types and vehicle-pedestrian interactions. The dataset provides high-level and detailed low-level comprehensive annotations focused on pedestrians requiring the ego-vehicle's attention. Evaluation of the state-of-the-art intention prediction methods on our dataset shows a significant performance drop of up to $\mathbf{15\%}$, while trajectory prediction methods underperform, with MSE increasing by up to $\mathbf{1208}$ relative to their results on standard pedestrian datasets. Additionally, we present exhaustive quantitative and qualitative analysis of intention and trajectory baselines. We believe that our dataset will open new challenges for the pedestrian behavior research community to build robust models. Project Page: https://cvit.iiit.ac.in/research/projects/cvit-projects/iddped
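The MSE figure quoted above is the standard trajectory-prediction error: the mean squared displacement between predicted and ground-truth pedestrian positions. A minimal sketch (the function name and point format are illustrative, not from the dataset's toolkit):

```python
def trajectory_mse(pred: list[tuple[float, float]],
                   truth: list[tuple[float, float]]) -> float:
    """Mean squared error between predicted and ground-truth (x, y) trajectories,
    averaged over timesteps."""
    assert len(pred) == len(truth), "trajectories must cover the same timesteps"
    return sum(
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred, truth)
    ) / len(pred)
```

Because the error is squared, a reported increase of 1208 MSE reflects substantially larger positional drift per timestep than the same models achieve on structured-traffic benchmarks.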

Authors:Anya Osborne, Sabrina Fielder, Lee Taber, Tara Lamb, Joshua McVeigh-Schultz, Katherine Isbister
Title: Avatars and Environments for Meetings in Social VR: What Styles and Choices Matter to People in Group Creativity Tasks?
Abstract:
Due to the COVID-19 pandemic, many professional entities shifted toward remote collaboration and video conferencing (VC) tools. Social virtual reality (VR) platforms present an alternative to VC for meetings and collaborative activities. Well-crafted social VR environments could enhance feelings of co-presence and togetherness at meetings, helping reduce the need for carbon-intensive travel to face-to-face meetings. This research contributes to creating meeting tools in VR by exploring the effects of avatar styles and virtual environments on groups' creative performance using the Mozilla Hubs platform. We present the results of two sequential studies. Study One surveys avatar and environment preferences in various VR meeting contexts (N=87). Study Two applies these findings in a mixed between- and within-subjects study in which participants (N=40) perform creativity tasks in pairs as embodied avatars in different virtual settings using VR headsets. We discuss the design implications of avatar appearances and meeting settings on teamwork.

Authors:Fangjun Ding, Renyu Zhang, Xinyu Feng, Chengye Xie, Zheng Zhang, Yanting Zhang
Title: PsyLite Technical Report
Abstract:
With the rapid development of digital technology, AI-driven psychological counseling has gradually become an important research direction in the field of mental health. However, existing models still have deficiencies in dialogue safety, detailed scenario handling, and lightweight deployment. To address these issues, this study proposes PsyLite, a lightweight psychological counseling large language model agent developed based on the base model InternLM2.5-7B-chat. Through a two-stage training strategy (hybrid distillation data fine-tuning and ORPO preference optimization), PsyLite enhances the model's deep-reasoning ability, psychological counseling ability, and safe dialogue ability. After deployment using Ollama and Open WebUI, a custom workflow is created with Pipelines. An innovative conditional RAG is designed to introduce crosstalk humor elements at appropriate times during psychological counseling to enhance user experience, and to decline dangerous requests, strengthening dialogue safety. Evaluations show that PsyLite outperforms the baseline models in the Chinese general evaluation (CEval), psychological counseling professional evaluation (CPsyCounE), and dialogue safety evaluation (SafeDialBench), particularly in psychological counseling professionalism (CPsyCounE score improvement of 47.6%) and dialogue safety (safety score improvement of 2.4%). Additionally, the model uses quantization technology (GGUF q4_k_m) to achieve low-resource hardware deployment (5 GB of memory suffices for operation), providing a feasible solution for psychological counseling applications in resource-constrained environments.
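The conditional RAG described above routes each query before any retrieval happens: dangerous requests are declined, and humor retrieval is triggered only when the conversational context allows it. A minimal sketch of that routing logic, with hypothetical names and a keyword-based risk check standing in for PsyLite's actual classifier:

```python
def route_query(query: str, risk_terms: set[str], humor_ok: bool) -> str:
    """Illustrative conditional-RAG router (names and logic are assumptions):
    decline risky requests first, retrieve crosstalk humor only when the
    context permits it, otherwise answer with the counseling model directly."""
    if any(term in query for term in risk_terms):
        return "decline"          # safety branch takes priority
    if humor_ok:
        return "retrieve_humor"   # conditionally augment with humor corpus
    return "counsel"              # default: plain counseling response
```

Ordering matters in this design: the safety check runs before retrieval, so no humor content can ever be attached to a declined request.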

Authors:Yunxiu Xu, Siyu Wang, Shoichi Hasegawa
Title: Lightweight Fingernail Haptic Device: Unobstructed Fingerpad Force and Vibration Feedback for Enhanced Virtual Dexterous Manipulation
Abstract:
This study presents a lightweight, wearable fingertip haptic device that provides physics-based haptic feedback for dexterous manipulation in virtual environments without hindering real-world interactions. The device, designed with thin strings and actuators attached to the fingernails, ensures minimal weight (1.55 g per finger) and preserves finger flexibility. Integrating the software with a physics engine renders multiple types of haptic feedback (grip force, collision, and sliding vibration feedback). We evaluated the device's performance in pressure perception, slip feedback, typical dexterous manipulation tasks, and daily operations, and we gathered user experience through subjective assessments. Our results show that participants could perceive and respond to pressure and vibration feedback. Through dexterous manipulation experiments, we further demonstrated that these minimal haptic cues significantly improved virtual task efficiency, showcasing how lightweight haptic feedback can enhance manipulation performance without complex mechanisms. The device's ability to preserve tactile sensations and minimize hindrance to real-world operations is a key advantage over glove-type haptic devices. This research offers a potential solution for designing haptic interfaces that balance lightweight construction, haptic feedback for dexterous manipulation, and daily wearability.

Authors:Hong Wang, Natalia Calvo-Barajas, Katie Winkle, Ginevra Castellano
Title: "Who Should I Believe?": User Interpretation and Decision-Making When a Family Healthcare Robot Contradicts Human Memory
Abstract:
Advancements in robotic capabilities for providing physical assistance, psychological support, and daily health management are making the deployment of intelligent healthcare robots in home environments increasingly feasible in the near future. However, challenges arise when the information provided by these robots contradicts users' memory, raising concerns about user trust and decision-making. This paper presents a study that examines how varying a robot's level of transparency and sociability influences user interpretation, decision-making and perceived trust when faced with conflicting information from a robot. In a 2 x 2 between-subjects online study, 176 participants watched videos of a Furhat robot acting as a family healthcare assistant and suggesting that a fictional user take medication at a different time from the one the user remembered. Results indicate that robot transparency influenced users' interpretation of information discrepancies: with a low transparency robot, the most frequent assumption was that the user had not correctly remembered the time, while with the high transparency robot, participants were more likely to attribute the discrepancy to external factors, such as a partner or another household member modifying the robot's information. Additionally, participants exhibited a tendency toward overtrust, often prioritizing the robot's recommendations over the user's memory, even when suspecting system malfunctions or third-party interference. These findings highlight the impact of transparency mechanisms in robotic systems, the complexity and importance associated with system access control for multi-user robots deployed in home environments, and the potential risks of users' overreliance on robots in sensitive domains such as healthcare.

Authors:Kyosuke Ishibashi, Atsushi Saito, Zin Y. Tun, Lucas Ray, Megan C. Coram, Akihiro Sakurai, Allison M. Okamura, Ko Yamamoto
Title: Effect of Haptic Feedback on Avoidance Behavior and Visual Exploration in Dynamic VR Pedestrian Environment
Abstract:
Human crowd simulation in virtual reality (VR) is a powerful tool with potential applications including emergency evacuation training and assessment of building layout. While haptic feedback in VR enhances immersive experience, its effect on walking behavior in dense and dynamic pedestrian flows is unknown. Through a user study, we investigated how haptic feedback changes user walking motion in crowded pedestrian flows in VR. The results indicate that haptic feedback changed users' collision avoidance movements, as measured by increased walking trajectory length and change in pelvis angle. The displacements of users' lateral position and pelvis angle were also increased in the instantaneous response to a collision with a non-player character (NPC), even when the NPC was inside the field of view. Haptic feedback also enhanced users' awareness and visual exploration when an NPC approached from the side and back. Furthermore, variation in walking speed was increased by the haptic feedback. These results suggested that the haptic feedback enhanced users' sensitivity to a collision in the VR environment.

Authors:Haoran Zhang, Xin Zhao, Jinze Chen, Junpeng Guo
Title: A Literature Review on Simulation in Conversational Recommender Systems
Abstract:
Conversational Recommender Systems (CRSs) have garnered attention as a novel approach to delivering personalized recommendations through multi-turn dialogues. This review developed a taxonomy framework to systematically categorize relevant publications into four groups: dataset construction, algorithm design, system evaluation, and empirical studies, providing a comprehensive analysis of simulation methods in CRSs research. Our analysis reveals that simulation methods play a key role in tackling CRSs' main challenges. For example, LLM-based simulation methods have been used to create conversational recommendation data, enhance CRSs algorithms, and evaluate CRSs. Although several challenges persist, such as dataset bias, the limited output flexibility of LLM-based simulations, and the gap between text semantic space and behavioral semantics, owing to the complexity of Human-Computer Interaction (HCI) in CRSs, simulation methods hold significant potential for advancing CRS research. This review offers a thorough summary of the current research landscape in this domain and identifies promising directions for future inquiry.

Authors:Andrew T. Rozema, James C. Davis
Title: Anti-Phishing Training (Still) Does Not Work: A Large-Scale Reproduction of Phishing Training Inefficacy Grounded in the NIST Phish Scale
Abstract:
Social engineering attacks delivered via email, commonly known as phishing, represent a persistent cybersecurity threat leading to significant organizational incidents and data breaches. Although many organizations train employees on phishing, often mandated by compliance requirements, the real-world effectiveness of this training remains debated. To contribute to evidence-based cybersecurity policy, we conducted a large-scale reproduction study (N = 12,511) at a US-based financial technology firm. Our experimental design refined prior work by comparing training modalities in operational environments, validating NIST's standardized phishing difficulty measurement, and introducing novel organizational-level temporal resilience metrics. Echoing prior work, training interventions showed no significant main effects on click rates (p=0.450) or reporting rates (p=0.417), with negligible effect sizes. However, we found that the NIST Phish Scale predicted user behavior, with click rates increasing from 7.0% for easy lures to 15.0% for hard lures. Our organizational-level resilience result was mixed: 36-55% of campaigns achieved "inoculation" patterns where reports preceded clicks, but training did not significantly improve organizational-level temporal protection. In summary, our results confirm the ineffectiveness of current phishing training approaches while offering a refined study design for future work.

Authors:Miriam Doh, Corinna Canali, Nuria Oliver
Title: Filters of Identity: AR Beauty and the Algorithmic Politics of the Digital Body
Abstract:
This position paper situates AR beauty filters within the broader debate on Body Politics in HCI. We argue that these filters are not neutral tools but technologies of governance that reinforce racialized, gendered, and ableist beauty standards. Through naming conventions, algorithmic bias, and platform governance, they impose aesthetic norms while concealing their influence. To address these challenges, we advocate for transparency-driven interventions and a critical rethinking of algorithmic aesthetics and digital embodiment.

Authors:Panagiotis Kourtesis, Evgenia Giatzoglou, Panagiotis Vorias, Katerina Alkisti Gounari, Eleni Orfanidou, Chrysanthi Nega
Title: Examination of Eye-Tracking, Head-Gaze, and Controller-Based Ray-casting in TMT-VR: Performance and Usability Across Adulthood
Abstract:
Virtual reality (VR) can enrich neuropsychological testing, yet the ergonomic trade-offs of its input modes remain under-examined. Seventy-seven healthy volunteers, young (19-29 y) and middle-aged (35-56 y), completed a VR Trail-Making Test with three pointing methods: eye-tracking, head-gaze, and a six-degree-of-freedom hand controller. Completion time, spatial accuracy, and error counts for the simple (Trail A) and alternating (Trail B) sequences were analysed in 3 x 2 x 2 mixed-model ANOVAs; post-trial scales captured usability (SUS), user experience (UEQ-S), and acceptability. Age dominated behaviour: younger adults were reliably faster, more precise, and less error-prone. Against this backdrop, input modality mattered. Eye-tracking yielded the best spatial accuracy and shortened Trail A time relative to manual control; head-gaze matched eye-tracking on Trail A speed and became the quickest, least error-prone option on Trail B. Controllers lagged on every metric. Subjective ratings were high across the board, with only a small usability dip in middle-aged low-gamers. Overall, gaze-based ray-casting clearly outperformed manual pointing, but optimal choice depended on task demands: eye-tracking maximised spatial precision, whereas head-gaze offered calibration-free enhanced speed and error-avoidance under heavier cognitive load. TMT-VR appears to be an accurate, engaging, and ergonomically adaptable assessment, yet it requires age-stratified norms.

Authors:Qing Zhang, Zixiong Su, Yoshihito Kondoh, Kazunori Asada, Thad Starner, Kai Kunze, Yuta Itoh, Jun Rekimoto
Title: OpticalAging: Real-time Presbyopia Simulation for Inclusive Design via Tunable Lenses
Abstract:
Presbyopia, a common age-related vision condition affecting most people as they age, often remains inadequately understood by those unaffected. To help bridge the gap between abstract accessibility knowledge and a more grounded appreciation of perceptual challenges, this study presents OpticalAging, an optical see-through simulation approach. Unlike VR-based methods, OpticalAging uses dynamically controlled tunable lenses to simulate the first-person visual perspective of presbyopia's distance-dependent blur during real-world interaction, aiming to enhance awareness. While acknowledging critiques regarding simulation's limitations in fully capturing lived experience, we position this tool as a complement to user-centered methods. Our user study (N = 19, 18-35 years old) provides validation: quantitative measurements show statistically significant changes in near points across three age modes (40s, 50s, 60s), while qualitative results suggest increases in reported understanding and empathy among participants. The integration of our tool into a design task showcases its potential applicability within age-inclusive design workflows when used critically alongside direct user engagement.

Authors:Roi Alfassi, Angelora Cooper, Zoe Mitchell, Mary Calabro, Orit Shaer, Osnat Mokryn
Title: Fanfiction in the Age of AI: Community Perspectives on Creativity, Authenticity and Adoption
Abstract:
The integration of Generative AI (GenAI) into creative communities, like fanfiction, is reshaping how stories are created, shared, and valued. This study investigates the perceptions of 157 active fanfiction members, both readers and writers, regarding AI-generated content in fanfiction. Our research explores the impact of GenAI on community dynamics, examining how AI affects the participatory and collaborative nature of these spaces. The findings reveal responses ranging from cautious acceptance of AI's potential for creative enhancement to concerns about authenticity, ethical issues, and the erosion of human-centered values. Participants emphasized the importance of transparency and expressed worries about losing social connections. Our study highlights the need for thoughtful AI integration in creative platforms using design interventions that enable ethical practices, promote transparency, increase engagement and connection, and preserve the community's core values.

Authors:Imene Tarakli, Samuele Vinanzi, Richard Moore, Alessandro Di Nuovo
Title: Robots and Children that Learn Together: Improving Knowledge Retention by Teaching Peer-Like Interactive Robots
Abstract:
Despite growing interest in Learning-by-Teaching (LbT), few studies have explored how this paradigm can be implemented with autonomous, peer-like social robots in real classrooms. Most prior work has relied on scripted or Wizard-of-Oz behaviors, limiting our understanding of how real-time, interactive learning can be supported by artificial agents. This study addresses this gap by introducing Interactive Reinforcement Learning (RL) as a cognitive model for teachable social robots. We conducted two between-subject experiments with 58 primary school children, who either taught a robot or practiced independently on a tablet while learning French vocabulary (memorization) and grammatical rules (inference). The robot, powered by Interactive RL, learned from the child's evaluative feedback. Children in the LbT condition achieved significantly higher retention gains compared to those in the self-practice condition, especially on the grammar task. Learners with lower prior knowledge benefited most from teaching the robot. Behavioural metrics revealed that children adapted their teaching strategies over time and engaged more deeply during inference tasks. This work makes two contributions: (1) it introduces Interactive RL as a pedagogically effective and scalable model for peer-robot learning, and (2) it demonstrates, for the first time, the feasibility of deploying multiple autonomous robots simultaneously in real classrooms. These findings extend theoretical understanding of LbT by showing that social robots can function not only as passive tutees but as adaptive partners that enhance meta-cognitive engagement and long-term learning outcomes.

Authors:Feiqi Gu, Zhixiong Wang, Zhenyu Wang, Dengbo He
Title: Supporting Car-Following Behavior through V2V-Based Beyond-Visual-Range Information Display
Abstract:
Rear-end collisions constitute a large portion of road crashes, despite efforts to mitigate them, such as forward collision warnings. The chance of rear-end collisions is closely related to drivers' car-following (CF) behaviors in the traffic flow. Given that drivers may rely on more than the information of the direct lead vehicle (DLV) when making CF decisions, expanding drivers' perceptual range by providing beyond-visual-range (BVR) information based on vehicle-to-vehicle (V2V) communication may enhance CF safety. Thus, four different human-machine interfaces (HMIs) providing various types of BVR information in CF events were designed, including Brake-HMI showing only brake action of indirect lead vehicles (ILV), Dis-HMI and THW-HMI showing the relative distance and time headway between the ILV and DLV, respectively, and Video-HMI showing the live-stream video of ILV from the perspective of DLV. A driving simulator experiment with 40 participants was conducted to evaluate the impact of BVR-based HMI on driving safety in CF events. We found that, in general, BVR information could improve CF safety without overloading drivers and compromising their visual attention allocation strategies, particularly among novice drivers, by enabling quicker brake responses and increasing time headway and time-to-collision in brake events. The Brake-HMI yielded the safest performance in chain brake events, whereas Video-HMI increased attentional demands without observable benefits. This research provides insights into enabling drivers' BVR perception based on V2V communication to enhance driving safety in CF scenarios.

Authors:Fangzheng Liu, Lancelot Blanchard, Don D. Haddad, Joseph A. Paradiso
Title: Two Sonification Methods for the MindCube
Abstract:
In this work, we explore the musical interface potential of the MindCube, an interactive device designed to study emotions. Embedding diverse sensors and input devices, this interface resembles a fidget cube toy commonly used to help users relieve their stress and anxiety. As such, it is a particularly well-suited controller for musical systems that aim to help with emotion regulation. In this regard, we present two different mappings for the MindCube, with and without AI. With our generative AI mapping, we propose a way to infuse meaning within a latent space and techniques to navigate through it with an external controller. We discuss our results and propose directions for future work.

Authors:Lancelot Blanchard, Cameron Holt, Joseph A. Paradiso
Title: AI Harmonizer: Expanding Vocal Expression with a Generative Neurosymbolic Music AI System
Abstract:
Vocal harmonizers are powerful tools to help solo vocalists enrich their melodies with harmonically supportive voices. These tools exist in various forms, from commercially available pedals and software to custom-built systems, each employing different methods to generate harmonies. Traditional harmonizers often require users to manually specify a key or tonal center, while others allow pitch selection via an external keyboard; both approaches demand some degree of musical expertise. The AI Harmonizer introduces a novel approach by autonomously generating musically coherent four-part harmonies without requiring prior harmonic input from the user. By integrating state-of-the-art generative AI techniques for pitch detection and voice modeling with custom-trained symbolic music models, our system arranges any vocal melody into rich choral textures. In this paper, we present our methods, explore potential applications in performance and composition, and discuss future directions for real-time implementations. While our system currently operates offline, we believe it represents a significant step toward AI-assisted vocal performance and expressive musical augmentation. We release our implementation on GitHub.

Authors:Abdulhaq Adetunji Salako, Christian Tominski
Title: Toward Understanding Similarity of Visualization Techniques
Abstract:
The literature describes many visualization techniques for different types of data, tasks, and application contexts, and new techniques are proposed on a regular basis. Visualization surveys try to capture the immense space of techniques and structure it with meaningful categorizations. Yet, it remains difficult to understand the similarity of visualization techniques in general. We approach this open research question from two angles. First, we follow a model-driven approach that is based on defining the signature of visualization techniques and interpreting the similarity of signatures as the similarity of their associated techniques. Second, following an expert-driven approach, we asked visualization experts in a small online study for their ad-hoc intuitive assessment of the similarity of pairs of visualization techniques. From both approaches, we gain insight into the similarity of a set of 13 basic and advanced visualizations for different types of data. While our results are so far preliminary and academic, they are first steps toward better understanding the similarity of visualization techniques.
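The model-driven approach above rests on comparing technique signatures. One common way to quantify such set-based similarity is the Jaccard index; this sketch uses it purely as an illustration, since the paper's own signature definition and similarity measure may differ.

```python
def signature_similarity(sig_a: set[str], sig_b: set[str]) -> float:
    """Jaccard similarity between two technique signatures, modeled here as
    sets of descriptive properties (data type, layout, encoding channels)."""
    if not sig_a and not sig_b:
        return 1.0  # two empty signatures are trivially identical
    return len(sig_a & sig_b) / len(sig_a | sig_b)
```

Under this reading, two techniques sharing most signature properties (say, both 2D and position-encoded) score close to 1, which can then be compared against the experts' ad-hoc similarity judgments.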

Authors:Zhiwei Li, Carl Kesselman, Tran Huy Nguyen, Benjamin Yixing Xu, Kyle Bolo, Kimberley Yu
Title: From Data to Decision: Data-Centric Infrastructure for Reproducible ML in Collaborative eScience
Abstract:
Reproducibility remains a central challenge in machine learning (ML), especially in collaborative eScience projects where teams iterate over data, features, and models. Current ML workflows are often dynamic yet fragmented, relying on informal data sharing, ad hoc scripts, and loosely connected tools. This fragmentation impedes transparency, reproducibility, and the adaptability of experiments over time. This paper introduces a data-centric framework for lifecycle-aware reproducibility, centered around six structured artifacts: Dataset, Feature, Workflow, Execution, Asset, and Controlled Vocabulary. These artifacts formalize the relationships between data, code, and decisions, enabling ML experiments to be versioned, interpretable, and traceable over time. The approach is demonstrated through a clinical ML use case of glaucoma detection, illustrating how the system supports iterative exploration, improves reproducibility, and preserves the provenance of collaborative decisions across the ML lifecycle.

Authors:Xiangyang He, Jiale Li, Jiahao Chen, Yang Yang, Mingming Fan
Title: SimuPanel: A Novel Immersive Multi-Agent System to Simulate Interactive Expert Panel Discussion
Abstract:
Panel discussion allows the audience to learn different perspectives through interactive discussions among experts moderated by a host and a Q&A session with the audience. Despite its benefits, panel discussion in the real world is inaccessible to many who do not have the privilege to participate due to geographical, financial, and time constraints. We present SimuPanel, which simulates panel discussions among academic experts through LLM-based multi-agent interaction. It enables users to define topics of interest for the panel, observe the expert discussion, engage in Q&A, and take notes. SimuPanel employs a host-expert architecture where each panel member is simulated by an agent with specialized expertise, and the panel is visualized in an immersive 3D environment to enhance engagement. Traditional dialogue generation struggles to capture the depth and interactivity of real-world panel discussions. To address this limitation, we propose a novel multi-agent interaction framework that simulates authentic panel dynamics by modeling reasoning strategies and personas of experts grounded in multimedia sources. This framework enables agents to dynamically recall and contribute to the discussion based on past experiences from diverse perspectives. Our technical evaluation and the user study with university students show that SimuPanel was able to simulate more in-depth discussions and engage participants to interact with and reflect on the discussions. As a first step in this direction, we offer design implications for future avenues to improve and harness the power of panel discussion for multimedia learning.

Authors:Gregory Croisdale, Emily Huang, John Joon Young Chung, Anhong Guo, Xu Wang, Austin Z. Henley, Cyrus Omar
Title: DeckFlow: Iterative Specification on a Multimodal Generative Canvas
Abstract:
Generative AI promises to allow people to create high-quality personalized media. Although powerful, we identify three fundamental design problems with existing tooling through a literature review. We introduce a multimodal generative AI tool, DeckFlow, to address these problems. First, DeckFlow supports task decomposition by allowing users to maintain multiple interconnected subtasks on an infinite canvas populated by cards connected through visual dataflow affordances. Second, DeckFlow supports a specification decomposition workflow where an initial goal is iteratively decomposed into smaller parts and combined using feature labels and clusters. Finally, DeckFlow supports generative space exploration by generating multiple prompt and output variations, presented in a grid, that can feed back recursively into the next design iteration. We evaluate DeckFlow for text-to-image generation against a state-of-practice conversational AI baseline for image generation tasks. We then add audio generation and investigate user behaviors in a more open-ended creative setting with text, image, and audio outputs.

Authors:Massimo Chiriatti, Marianna Bergamaschi Ganapini, Enrico Panai, Brenda K. Wiederhold, Giuseppe Riva
Title: System 0: Transforming Artificial Intelligence into a Cognitive Extension
Abstract:
This paper introduces System 0, a conceptual framework for understanding how artificial intelligence functions as a cognitive extension preceding both intuitive (System 1) and deliberative (System 2) thinking processes. As AI systems increasingly shape the informational substrate upon which human cognition operates, they transform from passive tools into active cognitive partners. Building on the Extended Mind hypothesis and Heersmink's criteria for cognitive extension, we argue that AI systems satisfy key conditions for cognitive integration. These include reliability, trust, transparency, individualization, and the ability to enhance and transform human mental functions. However, AI integration creates a paradox: while expanding cognitive capabilities, it may simultaneously constrain thinking through sycophancy and bias amplification. To address these challenges, we propose seven evidence-based frameworks for effective human-AI cognitive integration: Enhanced Cognitive Scaffolding, which promotes progressive autonomy; Symbiotic Division of Cognitive Labor, strategically allocating tasks based on comparative strengths; Dialectical Cognitive Enhancement, countering AI sycophancy through productive epistemic tension; Agentic Transparency and Control, ensuring users understand and direct AI influence; Expertise Democratization, breaking down knowledge silos; Social-Emotional Augmentation, addressing affective dimensions of cognitive work; and Duration-Optimized Integration, managing the evolving human-AI relationship over time. Together, these frameworks provide a comprehensive approach for harnessing AI as a genuine cognitive extension while preserving human agency, critical thinking, and intellectual growth, transforming AI from a replacement for human cognition into a catalyst for enhanced thinking.

Authors:Emma R. Dodoo, Tamara Nelson-Fromm, Mark Guzdial
Title: The Teacher's Dilemma: Balancing Trade-Offs in Programming Education for Emergent Bilingual Students
Abstract:
K-12 computing teachers must navigate complex trade-offs when selecting programming languages and instructional materials for classrooms with emergent bilingual (EB) students. While they aim to foster an inclusive learning environment by addressing language barriers that impact student engagement, they must also align with K-12 computer science curricular guidelines and prepare students for industry-standard programming tools. Because programming languages predominantly use English keywords and most instructional materials are written in English, these linguistic barriers introduce cognitive load and accessibility challenges. This paper examines teachers' decisions in balancing these competing priorities, highlighting the tensions between accessibility, curriculum alignment, and workforce preparation. The findings shed light on how our teacher participants negotiate these trade-offs and what factors influence their selection of programming tools to best support EB students while meeting broader educational and professional goals.

Authors:Ivania Donoso-Guzmán, Kristýna Sirka Kacafírková, Maxwell Szymanski, An Jacobs, Denis Parra, Katrien Verbert
Title: A Systematic Review of User-Centred Evaluation of Explainable AI in Healthcare
Abstract:
Despite promising developments in Explainable Artificial Intelligence, the practical value of XAI methods remains under-explored and insufficiently validated in real-world settings. Robust and context-aware evaluation is essential, not only to produce understandable explanations but also to ensure their trustworthiness and usability for intended users; yet it tends to be overlooked because there are no clear guidelines on how to design evaluations with users. This study addresses this gap with two main goals: (1) to develop a framework of well-defined, atomic properties that characterise the user experience of XAI in healthcare; and (2) to provide clear, context-sensitive guidelines for defining evaluation strategies based on system characteristics. We conducted a systematic review of 82 user studies, sourced from five databases, all situated within healthcare settings and focused on evaluating AI-generated explanations. The analysis was guided by a predefined coding scheme informed by an existing evaluation framework, complemented by inductive codes developed iteratively. The review yields three key contributions: (1) a synthesis of current evaluation practices, highlighting a growing focus on human-centred approaches in healthcare XAI; (2) insights into the interrelations among explanation properties; and (3) an updated framework and a set of actionable guidelines to support interdisciplinary teams in designing and implementing effective evaluation strategies for XAI systems tailored to specific application contexts.

Authors:Twm Stone, Anna Soligo
Title: An LLM's Apology: Outsourcing Awkwardness in the Age of AI
Abstract:
A key part of modern social dynamics is flaking at short notice. However, anxiety in coming up with believable and socially acceptable reasons to do so can instead lead to 'ghosting', awkwardness, or implausible excuses, risking emotional harm and resentment in the other party. The ability to delegate this task to a Large Language Model (LLM) could substantially reduce friction and enhance the flexibility of users' social lives while greatly minimising the aforementioned creative burden and moral qualms. We introduce FLAKE-Bench, an evaluation of models' capacity to effectively, kindly, and humanely extract themselves from a diverse set of social, professional and romantic scenarios. We report the efficacy of 10 frontier or recently-frontier LLMs in bailing on prior commitments, because nothing says "I value our friendship" like having AI generate your cancellation texts. We open-source FLAKE-Bench at github.com/Cloakless/flake-bench to support future research.

Authors:Tobias Hildebrandt, Lars Mehnen
Title: The Transition Matrix -- A classification of navigational patterns between LMS course sections
Abstract:
Learning management systems (LMS) like Moodle are increasingly used to support university teaching. As Moodle courses become more complex, incorporating diverse interactive elements, it is important to understand how students navigate through course sections and whether course designs are meeting student needs. While substantial research exists on student usage of individual LMS elements, there is a lack of research on broader navigational patterns between course sections and how these patterns differ across courses. This study analyzes navigational data from 747 courses in the Moodle LMS at a technical university of applied sciences, representing (after filtering) around 4,400 students and 1.8 million logged events. By mapping section names across a large sample of courses, the analysis enables cross-course comparisons of student navigational sequences between sections. Transition matrices and heat map visualizations are used to identify common navigational patterns. Findings include that many of the generated heatmaps contain one or more diagonal axes, indicating that students typically navigate from the current to the next or previous section. More fine-grained patterns show typical behavior for blended learning scenarios. Other patterns reveal sections that dominate student activity.
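The row-normalized transition matrices described in this abstract can be sketched as follows; the `(student_id, section)` event format and section names are illustrative assumptions, not the study's actual Moodle log schema:

```python
from collections import defaultdict

def transition_matrix(events):
    """Build a row-normalized transition matrix from an ordered list of
    (student_id, section) navigation events. Each row gives the empirical
    probability of moving from one course section to another."""
    by_student = defaultdict(list)
    for student, section in events:
        by_student[student].append(section)
    # Count consecutive section-to-section moves per student.
    counts = defaultdict(lambda: defaultdict(int))
    for visits in by_student.values():
        for src, dst in zip(visits, visits[1:]):
            counts[src][dst] += 1
    # Normalize each row so entries sum to 1 (transition probabilities).
    matrix = {}
    for src, row in counts.items():
        total = sum(row.values())
        matrix[src] = {dst: n / total for dst, n in row.items()}
    return matrix

events = [
    ("s1", "Intro"), ("s1", "Week 1"), ("s1", "Week 2"),
    ("s2", "Intro"), ("s2", "Week 1"), ("s2", "Intro"),
]
m = transition_matrix(events)  # m["Intro"]["Week 1"] is 1.0 here
```

A "diagonal axis" in the paper's heatmaps corresponds to probability mass concentrated on adjacent sections, e.g. high values at `m["Week 1"]["Week 2"]`.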

Authors:ATM Mizanur Rahman, Md Romael Haque, Sharifa Sultana
Title: DAIEM: Decolonizing Algorithm's Role as a Team-member in Informal E-market
Abstract:
In Bangladesh's rapidly expanding informal e-market, small-scale sellers use social media platforms like Facebook to run businesses outside formal infrastructures. These sellers rely heavily on platform algorithms, not just for visibility, but as active collaborators in business operations. Drawing on 37 in-depth interviews with sellers, buyers, and stakeholders, this paper examines how people in informal e-markets perceive and interact with the algorithm as a "team member" that performs sales, marketing, and customer engagement tasks. We found that while sellers and local tech entrepreneurs are interested in developing services to support this industry, buyers and investors place greater trust in human interactions. This reveals a postcolonial tension involving cultural values, local tech education and training, and a mismatch between the global and Bangladeshi e-market growth. We expand this discussion using perspectives from HCI, political design, and AI design. We also support the decoloniality movement in informal e-markets by proposing the DAIEM framework, which includes six components: autonomy and agency; resistance; locality, culture, and history; rationality; materiality; and advocacy. DAIEM serves as both a guideline for algorithm design and an analytical tool.

Authors:G. R. Lau, W. Y. Low, S. M. Koh, A. Hartanto
Title: Evaluating AI Alignment in Eleven LLMs through Output-Based Analysis and Human Benchmarking
Abstract:
Large language models (LLMs) are increasingly used in psychological research and practice, yet traditional benchmarks reveal little about the values they express in real interaction. We introduce PAPERS, an output-based evaluation of the values LLMs prioritise in their text. Study 1 thematically analysed responses from eleven LLMs, identifying five recurring dimensions (Purposeful Contribution, Adaptive Growth, Positive Relationality, Ethical Integrity, and Robust Functionality) with Self-Actualised Autonomy appearing only under a hypothetical sentience prompt. These results suggest that LLMs are trained to prioritise humanistic and utility values as dual objectives of optimal functioning, a pattern supported by existing AI alignment and prioritisation frameworks. Study 2 operationalised PAPERS as a ranking instrument across the same eleven LLMs, yielding stable, non-random value priorities alongside systematic between-model differences. Hierarchical clustering distinguished "human-centric" models (e.g., ChatGPT-4o, Claude Sonnet 4) that prioritised relational/ethical values from "utility-driven" models (e.g., Llama 4, Gemini 2.5 Pro) that emphasised operational priorities. Study 3 benchmarked four LLMs against human judgements (N = 376) under matched prompts, finding near-perfect rank-order convergence (r = .97-.98) but moderate absolute agreement; among tested models, ChatGPT-4o showed the closest alignment with human ratings (ICC = .78). Humans also showed limited readiness to endorse sentient AI systems. Taken together, PAPERS enabled systematic value audits and revealed trade-offs with direct implications for deployment: human-centric models aligned more closely with human value judgments and appear better suited for humanistic psychological applications, whereas utility-driven models emphasised functional efficiency and may be more appropriate for instrumental or back-office tasks.

Authors:Vivek Chavan, Arsen Cenaj, Shuyuan Shen, Ariane Bar, Srishti Binwani, Tommaso Del Becaro, Marius Funk, Lynn Greschner, Roberto Hung, Stina Klein, Romina Kleiner, Stefanie Krause, Sylwia Olbrych, Vishvapalsinhji Parmar, Jaleh Sarafraz, Daria Soroko, Daksitha Withanage Don, Chang Zhou, Hoang Thuy Duong Vu, Parastoo Semnani, Daniel Weinhardt, Elisabeth Andre, Jörg Krüger, Xavier Fresquet
Title: Feeling Machines: Ethics, Culture, and the Rise of Emotional AI
Abstract:
This paper explores the growing presence of emotionally responsive artificial intelligence through a critical and interdisciplinary lens. Bringing together the voices of early-career researchers from multiple fields, it explores how AI systems that simulate or interpret human emotions are reshaping our interactions in areas such as education, healthcare, mental health, caregiving, and digital life. The analysis is structured around four central themes: the ethical implications of emotional AI, the cultural dynamics of human-machine interaction, the risks and opportunities for vulnerable populations, and the emerging regulatory, design, and technical considerations. The authors highlight the potential of affective AI to support mental well-being, enhance learning, and reduce loneliness, as well as the risks of emotional manipulation, over-reliance, misrepresentation, and cultural bias. Key challenges include simulating empathy without genuine understanding, encoding dominant sociocultural norms into AI systems, and insufficient safeguards for individuals in sensitive or high-risk contexts. Special attention is given to children, elderly users, and individuals with mental health challenges, who may interact with AI in emotionally significant ways. However, there remains a lack of cognitive or legal protections which are necessary to navigate such engagements safely. The report concludes with ten recommendations, including the need for transparency, certification frameworks, region-specific fine-tuning, human oversight, and longitudinal research. A curated supplementary section provides practical tools, models, and datasets to support further work in this domain.

Authors:Sharifa Sultana, Hafsah Mahzabin Chowdhury, Zinnat Sultana, Nervo Verdezoto
Title: 'Socheton': A Culturally Appropriate AI Tool to Support Reproductive Well-being
Abstract:
Reproductive well-being education in the Global South is often challenged as many communities perceive many of its contents as misinformation, misconceptions, and language-inappropriate. Our ten-month-long ethnographic study (n=41) investigated the impact of sociocultural landscape, cultural beliefs, and healthcare infrastructure on Bangladeshi people's access to quality reproductive healthcare and set four design goals: combating misinformation, including culturally appropriate language, professionals' accountable moderation, and promoting users' democratic participation. Building on the model of 'Distributive Justice,' we designed and evaluated 'Socheton,' a culturally appropriate AI-mediated tool for reproductive well-being that includes healthcare professionals, AI-language teachers, and community members to moderate and run the activity-based platform. Our user study (n=28) revealed that only combating misinformation and language inappropriateness may still leave the community with a conservative mob culture and patronize reproductive care-seeking. This guides well-being HCI design toward being culturally appropriate in the context of reproductive justice with sensitive marginalized communities.

Authors:Jieyu Zhou, Rui Shen, Yue You, Carl DiSalvo, Lynn Dombrowski, Christopher MacLellan
Title: Improving Public Service Chatbot Design and Civic Impact: Investigation of Citizens' Perceptions of a Metro City 311 Chatbot
Abstract:
As governments increasingly adopt digital tools, public service chatbots have emerged as a growing communication channel. This paper explores the design considerations and engagement opportunities of public service chatbots, using a 311 chatbot from a metropolitan city as a case study. Our qualitative study consisted of official survey data and 16 interviews examining stakeholder experiences and design preferences for the chatbot. We found two key areas of concern regarding these public chatbots: individual-level and community-level. At the individual level, citizens experience three key challenges: interpretation, transparency, and social contextualization. Moreover, the current chatbot design prioritizes the efficient completion of individual tasks but neglects the broader community perspective. It overlooks how individuals interact and discuss problems collectively within their communities. To address these concerns, we offer design opportunities for creating more intelligent, transparent, community-oriented chatbots that better engage individuals and their communities.

Authors:Nishan Gunawardena, Gough Yumu Lui, Bahman Javadi, Jeewani Anupama Ginige
Title: Evaluating Sensitivity Parameters in Smartphone-Based Gaze Estimation: A Comparative Study of Appearance-Based and Infrared Eye Trackers
Abstract:
This study evaluates a smartphone-based, deep-learning eye-tracking algorithm by comparing its performance against a commercial infrared-based eye tracker, the Tobii Pro Nano. The aim is to investigate the feasibility of appearance-based gaze estimation under realistic mobile usage conditions. Key sensitivity factors, including age, gender, vision correction, lighting conditions, device type, and head position, were systematically analysed. The appearance-based algorithm integrates a lightweight convolutional neural network (MobileNet-V3) with a recurrent structure (Long Short-Term Memory) to predict gaze coordinates from grayscale facial images. Gaze data were collected from 51 participants using dynamic visual stimuli, and accuracy was measured using Euclidean distance. The deep learning model produced a mean error of 17.76 mm, compared to 16.53 mm for the Tobii Pro Nano. While overall accuracy differences were small, the deep learning-based method was more sensitive to factors such as lighting, vision correction, and age, with higher failure rates observed under low-light conditions among participants using glasses and in older age groups. Device-specific and positional factors also influenced tracking performance. These results highlight the potential of appearance-based approaches for mobile eye tracking and offer a reference framework for evaluating gaze estimation systems across varied usage conditions.
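The accuracy measure described above (mean Euclidean distance between predicted and true gaze points) can be sketched in a few lines; the coordinate values are illustrative, and millimetre units are assumed as in the abstract:

```python
import math

def mean_euclidean_error_mm(predicted, ground_truth):
    """Mean Euclidean distance (mm) between predicted and ground-truth
    gaze points, the accuracy metric used in the study."""
    assert len(predicted) == len(ground_truth)
    dists = [
        math.dist(p, g)  # Euclidean distance between (x, y) pairs
        for p, g in zip(predicted, ground_truth)
    ]
    return sum(dists) / len(dists)

pred = [(10.0, 20.0), (33.0, 44.0)]
true = [(13.0, 24.0), (30.0, 40.0)]
err = mean_euclidean_error_mm(pred, true)  # both errors are 5.0 mm
```

The reported figures (17.76 mm for the deep learning model vs. 16.53 mm for the Tobii Pro Nano) are averages of exactly this quantity over all participants and stimuli.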

Authors:ATM Mizanur Rahman, Sharifa Sultana
Title: Digital Labor: Challenges, Ethical Insights, and Implications
Abstract:
Digital workers on crowdsourcing platforms (e.g., Amazon Mechanical Turk, Appen, Clickworker, Prolific) play a crucial role in training and improving AI systems, yet they often face low pay, unfair conditions, and a lack of recognition for their contributions. To map these issues in the existing literature of computer science, AI, and related scholarship, we selected over 300 research papers on digital labor published between 2015 and 2024, narrowing them down to 143 on digital gig-labor for a detailed analysis. This analysis provides a broad overview of the key challenges, concerns, and trends in the field. Our synthesis reveals how the persistent patterns of representation and voices of gig workers in digital labor are structured and governed. We offer new insights for researchers, platform designers, and policymakers, helping them better understand the experiences of digital workers and pointing to key areas where interventions and future investigations are urgently needed. By mapping the findings from the past ten years' growth of the domain and possible implications, this paper contributes to a more coherent and critical understanding of digital labor in contemporary and future AI ecosystems.

Authors:Daniel Hove Paludan, Julie Fredsgård, Kasper Patrick Bährentz, Ilhan Aslan
Title: "If we misunderstand the client, we misspend 100 hours": Exploring conversational AI and response types for information elicitation
Abstract:
Client-designer alignment is crucial to the success of design projects, yet little research has explored how digital technologies might influence this alignment. To address this gap, this paper presents a three-phase study investigating how digital systems can support requirements elicitation in professional design practice. Specifically, it examines how integrating a conversational agent and choice-based response formats into a digital elicitation tool affects early-stage client-designer collaboration. The first phase of the study inquired into the current practices of 10 design companies through semi-structured interviews, informing the system's design. The second phase evaluated the system using a 2x2 factorial design with 50 mock clients, quantifying the effects of conversational AI and response type on user experience and perceived preparedness. In phase three, the system was presented to seven of the original 10 companies to gather reflections on its value, limitations, and potential integration into practice. Findings show that both conversational AI and choice-based responses lead to lower dependability scores on the User Experience Questionnaire, yet result in client input with greater clarity. We contribute design implications for integrating conversational AI and choice-based responses into elicitation tools to support mutual understanding in early-stage client-designer collaboration.

Authors:Alemitu Bezabih, Shadi Nourriz, Anne-Marie Snider, Rosalie Rauenzahn, George Handzo, C. Estelle Smith
Title: Meeting Patients Where They're At: Toward the Expansion of Chaplaincy Care into Online Spiritual Care Communities
Abstract:
Despite a growing need for spiritual care in the US, it is often under-served, inaccessible, or misunderstood, while almost no prior work in CSCW/HCI research has engaged with professional chaplains and spiritual care providers. This interdisciplinary study aims to develop a foundational understanding of how spiritual care may (or may not) be expanded into online spaces -- especially focusing on anonymous, asynchronous, and text-based online communities. We conducted an exploratory mixed-methods study with chaplains (N=22) involving interviews and user testing sessions centered around Reddit support communities to understand participants' perspectives on technology and their ideations about the role of chaplaincy in prospective Online Spiritual Care Communities (OSCCs). Our Grounded Theory Method analysis highlighted benefits of OSCCs including: meeting patients where they are at; accessibility and scalability; and facilitating patient-initiated care. Chaplains highlighted how their presence in OSCCs could help with shaping peer interactions, moderation, synchronous chats for group care, and redirecting to external resources, while also raising important feasibility concerns, risks, and needs for future design and research. We used an existing taxonomy of chaplaincy techniques to show that some spiritual care strategies may be amenable to online spaces, yet we also exposed the limitations of technology to fully mediate spiritual care and the need to develop new online chaplaincy interventions. Based on these findings, we contribute the model of a "Care Loop" between institutionally-based formal care and platform-based community care to expand access and drive greater awareness and utilization of spiritual care. We also contribute design implications to guide future work in online spiritual care.

Authors:Jubin Abhishek Soni, Amit Anand, Rajesh Kumar Pandey, Aniket Abhishek Soni
Title: Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing Multi-Turn Planning and Tool Adaptation
Abstract:
Retrieval-Augmented Generation (RAG) has significantly advanced large language models (LLMs) by grounding their outputs in external tools and knowledge sources. However, existing RAG systems are typically constrained to static, single-turn interactions with fixed toolsets, making them ill-suited for dynamic domains such as healthcare and smart homes, where user intent, available tools, and contextual factors evolve over time. We present Dynamic Context Tuning (DCT), a lightweight framework that extends RAG to support multi-turn dialogue and evolving tool environments without requiring retraining. DCT integrates an attention-based context cache to track relevant past information, LoRA-based retrieval to dynamically select domain-specific tools, and efficient context compression to maintain inputs within LLM context limits. Experiments on both synthetic and real-world benchmarks show that DCT improves plan accuracy by 14% and reduces hallucinations by 37%, while matching GPT-4 performance at significantly lower cost. Furthermore, DCT generalizes to previously unseen tools, enabling scalable and adaptable AI assistants across a wide range of dynamic environments.
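The context-cache-plus-compression idea in this abstract can be illustrated with a toy sketch. Everything below is an assumption for illustration: the paper's cache is attention-based and its retrieval is LoRA-based, whereas this stand-in scores past turns with a simple bag-of-words cosine and compresses greedily to a token budget:

```python
import math
from collections import Counter

def _cos(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ContextCache:
    """Toy stand-in for a multi-turn context cache: store past dialogue
    turns, then keep only those most relevant to the current query,
    within a fixed token budget (the compression step)."""

    def __init__(self, budget_tokens=20):
        self.turns = []
        self.budget = budget_tokens

    def add(self, text):
        self.turns.append(text)

    def retrieve(self, query):
        q = Counter(query.lower().split())
        # Rank cached turns by relevance to the current query.
        scored = sorted(
            self.turns,
            key=lambda t: _cos(Counter(t.lower().split()), q),
            reverse=True,
        )
        # Greedy compression: keep top-scoring turns until budget is used.
        kept, used = [], 0
        for t in scored:
            n = len(t.split())
            if used + n <= self.budget:
                kept.append(t)
                used += n
        return kept

cache = ContextCache(budget_tokens=8)
cache.add("turn the living room lights off")
cache.add("what is the weather tomorrow")
ctx = cache.retrieve("dim the living room lights")
```

The design point the sketch preserves is that cache membership is query-dependent: as user intent shifts across turns, a different subset of history survives compression into the LLM's context window.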

Authors:Ziwen Wang, Yue Zhang, Zhiqiang Zhang, Sheng Quan Xie, Alexander Lanzon, William P. Heath, Zhenhong Li
Title: Instance-Based Transfer Learning with Similarity-Aware Subject Selection for Cross-Subject SSVEP-Based BCIs
Abstract:
Steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) can achieve high recognition accuracy with sufficient training data. Transfer learning presents a promising solution to alleviate data requirements for the target subject by leveraging data from source subjects; however, effectively addressing individual variability among both target and source subjects remains a challenge. This paper proposes a novel transfer learning framework, termed instance-based task-related component analysis (iTRCA), which leverages knowledge from source subjects while considering their individual contributions. iTRCA extracts two types of features: (1) the subject-general feature, capturing shared information between source and target subjects in a common latent space, and (2) the subject-specific feature, preserving the unique characteristics of the target subject. To mitigate negative transfer, we further design an enhanced framework, subject selection-based iTRCA (SS-iTRCA), which integrates a similarity-based subject selection strategy to identify appropriate source subjects for transfer based on their task-related components (TRCs). Comparative evaluations on the Benchmark, BETA, and a self-collected dataset demonstrate the effectiveness of the proposed iTRCA and SS-iTRCA frameworks. This study provides a potential solution for developing high-performance SSVEP-based BCIs with reduced target subject data.
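The similarity-based subject selection step of SS-iTRCA can be sketched as follows. The 1-D task-related component (TRC) representation, the Pearson-correlation scoring, and the toy signals are illustrative assumptions, not the paper's exact formulation:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length 1-D signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_sources(target_trc, source_trcs, k=2):
    """Rank source subjects by similarity of their TRC to the target's
    and keep the top k, so dissimilar subjects (likely to cause
    negative transfer) are excluded."""
    ranked = sorted(
        source_trcs.items(),
        key=lambda kv: pearson(target_trc, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

target = [0.0, 1.0, 0.0, -1.0, 0.0]        # toy target TRC waveform
sources = {
    "s1": [0.1, 0.9, 0.0, -1.1, 0.1],      # similar waveform
    "s2": [1.0, 0.0, -1.0, 0.0, 1.0],      # phase-shifted
    "s3": [0.0, -1.0, 0.0, 1.0, 0.0],      # inverted
}
chosen = select_sources(target, sources, k=1)
```

Only the selected subjects' data would then feed the transfer step, which is how SS-iTRCA mitigates the negative transfer the abstract describes.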

Authors:Jiaying Lizzy Liu, Yan Zhang
Title: Video-Mediated Emotion Disclosure: Expressions of Fear, Sadness, and Joy by People with Schizophrenia on YouTube
Abstract:
Individuals with schizophrenia frequently experience intense emotions and often turn to vlogging as a medium for emotional expression. While previous research has predominantly focused on text-based disclosure, little is known about how individuals construct narratives around emotions and emotional experiences in video blogs. Our study addresses this gap by analyzing 200 YouTube videos created by individuals with schizophrenia. Drawing on media research and self-presentation theories, we developed a visual analysis framework to disentangle these videos. Our analysis revealed diverse practices of emotion disclosure through both verbal and visual channels, highlighting the dynamic interplay between these modes of expression. We found that the deliberate construction of visual elements, including environmental settings and specific aesthetic choices, appears to foster more supportive and engaged viewer responses. These findings underscore the need for future large-scale quantitative research examining how visual features shape video-mediated communication on social media platforms. Such investigations would inform the development of care-centered video-sharing platforms that better support individuals managing illness experiences.

Authors:Nicholas Vincent, Matthew Prewitt, Hanlin Li
Title: Collective Bargaining in the Information Economy Can Address AI-Driven Power Concentration
Abstract:
This position paper argues that there is an urgent need to restructure markets for the information that goes into AI systems. Specifically, producers of information goods (such as journalists, researchers, and creative professionals) need to be able to collectively bargain with AI product builders in order to receive reasonable terms and a sustainable return on the informational value they contribute. We argue that without increased market coordination or collective bargaining on the side of these primary information producers, AI will exacerbate a large-scale "information market failure" that will lead not only to undesirable concentration of capital, but also to a potential "ecological collapse" in the informational commons. On the other hand, collective bargaining in the information economy can create market frictions and aligned incentives necessary for a pro-social, sustainable AI future. We provide concrete actions that can be taken to support a coalition-based approach to achieve this goal. For example, researchers and developers can establish technical mechanisms such as federated data management tools and explainable data value estimations, to inform and facilitate collective bargaining in the information economy. Additionally, regulatory and policy interventions may be introduced to support trusted data intermediary organizations representing guilds or syndicates of information producers.

Authors:James Eschrich, Cole McMullen, Sarah Sterman
Title: Speculative Design in Spiraling Time: Methods and Indigenous HCI
Abstract:
In this position paper, we first discuss the uptake of speculative design as a method for Indigenous HCI. Then, we outline how a key assumption about temporality threatens to undermine the usefulness of speculative design in this context. Finally, we briefly sketch out a possible alternative understanding of speculative design, based on the concept of "spiraling time," which could be better suited for Indigenous HCI.

Authors:Yuanhaur Chang, Oren Heller, Yaniv Shlomo, Iddo Bar-Noy, Ella Bokobza, Michal Grinstein-Weiss, Ning Zhang
Title: Mind the Gap: Revealing Security Barriers through Situational Awareness of Small and Medium Business Key Decision-Makers
Abstract:
Key decision-makers in small and medium businesses (SMBs) often lack the awareness and knowledge to implement cybersecurity measures effectively. To gain a deeper understanding of how SMB executives navigate cybersecurity decision-making, we deployed a mixed-method approach, conducting semi-structured interviews (n=21) and online surveys (n=322) with SMB key decision-makers. Using thematic analysis, we revealed SMB decision-makers' perceived risks in terms of the digital assets they valued, and found reasons for their choice of defense measures and factors impacting security perception. We employed the situational awareness model to characterize decision-makers based on cybersecurity awareness, identifying those who have comparatively low awareness in the fight against adversaries. We further explored the relationship between awareness and business attributes, and constructed a holistic structural equation model to understand how awareness can be improved. Finally, we proposed interventions to help SMBs overcome potential challenges.

Authors:Wentao Ge, Yuqing Sun, Ziyan Wang, Haoyue Zheng, Weiyang He, Piaohong Wang, Qianyu Zhu, Benyou Wang
Title: SRLAgent: Enhancing Self-Regulated Learning Skills through Gamification and LLM Assistance
Abstract:
Self-regulated learning (SRL) is crucial for college students navigating increased academic demands and independence. Insufficient SRL skills can lead to disorganized study habits, low motivation, and poor time management, undermining learners' ability to thrive in challenging environments. Through a formative study involving 59 college students, we identified key challenges students face in developing SRL skills, including difficulties with goal-setting, time management, and reflective learning. To address these challenges, we introduce SRLAgent, an LLM-assisted system that fosters SRL skills through gamification and adaptive support from large language models (LLMs). Grounded in Zimmerman's three-phase SRL framework, SRLAgent enables students to engage in goal-setting, strategy execution, and self-reflection within an interactive game-based environment. The system offers real-time feedback and scaffolding powered by LLMs to support students' independent study efforts. We evaluated SRLAgent using a between-subjects design, comparing it to a baseline system (SRL without Agent features) and a traditional multimedia learning condition. Results showed significant improvements in SRL skills within the SRLAgent group (p < .001, Cohen's d = 0.234) and higher engagement compared to the baselines. This work highlights the value of embedding SRL scaffolding and real-time AI support within gamified environments, offering design implications for educational technologies that aim to promote deeper learning and metacognitive skill development.

Authors:Karen Joy, Tawfiq Ammari, Alyssa Sheehan
Title: Beyond the Hype: Mapping Uncertainty and Gratification in AI Assistant Use
Abstract:
This paper examines the gap between the promises and real-world performance of emerging AI personal assistants. Drawing on interviews with early adopters of devices like Rabbit R1 and Humane AI Pin, as well as services like Ohai and Docus, we map user experiences through the lens of Uses and Gratifications and Uncertainty Reduction Theory. We identify three core types of user uncertainty (functional, interactional, and social) and explore how each disrupts different user gratifications. We show that while marketing hype fuels initial adoption, unmet expectations often result in frustration or abandonment. Our findings highlight the importance of transparency, task-specific design, and user control over contextual memory and personalization. We provide design and policy recommendations, including user-facing explainability tools and calls for regulatory benchmarks such as CI Bench, to guide ethical and interpretable AI integration. Our study offers actionable insights for creating more usable, trustworthy, and socially aligned AI assistants.

Authors:Qing Xia, Advait Sarkar, Duncan Brumby, Anna Cox
Title: "How do you even know that stuff?": Barriers to expertise sharing among spreadsheet users
Abstract:
Spreadsheet collaboration provides valuable opportunities for learning and expertise sharing between colleagues. Sharing expertise is essential for the retention of important technical skillsets within organisations, but previous studies suggest that spreadsheet experts often fail to disseminate their knowledge to others. We suggest that social norms and beliefs surrounding the value of spreadsheet use significantly influence user engagement in sharing behaviours. To explore this, we conducted 31 semi-structured interviews with professional spreadsheet users from two separate samples. We found that spreadsheet providers face challenges in adapting highly personalised strategies to often subjective standards and evaluating the appropriate social timing of sharing. In addition, conflicted self-evaluations of one's spreadsheet expertise, dismissive normative beliefs about the value of this knowledge, and concerns about the potential disruptions associated with collaboration can further deter sharing. We suggest these observations reflect the challenges of long-term learning in feature-rich software designed primarily with initial learnability in mind. We therefore provide implications for design to navigate this tension. Overall, our findings demonstrate how the complex interaction between technology design and social dynamics can shape collaborative learning behaviours in the context of feature-rich software.

Authors:Vishwa Mohan Singh, Sai Anirudh Aryasomayajula, Ahan Chatterjee, Beste Aydemir, Rifat Mehreen Amin
Title: Improving AI-generated music with user-guided training
Abstract:
AI music generation has advanced rapidly, with models like diffusion and autoregressive algorithms enabling high-fidelity outputs. These tools can alter styles, mix instruments, or isolate them. Since sound can be visualized as spectrograms, image-generation algorithms can be applied to generate novel music. However, these algorithms are typically trained on fixed datasets, which makes it challenging for them to interpret and respond to user input accurately. This is especially problematic because music is highly subjective and requires a level of personalization that image generation does not provide. In this work, we propose a human-computation approach to gradually improve the performance of these algorithms based on user interactions. The human-computation element involves aggregating and selecting user ratings to use as the loss function for fine-tuning the model. We employ a genetic algorithm that incorporates user feedback to enhance the baseline performance of a model initially trained on a fixed dataset. The effectiveness of this approach is measured by the average increase in user ratings with each iteration. In the pilot test, the first iteration showed an average rating increase of 0.2 compared to the baseline. The second iteration further improved upon this, achieving an additional increase of 0.39 over the first iteration.
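The feedback loop described above can be sketched as a minimal genetic algorithm in which aggregated listener ratings act as the fitness function. This is an illustrative sketch only: the `rate_fn` stand-in, the parameter-vector encoding, and the crossover/mutation choices are our assumptions, not the paper's implementation.

```python
import random

def evolve(population, rate_fn, n_generations=5, mutation_sigma=0.1):
    """Minimal genetic loop where user ratings serve as the fitness signal.

    population: list of parameter vectors (e.g., generation settings).
    rate_fn: stand-in for aggregated user ratings of the music each
             candidate produces (hypothetical; a real system would
             collect these from listeners each iteration).
    """
    for _ in range(n_generations):
        # keep the top-rated half as parents (elitism)
        scored = sorted(population, key=rate_fn, reverse=True)
        parents = scored[: len(population) // 2]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = random.sample(parents, 2)
            # uniform crossover followed by Gaussian mutation
            child = [random.choice(pair) + random.gauss(0, mutation_sigma)
                     for pair in zip(a, b)]
            children.append(child)
        population = parents + children
    return max(population, key=rate_fn)
```

Because parents survive unchanged each generation, the best rating is non-decreasing across iterations, mirroring the reported per-iteration rating gains.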

Authors:Lin Kyi, Cristiana Santos, Sushil Ammanaghatta Shivakumar, Franziska Roesner, Asia Biega
Title: Turning to Online Forums for Legal Information: A Case Study of GDPR's Legitimate Interests
Abstract:
Practitioners building online services and tools often turn to online forums such as Reddit, Law Stack Exchange, and Stack Overflow for legal guidance to ensure compliance with the GDPR. The legal information presented in these forums directly impacts present-day industry practitioners' decisions. Online forums can serve as gateways that, depending on the accuracy and quality of the answers provided, may either support or undermine the protection of privacy and data protection fundamental rights. However, there is a need for deeper investigation into practitioners' decision-making processes and their understanding of legal compliance when seeking legal information online. Using GDPR's ``legitimate interests'' legal ground for processing personal data as a case study, we investigate how practitioners use online forums, identify common areas of confusion in applying legitimate interests in practice, and evaluate how legally sound online forum responses are. Our analysis found that applying the legal basis of legitimate interests is complex for practitioners, with important implications for how the GDPR is implemented in practice. The legal analysis showed that crowdsourced legal information tends to be legally sound, though sometimes incomplete. We outline recommendations to improve the quality of online forums by ensuring that responses are more legally sound and comprehensive, enabling practitioners to apply legitimate interests effectively in practice and uphold the GDPR.

Authors:Rifat Mehreen Amin, Oliver Hans Kühle, Daniel Buschek, Andreas Butz
Title: PromptCanvas: Composable Prompting Workspaces Using Dynamic Widgets for Exploration and Iteration in Creative Writing
Abstract:
We introduce PromptCanvas, a concept that transforms prompting into a composable, widget-based experience on an infinite canvas. Users can generate, customize, and arrange interactive widgets representing various facets of their text, offering greater control over AI-generated content. PromptCanvas allows widget creation through system suggestions, user prompts, or manual input, providing a flexible environment tailored to individual needs. This enables deeper engagement with the creative process. In a lab study with 18 participants, PromptCanvas outperformed a traditional conversational UI on the Creativity Support Index. Participants found that it reduced cognitive load, with lower mental demand and frustration. Qualitative feedback revealed that the visual organization of thoughts and easy iteration encouraged new perspectives and ideas. A follow-up field study (N=10) confirmed these results, showcasing the potential of dynamic, customizable interfaces in improving collaborative writing with AI.

Authors:Han Zhang, KaWing Tsang, Zhenhui Peng
Title: VChatter: Exploring Generative Conversational Agents for Simulating Exposure Therapy to Reduce Social Anxiety
Abstract:
Many people struggle with social anxiety, feeling fearful or even physically uncomfortable in social situations like talking to strangers. Exposure therapy, a clinical method that gradually and repeatedly exposes individuals to the source of their fear and helps them build coping mechanisms, can reduce social anxiety but traditionally requires human therapists' guidance and construction of situations. In this paper, we developed a multi-agent system, VChatter, to explore large language model (LLM)-based conversational agents for simulating exposure therapy with users. Based on a survey study (N=36) and an expert interview, VChatter includes an Agent-P, which acts as a psychotherapist to design exposure therapy plans for users, and two Agent-Hs, which can take on different interactive roles in low, medium, and high exposure scenarios. A six-day qualitative study (N=10) showcases VChatter's usefulness in reducing users' social anxiety, feelings of isolation, and avoidance of social interactions. We demonstrated the feasibility of using LLM-based conversational agents to simulate exposure therapy for addressing social anxiety and discussed future considerations for designing agents tailored to social anxiety.

Authors:Joonyoung Park, Hyewon Cho, Hyehyun Chu, Yeeun Lee, Hajin Lim
Title: NoRe: Augmenting Journaling Experience with Generative AI for Music Creation
Abstract:
Journaling has long been recognized for fostering emotional awareness and self-reflection, and recent advancements in generative AI offer new opportunities to create personalized music that can enhance these practices. In this study, we explore how AI-generated music can augment the journaling experience. Through a formative study, we examined journal writers' writing patterns, purposes, emotional regulation strategies, and the design requirements for the system that augments journaling experience by journal-based AI-generated music. Based on these insights, we developed NoRe, a system that transforms journal entries into personalized music using generative AI. In a seven-day in-the-wild study (N=15), we investigated user engagement and perceived emotional effectiveness through system logs, surveys, and interviews. Our findings suggest that journal-based music generation could support emotional reflection and provide vivid reminiscence of daily experiences. Drawing from these findings, we discuss design implications for tailoring music to journal writers' emotional states and preferences.

Authors:Chenlong Wang, Jiaao Li, Shuailei Zhang, Wenbo Ding, Xinlei Chen
Title: Fast SSVEP Detection Using a Calibration-Free EEG Decoding Framework
Abstract:
Steady-state visual evoked potentials (SSVEPs) are brain responses to visual stimuli flickering at constant frequencies. They are commonly used in brain-computer interfaces (BCIs) for direct brain-device communication due to their simplicity, minimal training-data requirements, and high information transfer rate. Traditional methods suffer from poor performance due to reliance on prior knowledge, while deep learning achieves higher accuracy but requires substantial high-quality training data for precise signal decoding. In this paper, we propose a calibration-free EEG signal decoding framework for fast SSVEP detection. Our framework integrates Inter-Trial Remixing & Context-Aware Distribution Alignment data augmentation for EEG signals and employs a compact architecture of small fully connected layers, effectively addressing the challenge of limited EEG data availability. Additionally, we propose an Adaptive Spectrum Denoise Module that operates in the frequency domain based on global features, requiring only linear complexity to reduce noise in EEG data and improve data quality. For calibration-free classification experiments on short EEG signals from three public datasets, our framework demonstrates statistically significant accuracy advantages (p<0.05) over existing methods in the majority of cases, while requiring at least 52.7% fewer parameters and 29.9% less inference time. By eliminating the need for user-specific calibration, this advancement significantly enhances the usability of BCI systems, accelerating their commercialization and widespread adoption in real-world applications.
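For context on what frequency detection involves, the classical calibration-free baseline (not the paper's network, which adds learned layers and spectral denoising) scores each candidate stimulus frequency by correlating the EEG signal with sine/cosine references at that frequency and its harmonics; a minimal sketch, with function names of our choosing:

```python
import math

def ssvep_score(signal, fs, freq, n_harmonics=2):
    """Reference-correlation score for one candidate stimulus frequency.

    Projects the mean-centered signal onto sine/cosine references at the
    frequency and its harmonics and sums the squared correlations.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    norm = math.sqrt(sum(x * x for x in centered)) or 1.0
    score = 0.0
    for h in range(1, n_harmonics + 1):
        for phase in (math.sin, math.cos):
            ref = [phase(2 * math.pi * h * freq * i / fs) for i in range(n)]
            rnorm = math.sqrt(sum(r * r for r in ref)) or 1.0
            corr = sum(x * r for x, r in zip(centered, ref)) / (norm * rnorm)
            score += corr * corr
    return score

def detect(signal, fs, candidate_freqs):
    """Pick the flicker frequency with the highest reference correlation."""
    return max(candidate_freqs, key=lambda f: ssvep_score(signal, fs, f))
```

On a clean 2-second signal this recovers the flicker frequency with no per-user calibration, which is the property the proposed framework improves on for short, noisy windows.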

Authors:Jiawen Stefanie Zhu, Jian Zhao
Title: Understanding Remote Communication between Grandparents and Grandchildren in Distributed Immigrant Families
Abstract:
Grandparent-grandchild bonds are crucial for both parties. Many immigrant families are geographically dispersed, and the grandparents and grandchildren need to rely on remote communication to maintain their relationships. In addition to geographical separation, grandparents and grandchildren in such families also face language and culture barriers during remote communication. The associated challenges and needs remain understudied as existing research primarily focuses on non-immigrant families or co-located immigrant families. To address this gap, we conducted interviews with six Chinese immigrant families in Canada. Our findings highlight unique challenges faced by immigrant families during remote communication, such as amplified language and cultural barriers due to geographic separation, and provide insights into how technology can better support remote communication. This work offers empirical knowledge about the communication needs of distributed immigrant families and provides directions for future research and design to support grandparent-grandchild remote communication in these families.

Authors:Xingru Zhou, Sadanand Modak, Yao-Cheng Chan, Zhiyun Deng, Luis Sentis, Maria Esteva
Title: Curate, Connect, Inquire: A System for Findable Accessible Interoperable and Reusable (FAIR) Human-Robot Centered Datasets
Abstract:
The rapid growth of AI in robotics has amplified the need for high-quality, reusable datasets, particularly in human-robot interaction (HRI) and AI-embedded robotics. While more robotics datasets are being created, the landscape of open data in the field is uneven. This is due to a lack of curation standards and consistent publication practices, which makes it difficult to discover, access, and reuse robotics data. To address these challenges, this paper presents a curation and access system with two main contributions: (1) a structured methodology to curate, publish, and integrate FAIR (Findable, Accessible, Interoperable, Reusable) human-centered robotics datasets; and (2) a ChatGPT-powered conversational interface trained with the curated datasets' metadata and documentation to enable exploration and comparison of robotics datasets and data retrieval using natural language. Developed based on practical experience curating datasets from robotics labs within Texas Robotics at the University of Texas at Austin, the system demonstrates the value of standardized curation and persistent publication of robotics data. The system's evaluation suggests that access to and understandability of human-robotics data are significantly improved. This work directly aligns with the goals of the HCRL @ ICRA 2025 workshop and represents a step towards more human-centered access to data for embodied AI.

Authors:Konstantin Aal, Tanja Aal, Vasil Navumau, David Unbehaun, Claudia Müller, Volker Wulf, Sarah Rüller
Title: Feeling Guilty Being a c(ai)borg: Navigating the Tensions Between Guilt and Empowerment in AI Use
Abstract:
This paper explores the emotional, ethical and practical dimensions of integrating Artificial Intelligence (AI) into personal and professional workflows, focusing on the concept of feeling guilty as a 'c(ai)borg' - a human augmented by AI. Inspired by Donna Haraway's Cyborg Manifesto, the study explores how AI challenges traditional notions of creativity, originality and intellectual labour. Using an autoethnographic approach, the authors reflect on their year-long experiences with AI tools, revealing a transition from initial guilt and reluctance to empowerment through skill-building and transparency. Key findings highlight the importance of basic academic skills, advanced AI literacy and honest engagement with AI results. The c(ai)borg vision advocates for a future where AI is openly embraced as a collaborative partner, fostering innovation and equity while addressing issues of access and agency. By reframing guilt as growth, the paper calls for a thoughtful and inclusive approach to AI integration.

Authors:Yuri Miyagi, Nils Rodrigues, Daniel Weiskopf, Takayuki Itoh
Title: Visualization and Comparison of AOI Transitions with Force-Directed Graph Layout
Abstract:
By analyzing the gaze trajectories of people viewing screens and advertisements, we can determine what people are interested in. This knowledge can be effective when recommending commercial products and services, and also when improving advertisement design. Therefore, analysis and visualization of eye gaze have been an active research topic. This paper proposes a new method for visualizing patterns of the gaze trajectories of multiple people by (1) visualizing patterns that move through multiple areas of interest (AOI) and (2) visualizing differences among multiple gaze trajectories. The method first constructs a hierarchical AOI structure for a Web page or an image, and uses this structure to convert each trajectory into a sequence of symbols. We apply N-grams to the generated symbol sequences to extract transition patterns between AOIs. Finally, the method visualizes a list of the pattern extraction results and the shapes of the characteristic elements. We present the visualization of gaze trajectories for three examples of stimuli, and argue that analysts can efficiently discover trends in gaze transitions between text and figures, as well as differences between participants of the eye-tracking experiments.
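The N-gram step can be sketched as follows, a minimal illustration under the assumption that trajectories have already been mapped to AOI symbols; the function names are ours, not the paper's:

```python
from collections import Counter

def aoi_ngrams(sequence, n=2):
    """Count transition patterns of length n in an AOI symbol sequence.

    sequence: a gaze trajectory already converted to AOI symbols,
              e.g. ['title', 'figure', 'text', 'figure'].
    Returns a Counter of n-gram tuples, so frequent AOI transitions
    can be listed and visualized.
    """
    return Counter(tuple(sequence[i:i + n])
                   for i in range(len(sequence) - n + 1))

def compare(seq_a, seq_b, n=2):
    """Difference in n-gram counts between two participants' trajectories,
    a simple basis for visualizing differences among gaze trajectories."""
    a, b = aoi_ngrams(seq_a, n), aoi_ngrams(seq_b, n)
    return {g: a.get(g, 0) - b.get(g, 0) for g in set(a) | set(b)}
```

For example, `aoi_ngrams(['text', 'figure', 'text', 'figure'])` counts the text-to-figure transition twice, the kind of pattern the proposed visualization surfaces across participants.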

Authors:Mahmood Jasim, Narges Mahyar
Title: Beyond the Prototype: Challenges of Long-Term Integration of Visual Analytics in Civic Spaces
Abstract:
Despite the recognized benefits of visual analytics systems in supporting data-driven decision-making, their deployment in real-world civic contexts often faces significant barriers. Beyond technical challenges such as resource constraints and development complexity, sociotechnical factors, including organizational hierarchies, misalignment between designers and stakeholders, and concerns around technology adoption hinder their sustained use. In this work, we reflect on our collective experiences of designing, developing, and deploying visual analytics systems in the civic domain and discuss challenges across design and adoption aspects. We emphasize the need for deeper integration strategies, equitable stakeholder engagement, and sustainable implementation frameworks to bridge the gap between research and practice.

Authors:Vahid Danesh, Paul Arauz, Maede Boroji, Andrew Zhu, Mia Cottone, Elaine Gould, Fazel A. Khan, Imin Kao
Title: Improved Accuracy in Pelvic Tumor Resections Using a Real-Time Vision-Guided Surgical System
Abstract:
Pelvic bone tumor resections remain significantly challenging due to complex three-dimensional anatomy and limited surgical visualization. Current navigation systems and patient-specific instruments, while accurate, present limitations including high costs, radiation exposure, workflow disruption, long production time, and lack of reusability. This study evaluates a real-time vision-guided surgical system combined with modular jigs to improve accuracy in pelvic bone tumor resections. A vision-guided surgical system combined with modular cutting jigs and real-time optical tracking was developed and validated. Five female pelvis sawbones were used, with each hemipelvis randomly assigned to either the vision-guided and modular jig system or traditional freehand method. A total of twenty resection planes were analyzed for each method. Accuracy was assessed by measuring distance and angular deviations from the planned resection planes. The vision-guided and modular jig system significantly improved resection accuracy compared to the freehand method, reducing the mean distance deviation from 2.07 $\pm$ 1.71 mm to 1.01 $\pm$ 0.78 mm (p=0.0193). In particular, all specimens resected using the vision-guided system exhibited errors of less than 3 mm. Angular deviations also showed significant improvements with roll angle deviation reduced from 15.36 $\pm$ 17.57$^\circ$ to 4.21 $\pm$ 3.46$^\circ$ (p=0.0275), and pitch angle deviation decreased from 6.17 $\pm$ 4.58$^\circ$ to 1.84 $\pm$ 1.48$^\circ$ (p<0.001). The proposed vision-guided and modular jig system significantly improves the accuracy of pelvic bone tumor resections while maintaining workflow efficiency. This cost-effective solution provides real-time guidance without the need for referencing external monitors, potentially improving surgical outcomes in complex pelvic bone tumor cases.

Authors:Marco Hirsch, Peter Hevesi, Paul Lukowicz
Title: The CASE Framework -- A New Architecture for Participatory Research and Digital Health Surveillance
Abstract:
We present the CASE framework, an open-source platform for adaptive, context-aware participatory research, and pandemic preparedness. CASE implements an event-driven architecture that enables dynamic survey workflows, allowing real-time adaptation based on participant responses, external data, temporal conditions, and evolving user states. The framework supports a broad range of research needs, from simple one-time questionnaires to complex longitudinal studies with advanced conditional logic. Built on over a decade of practical experience, CASE underwent a major architectural rework in 2024, transitioning from a microservice-based design to a streamlined monolithic architecture. This evolution significantly improved maintainability, flexibility, and ease of deployment, particularly for institutions with limited technical capacity. CASE has been successfully deployed across diverse domains, powering national disease surveillance platforms, supporting post-COVID cohort studies, and enabling real-time sentiment analysis during political events. These applications, involving tens of thousands of participants, demonstrate the framework's scalability, versatility, and practical value. This paper describes the foundations of CASE, details its architectural evolution, and presents lessons learned from real-world deployments. We establish CASE as a mature and reusable research infrastructure that balances sophisticated functionality with practical implementation, addressing the critical global need for sustainable and institutionally controlled data collection systems.

Authors:Sara Johansson Fernstad, Sarah Alsufyani, Silvia Del Din, Alison Yarnall, Lynn Rochester
Title: To Measure What Isn't There -- Visual Exploration of Missingness Structures Using Quality Metrics
Abstract:
This paper contributes a set of quality metrics for identification and visual analysis of structured missingness in high-dimensional data. Missing values in data are a frequent challenge in most data-generating domains and may cause a range of analysis issues. Structural missingness in data may indicate issues in data collection and pre-processing, but may also highlight important data characteristics. While research into statistical methods for dealing with missing data mainly focuses on replacing missing values with plausible estimated values, visualization has great potential to support a more in-depth understanding of missingness structures in data. Nonetheless, while interest in missing data visualization has increased in the last decade, it is still a relatively overlooked research topic with a comparably small number of publications, few of which address scalability issues. Efficient visual analysis approaches are needed to enable exploration of missingness structures in large and high-dimensional data, and to support informed decision-making in the context of potential data quality issues. This paper suggests a set of quality metrics for identification of patterns of interest for understanding structural missingness in data. These quality metrics can be used as guidance in visual analysis, as demonstrated through a use case exploring structural missingness in data from a real-life walking monitoring study. All supplemental materials for this paper are available at https://doi.org/10.25405/data.ncl.c.7741829.

Authors:Grzegorz Wolny, Michał Szczerbak
Title: Voice CMS: updating the knowledge base of a digital assistant through conversation
Abstract:
In this study, we propose a solution based on a multi-agent LLM architecture and a voice user interface (VUI) designed to update the knowledge base of a digital assistant. Its usability is evaluated in comparison to a more traditional graphical content management system (CMS), with a focus on understanding the relationship between user preferences and the complexity of the information being provided. The findings demonstrate that, while the overall usability of the VUI is rated lower than the graphical interface, it is already preferred by users for less complex tasks. Furthermore, the quality of content entered through the VUI is comparable to that achieved with the graphical interface, even for highly complex tasks. Obtained qualitative results suggest that a hybrid interface combining the strengths of both approaches could address the key challenges identified during the experiment, such as reducing cognitive load through graphical feedback while maintaining the intuitive nature of voice-based interactions. This work highlights the potential of conversational interfaces as a viable and effective method for knowledge management in specific business contexts.

Authors:Ryosuke Kohita, Akira Kasuga
Title: System-driven Cloud Architecture Design Support with Structured State Management and Guided Decision Assistance
Abstract:
Cloud architecture design is a complex process requiring both technical expertise and architectural knowledge to develop solutions from frequently ambiguous requirements. We present CloudArchitectBuddy, a system-driven cloud architecture design support application with two key mechanisms: (1) structured state management that enhances design understanding through explicit representation of requirements and architectural decisions, and (2) guided decision assistance that facilitates design progress through proactive verification and requirement refinement. Our study with 16 industry practitioners showed that while our approach achieved comparable design quality to a chat interface, participants rated our system higher for usability and appreciated its ability to help understand architectural relationships and identify missing requirements. However, participants also expressed a need for user-initiated interactions where they could freely provide design instructions and engage in detailed discussions with LLMs. These results suggest that integrating a chat interface into our structured and guided workflow approach would create a more practical solution, balancing systematic design support with conversational flexibility for comprehensive cloud architecture development.

Authors:Linfeng Zhao, Rishul Bhuvanagiri, Blake Gonzales, Kellen Sharp, Dhiraj Murthy
Title: A Dashboard Approach to Monitoring Mpox-Related Discourse and Misinformation on Social Media
Abstract:
Mpox (formerly monkeypox) is a zoonotic disease caused by an orthopoxvirus closely related to variola and remains a significant global public health concern. During outbreaks, social media platforms like X (formerly Twitter) can both inform and misinform the public, complicating efforts to convey accurate health information. To support local response efforts, we developed a researcher-focused dashboard for use by public health stakeholders and the public that enables searching and visualizing mpox-related tweets through an interactive interface. Following the CDC's designation of mpox as an emerging virus in August 2024, our dashboard recorded a marked increase in tweet volume compared to 2023, illustrating the rapid spread of health discourse across digital platforms. These findings underscore the continued need for real-time social media monitoring tools to support public health communication and track evolving sentiment and misinformation trends at the local level.

Authors:Anton Hummel, Håkan Burden, Susanne Stenberg, Jan-Philipp Steghöfer, Niklas Kühl
Title: The EU AI Act, Stakeholder Needs, and Explainable AI: Aligning Regulatory Compliance in a Clinical Decision Support System
Abstract:
Explainable AI (XAI) is a promising solution to ensure compliance with the EU AI Act, the first multi-national regulation for AI. XAI aims to enhance transparency and human oversight of AI systems, particularly ``black-box models'', which are criticized as incomprehensible. However, the discourse around the main stakeholders in the AI Act and XAI appears disconnected. While XAI prioritizes the end user's needs as the primary goal, the AI Act focuses on the obligations of the provider and deployer of the AI system. We aim to bridge this divide and provide guidance on how these two worlds are related. By fostering an interdisciplinary discussion in a cross-functional team with XAI, AI Act, legal, and requirements engineering experts, we walk through the steps necessary to analyze an AI-based clinical decision support system to clarify the end-user needs and assess AI Act applicability. By analyzing our justified understanding using an AI system under development as a case, we show that XAI techniques can fill a gap between stakeholder needs and the requirements of the AI Act. We look at the similarities and contrasts between the legal requirements and the needs of stakeholders. In doing so, we encourage researchers and practitioners from the XAI community to reflect on their role towards the AI Act by achieving a mutual understanding of the implications of XAI and the AI Act within different disciplines.

Authors:Subek Acharya, Sansrit Paudel
Title: Literature review on assistive technologies for people with Parkinson's disease
Abstract:
Parkinson's Disease (PD) is a neurodegenerative disorder that significantly impacts motor and non-motor functions. There is currently no treatment that slows or stops neurodegeneration in PD. In this context, assistive technologies (ATs) have emerged as vital tools to aid people with Parkinson's and significantly improve their quality of life. This review explores a broad spectrum of ATs, including wearable and cueing devices, exoskeletons, robotics, virtual reality, voice- and video-assisted technologies, and emerging innovations such as artificial intelligence (AI), machine learning (ML), and the Internet of Things (IoT). The review highlights ATs' significant role in addressing motor symptoms such as freezing of gait (FOG) and gait and posture disorders. However, it also identifies significant gaps in addressing non-motor symptoms such as sleep dysfunction and mental health. Similarly, the research identifies substantial potential in the further implementation of deep learning, AI, and IoT technologies. Overall, this review highlights the transformative potential of ATs in PD management while identifying gaps that future research should address to ensure personalized, accessible, and effective solutions.

Authors:Omer Ege, Mustafa Cagal, Kemal Bicakci
Title: Usability of Token-based and Remote Electronic Signatures: A User Experience Study
Abstract:
As electronic signatures (e-signatures) become increasingly integral to secure digital transactions, understanding their usability and security perception from an end-user perspective has become crucial. This study empirically evaluates and compares two major e-signature systems -- token-based and remote signatures -- through a controlled user experience study with 20 participants. Participants completed tasks involving acquisition, installation, and document signing using both methods, followed by structured surveys and qualitative feedback. Statistical analyses revealed that remote e-signatures were perceived as significantly more usable than token-based ones, due to their minimal setup and platform-independent accessibility. In contrast, token-based signatures were rated as significantly more secure, highlighting users' trust in hardware-based protection. Although more participants preferred remote e-signatures for document signing, the preference did not reach statistical significance, indicating a trend toward favoring convenience in real-world scenarios. These findings underline the fundamental trade-off between usability and perceived security in digital signing systems. By bridging the gap between theoretical frameworks and real user experience, this study contributes valuable insights to the design and policymaking of qualified electronic signature solutions.

Authors:John Oyekan, Christopher Turner, Michael Bax, Erich Graf
Title: Applying Ontologies and Knowledge Augmented Large Language Models to Industrial Automation: A Decision-Making Guidance for Achieving Human-Robot Collaboration in Industry 5.0
Abstract:
The rapid advancement of Large Language Models (LLMs) has resulted in interest in their potential applications within manufacturing systems, particularly in the context of Industry 5.0. However, determining when to implement LLMs versus other Natural Language Processing (NLP) techniques, ontologies, or knowledge graphs remains an open question. This paper offers decision-making guidance for selecting the most suitable technique in various industrial contexts, emphasizing human-robot collaboration and resilience in manufacturing. We examine the origins and unique strengths of LLMs, ontologies, and knowledge graphs, assessing their effectiveness across different industrial scenarios based on the number of domains or disciplines required to bring a product from design to manufacture. Through this comparative framework, we explore specific use cases where LLMs could enhance robotics for human-robot collaboration, while underscoring the continued relevance of ontologies and knowledge graphs in low-dependency or resource-constrained sectors. Additionally, we address the practical challenges of deploying these technologies, such as computational cost and interpretability, providing a roadmap for manufacturers to navigate the evolving landscape of language-based AI tools in Industry 5.0. Our findings offer a foundation for informed decision-making, helping industry professionals optimize the use of language-based models for sustainable, resilient, and human-centric manufacturing. We also propose a Large Knowledge Language Model architecture that offers the potential for transparency and configuration based on task complexity and available computing resources.

Authors:Yuqi Wang, Sirui Wang, Shiman Zhang, Kexue Fu, Michelle Lui, Ray Lc
Title: From Temporal to Spatial: Designing Spatialized Interactions with Segmented-audios in Immersive Environments for Active Engagement with Performing Arts Intangible Cultural Heritage
Abstract:
Performance artforms like Peking opera face transmission challenges due to the extensive passive listening required to understand their nuance. To create engaging forms of experiencing auditory Intangible Cultural Heritage (ICH), we designed a spatial interaction-based segmented-audio (SISA) Virtual Reality system that transforms passive ICH experiences into active ones. We undertook: (1) a co-design workshop with seven stakeholders to establish design requirements, (2) prototyping with five participants to validate design elements, and (3) user testing with 16 participants exploring Peking Opera. We transformed temporal music into spatial interactions by cutting sounds into short audio segments and applying the t-SNE algorithm to cluster them spatially. Users navigate these sounds by similarity in audio properties. Analysis revealed two distinct interaction patterns (Progressive and Adaptive), and demonstrated SISA's efficacy in facilitating active auditory ICH engagement. Our work illuminates the design process for enriching traditional performance artforms using spatially-tuned forms of listening.
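The spatialization step in this abstract can be sketched as follows. The segment count, feature dimensionality, and perplexity value are illustrative assumptions, not details from the paper; the random features stand in for per-segment audio descriptors (e.g. MFCC means) extracted from recordings.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for per-segment audio features; the real system would
# extract these from short Peking-opera audio segments.
n_segments, n_features = 50, 13
features = rng.normal(size=(n_segments, n_features))

# Project segments into 2D so that acoustically similar segments land
# near each other; these coordinates can then be mapped to positions
# in the VR scene for spatial navigation.
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(features)
print(coords.shape)  # (50, 2)
```

Note that t-SNE's `perplexity` must be smaller than the number of segments, which constrains how finely a short recording can be cut.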

Authors:Manuel Valle Torre, Thom van der Velden, Marcus Specht, Catharine Oertel
Title: JELAI: Integrating AI and Learning Analytics in Jupyter Notebooks
Abstract:
Generative AI offers potential for educational support, but often lacks pedagogical grounding and awareness of the student's learning context. Furthermore, researching student interactions with these tools within authentic learning environments remains challenging. To address this, we present JELAI, an open-source platform architecture designed to integrate fine-grained Learning Analytics (LA) with Large Language Model (LLM)-based tutoring directly within a Jupyter Notebook environment. JELAI employs a modular, containerized design featuring JupyterLab extensions for telemetry and chat, alongside a central middleware handling LA processing and context-aware LLM prompt enrichment. This architecture enables the capture of integrated code interaction and chat data, facilitating real-time, context-sensitive AI scaffolding and research into student behaviour. We describe the system's design, implementation, and demonstrate its feasibility through system performance benchmarks and two proof-of-concept use cases illustrating its capabilities for logging multi-modal data, analysing help-seeking patterns, and supporting A/B testing of AI configurations. JELAI's primary contribution is its technical framework, providing a flexible tool for researchers and educators to develop, deploy, and study LA-informed AI tutoring within the widely used Jupyter ecosystem.

Authors:Jiaqi Jiang, Kexin Huang, Roberto Martinez-Maldonado, Huan Zeng, Duo Gong, Pengcheng An
Title: Novobo: Supporting Teachers' Peer Learning of Instructional Gestures by Teaching a Mentee AI-Agent Together
Abstract:
Instructional gestures are essential for teaching, as they enhance communication and support student comprehension. However, existing training methods for developing these embodied skills can be time-consuming, isolating, or overly prescriptive. Research suggests that developing these tacit, experiential skills requires teachers' peer learning, where they learn from each other and build shared knowledge. This paper introduces Novobo, an apprentice AI-agent stimulating teachers' peer learning of instructional gestures through verbal and bodily inputs. Positioning the AI as a mentee employs the learning-by-teaching paradigm, aiming to promote deliberate reflection and active learning. Novobo encourages teachers to evaluate its generated gestures and invites them to provide demonstrations. An evaluation with 30 teachers in 10 collaborative sessions showed Novobo prompted teachers to share tacit knowledge through conversation and movement. This process helped teachers externalize, exchange, and internalize their embodied knowledge, promoting collaborative learning and building a shared understanding of instructional gestures within the local teaching community. This work advances understanding of how teachable AI agents can enhance collaborative learning in teacher professional development, offering valuable design insights for leveraging AI to promote the sharing and construction of embodied and practical knowledge.

Authors:Suifang Zhou, Kexue Fu, Huanmin Yi, Ray Lc
Title: RetroChat: Designing for the Preservation of Past Digital Experiences
Abstract:
Rapid changes in social networks have transformed the way people express themselves, turning past neologisms, values, and mindsets embedded in these expressions into online heritage. How can we preserve these expressions as cultural heritage? Instead of traditional archiving methods for static material, we designed an interactive and experiential form of archiving for Chinese social networks. Using dialogue data from 2000-2010 on early Chinese social media, we developed a GPT-driven agent within a retro chat interface, emulating the language and expression style of the period for interaction. Results from a qualitative study with 18 participants show that the design captures the past chatting experience and evokes memory flashbacks and feelings of nostalgia through conversation. Participants, particularly those familiar with the era, adapted their language to match the agent's chatting style. This study explores how the design of preservation methods for digital experiences can be informed by experiential representations supported by generative tools.

Authors:Fidaa Khandaqji, Huthaifa I. Ashqar, Abdelrahem Atawnih
Title: Enhancing Mathematics Learning for Hard-of-Hearing Students Through Real-Time Palestinian Sign Language Recognition: A New Dataset
Abstract:
The study aims to enhance mathematics education accessibility for hard-of-hearing students by developing an accurate Palestinian Sign Language (PSL) recognition system using advanced artificial intelligence techniques. Due to the scarcity of digital resources for PSL, a custom dataset comprising 41 mathematical gesture classes was created and recorded by PSL experts to ensure linguistic accuracy and domain specificity. To leverage state-of-the-art computer vision techniques, a Vision Transformer (ViT) model was fine-tuned for gesture classification. The model achieved an accuracy of 97.59%, demonstrating its effectiveness in recognizing mathematical signs with high precision and reliability. This study highlights the role of deep learning in developing intelligent educational tools that bridge the learning gap for hard-of-hearing students by providing AI-driven interactive solutions to enhance mathematical comprehension. This work represents a significant step toward fostering innovative and inclusive digital integration in specialized learning environments. The dataset is hosted on Hugging Face at https://huggingface.co/datasets/fidaakh/STEM_data.

Authors:Mudassir Ibrahim Awan, Seokhee Jeon
Title: Estimating Perceptual Attributes of Haptic Textures Using Visuo-Tactile Data
Abstract:
Accurate prediction of perceptual attributes of haptic textures is essential for advancing VR and AR applications and enhancing robotic interaction with physical surfaces. This paper presents a deep learning-based multi-modal framework, incorporating visual and tactile data, to predict perceptual texture ratings by leveraging multi-feature inputs. To achieve this, a four-dimensional haptic attribute space encompassing rough-smooth, flat-bumpy, sticky-slippery, and hard-soft dimensions is first constructed through psychophysical experiments, where participants evaluate 50 diverse real-world texture samples. A physical signal space is subsequently created by collecting visual and tactile data from these textures. Finally, a deep learning architecture integrating a CNN-based autoencoder for visual feature learning and a ConvLSTM network for tactile data processing is trained to predict user-assigned attribute ratings. This multi-modal, multi-feature approach maps physical signals to perceptual ratings, enabling accurate predictions for unseen textures. To evaluate predictive accuracy, we employed leave-one-out cross-validation to rigorously assess the model's reliability and generalizability against several machine learning and deep learning baselines. Experimental results demonstrate that the framework consistently outperforms single-modality approaches, achieving lower MAE and RMSE, highlighting the efficacy of combining visual and tactile modalities.
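The leave-one-out evaluation protocol from this abstract can be sketched as below. The synthetic features, ratings, and the nearest-neighbour predictor are placeholders for the paper's CNN-autoencoder/ConvLSTM pipeline; only the hold-one-texture-out loop and the MAE/RMSE scoring mirror the described protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 50 textures, each with a feature vector and a
# rating on one perceptual axis (e.g. rough-smooth).
X = rng.normal(size=(50, 8))
y = X[:, 0] + 0.1 * rng.normal(size=50)

errors = []
for i in range(len(X)):
    # Hold out texture i and "train" on the rest (here: a 1-NN lookup
    # standing in for the deep multimodal model).
    train_idx = [j for j in range(len(X)) if j != i]
    dists = np.linalg.norm(X[train_idx] - X[i], axis=1)
    pred = y[train_idx[np.argmin(dists)]]
    errors.append(pred - y[i])

errors = np.asarray(errors)
mae = np.abs(errors).mean()
rmse = np.sqrt((errors ** 2).mean())
print(round(mae, 3), round(rmse, 3))
```

Leave-one-out is a natural fit here because the texture set is small (50 samples), so every texture serves once as an unseen test case.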

Authors:Daisuke Yukita, Tim Miller, Joel Mackenzie
Title: Reassessing Collaborative Writing Theories and Frameworks in the Age of LLMs: What Still Applies and What We Must Leave Behind
Abstract:
In this paper, we conduct a critical review of existing theories and frameworks on human-human collaborative writing to assess their relevance to the current human-AI paradigm in organizational workplace settings, and draw seven insights along with design implications for human-AI collaborative writing tools. Our main finding was that, as we delegate more writing to AI, our cognitive process shifts from the traditional planning/translating/reviewing process to a planning/waiting/reviewing process, with the waiting in between disrupting the flow of the process. To ensure that our cognitive process remains intact, we suggest a "prototyping" approach, where the tool allows for faster iterations of the cognitive process by starting with smaller chunks of text, and gradually moving on to a fully fleshed-out document. We aim to bring theoretical grounding and practical design guidance to the interaction designs of human-AI collaborative writing, with the goal of enhancing future human-AI writing software.

Authors:Sheshera Mysore, Debarati Das, Hancheng Cao, Bahareh Sarrafzadeh
Title: Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild
Abstract:
As large language models (LLMs) are used in complex writing workflows, users engage in multi-turn interactions to steer generations to better fit their needs. Rather than passively accepting output, users actively refine, explore, and co-construct text. We conduct a large-scale analysis of this collaborative behavior for users engaged in writing tasks in the wild with two popular AI assistants, Bing Copilot and WildChat. Our analysis goes beyond simple task classification or satisfaction estimation common in prior work and instead characterizes how users interact with LLMs through the course of a session. We identify prototypical behaviors in how users interact with LLMs in prompts following their original request. We refer to these as Prototypical Human-AI Collaboration Behaviors (PATHs) and find that a small group of PATHs explain a majority of the variation seen in user-LLM interaction. These PATHs span users revising intents, exploring texts, posing questions, adjusting style or injecting new content. Next, we find statistically significant correlations between specific writing intents and PATHs, revealing how users' intents shape their collaboration behaviors. We conclude by discussing the implications of our findings on LLM alignment.

Authors:Alan Ta, Nilsu Salgin, Mustafa Demir, Kala Phillips Reindel, Ranjana K. Mehta, Anthony McDonald, Carly McCord, Farzan Sasangohar
Title: Real-Time Stress Monitoring, Detection, and Management in College Students: A Wearable Technology and Machine-Learning Approach
Abstract:
College students are increasingly affected by stress, anxiety, and depression, yet face barriers to traditional mental health care. This study evaluated the efficacy of a mobile health (mHealth) intervention, Mental Health Evaluation and Lookout Program (mHELP), which integrates a smartwatch sensor and machine learning (ML) algorithms for real-time stress detection and self-management. In a 12-week randomized controlled trial (n = 117), participants were assigned to a treatment group using mHELP's full suite of interventions or a control group using the app solely for real-time stress logging and weekly psychological assessments. The primary outcome, "Moments of Stress" (MS), was assessed via physiological and self-reported indicators and analyzed using Generalized Linear Mixed Model (GLMM) approaches. Similarly, secondary outcomes of psychological assessments, including the Generalized Anxiety Disorder-7 (GAD-7) for anxiety, the Patient Health Questionnaire (PHQ-8) for depression, and the Perceived Stress Scale (PSS), were also analyzed via GLMM. Findings for the objective measure, MS, indicate a substantial decrease in MS among the treatment group compared to the control group, while no notable between-group differences were observed in subjective scores of anxiety (GAD-7), depression (PHQ-8), or stress (PSS). However, the treatment group exhibited a clinically meaningful decline in GAD-7 and PSS scores. These findings underscore the potential of wearable-enabled mHealth tools to reduce acute stress in college populations and highlight the need for extended interventions and tailored features to address chronic symptoms like depression.

Authors:Tejaswi Polimetla, Katy Ilonka Gero, Elena Leah Glassman
Title: A Paradigm for Creative Ownership
Abstract:
As generative AI tools become embedded in creative practice, questions of ownership in co-creative contexts are pressing. Yet studies of human-AI collaboration often invoke "ownership" without definition: sometimes conflating it with other concepts, and other times leaving interpretation to participants. This inconsistency makes findings difficult to compare across or even within studies. We introduce a framework of creative ownership comprising three dimensions - Person, Process, and System - each with three subdimensions, offering a shared language for both system design and HCI research. In semi-structured interviews with 21 creative professionals, we found that participants' initial references to ownership (e.g., embodiment, control, concept) were fully encompassed by the framework, demonstrating its coverage. Once introduced, however, they also articulated and prioritized the remaining subdimensions, underscoring how the framework expands reflection and enables richer insights. Our contributions include 1) the framework, 2) a web-based visualization tool, and 3) empirical findings on its utility.

Authors:Ya-Chuan Hsu, Michael Defranco, Rutvik Patel, Stefanos Nikolaidis
Title: Integrating Field of View in Human-Aware Collaborative Planning
Abstract:
In human-robot collaboration (HRC), it is crucial for robot agents to consider humans' knowledge of their surroundings. In reality, humans possess a narrow field of view (FOV), limiting their perception. However, research on HRC often overlooks this aspect and presumes an omniscient human collaborator. Our study addresses the challenge of adapting to the evolving subtask intent of humans while accounting for their limited FOV. We integrate FOV within the human-aware probabilistic planning framework. To account for large state spaces due to considering FOV, we propose a hierarchical online planner that efficiently finds approximate solutions while enabling the robot to explore low-level action trajectories that enter the human FOV, influencing their intended subtask. Through user study with our adapted cooking domain, we demonstrate our FOV-aware planner reduces human's interruptions and redundant actions during collaboration by adapting to human perception limitations. We extend these findings to a virtual reality kitchen environment, where we observe similar collaborative behaviors.

Authors:Franziska Sofia Hafner, Ana Valdivia, Luc Rocher
Title: Gender Trouble in Language Models: An Empirical Audit Guided by Gender Performativity Theory
Abstract:
Language models encode and subsequently perpetuate harmful gendered stereotypes. Research has succeeded in mitigating some of these harms, e.g. by dissociating non-gendered terms such as occupations from gendered terms such as 'woman' and 'man'. This approach, however, remains superficial given that associations are only one form of prejudice through which gendered harms arise. Critical scholarship on gender, such as gender performativity theory, emphasizes how harms often arise from the construction of gender itself, such as conflating gender with biological sex. In language models, these issues could lead to the erasure of transgender and gender diverse identities and cause harms in downstream applications, from misgendering users to misdiagnosing patients based on wrong assumptions about their anatomy. For FAccT research on gendered harms to go beyond superficial linguistic associations, we advocate for a broader definition of 'gender bias' in language models. We operationalize insights on the construction of gender through language from gender studies literature and then empirically test how 16 language models of different architectures, training datasets, and model sizes encode gender. We find that language models tend to encode gender as a binary category tied to biological sex, and that gendered terms that do not neatly fall into one of these binary categories are erased and pathologized. Finally, we show that larger models, which achieve better results on performance benchmarks, learn stronger associations between gender and sex, further reinforcing a narrow understanding of gender. Our findings lead us to call for a re-evaluation of how gendered harms in language models are defined and addressed.

Authors:Mak Ahmad, Prerna Ravi, David Karger, Marc Facciotti
Title: How Adding Metacognitive Requirements in Support of AI Feedback in Practice Exams Transforms Student Learning Behaviors
Abstract:
Providing personalized, detailed feedback at scale in large undergraduate STEM courses remains a persistent challenge. We present an empirically evaluated practice exam system that integrates AI generated feedback with targeted textbook references, deployed in a large introductory biology course. Our system encourages metacognitive behavior by asking students to explain their answers and declare their confidence. It uses OpenAI's GPT-4o to generate personalized feedback based on this information, while directing them to relevant textbook sections. Through interaction logs from consenting participants across three midterms (541, 342, and 413 students respectively), totaling 28,313 question-student interactions across 146 learning objectives, along with 279 surveys and 23 interviews, we examined the system's impact on learning outcomes and engagement. Across all midterms, feedback types showed no statistically significant performance differences, though some trends suggested potential benefits. The most substantial impact came from the required confidence ratings and explanations, which students reported transferring to their actual exam strategies. About 40 percent of students engaged with textbook references when prompted by feedback -- far higher than traditional reading rates. Survey data revealed high satisfaction (mean rating 4.1 of 5), with 82.1 percent reporting increased confidence on practiced midterm topics, and 73.4 percent indicating they could recall and apply specific concepts. Our findings suggest that embedding structured reflection requirements may be more impactful than sophisticated feedback mechanisms.

Authors:Marina Estévez-Almenzar, Ricardo Baeza-Yates, Carlos Castillo
Title: Human Response to Decision Support in Face Matching: The Influence of Task Difficulty and Machine Accuracy
Abstract:
Decision support systems enhanced by Artificial Intelligence (AI) are increasingly being used in high-stakes scenarios where errors or biased outcomes can have significant consequences. In this work, we explore the conditions under which AI-based decision support systems affect the decision accuracy of humans involved in face matching tasks. Previous work suggests that this largely depends on various factors, such as the specific nature of the task and how users perceive the quality of the decision support, among others. Hence, we conduct extensive experiments to examine how both task difficulty and the precision of the system influence human outcomes. Our results show a strong influence of task difficulty, which not only makes humans less precise but also less capable of determining whether the decision support system is yielding accurate suggestions or not. This has implications for the design of decision support systems, and calls for a careful examination of the context in which they are deployed and on how they are perceived by users.

Authors:Junbo Wang, Haofeng Tan, Bowen Liao, Albert Jiang, Teng Fei, Qixing Huang, Zhengzhong Tu, Shan Ye, Yuhao Kang
Title: SounDiT: Geo-Contextual Soundscape-to-Landscape Generation
Abstract:
We present a novel and practically significant problem, Geo-Contextual Soundscape-to-Landscape (GeoS2L) generation, which aims to synthesize geographically realistic landscape images from environmental soundscapes. Prior audio-to-image generation methods typically rely on general-purpose datasets and overlook geographic and environmental contexts, resulting in unrealistic images that are misaligned with real-world environmental settings. To address this limitation, we introduce a novel geo-contextual computational framework that explicitly integrates geographic knowledge into multimodal generative modeling. We construct two large-scale geo-contextual multimodal datasets, SoundingSVI and SonicUrban, pairing diverse soundscapes with real-world landscape images. We propose SounDiT, a novel Diffusion Transformer (DiT)-based model that incorporates geo-contextual scene conditioning to synthesize geographically coherent landscape images. Furthermore, we propose a practically-informed geo-contextual evaluation framework, the Place Similarity Score (PSS), across element-, scene-, and human perception-levels to measure consistency between input soundscapes and generated landscape images. Extensive experiments demonstrate that SounDiT outperforms existing baselines in both visual fidelity and geographic consistency. Our work not only establishes foundational benchmarks for GeoS2L generation but also highlights the importance of incorporating geographic domain knowledge in advancing multimodal generative models, opening new directions at the intersection of generative AI, geography, urban planning, and environmental sciences.

Authors:Dena F. Mujtaba, Nihar R. Mahapatra
Title: Behind the Screens: Uncovering Bias in AI-Driven Video Interview Assessments Using Counterfactuals
Abstract:
AI-enhanced personality assessments are increasingly shaping hiring decisions, using affective computing to predict traits from the Big Five (OCEAN) model. However, integrating AI into these assessments raises ethical concerns, especially around bias amplification rooted in training data. These biases can lead to discriminatory outcomes based on protected attributes like gender, ethnicity, and age. To address this, we introduce a counterfactual-based framework to systematically evaluate and quantify bias in AI-driven personality assessments. Our approach employs generative adversarial networks (GANs) to generate counterfactual representations of job applicants by altering protected attributes, enabling fairness analysis without access to the underlying model. Unlike traditional bias assessments that focus on unimodal or static data, our method supports multimodal evaluation-spanning visual, audio, and textual features. This comprehensive approach is particularly important in high-stakes applications like hiring, where third-party vendors often provide AI systems as black boxes. Applied to a state-of-the-art personality prediction model, our method reveals significant disparities across demographic groups. We also validate our framework using a protected attribute classifier to confirm the effectiveness of our counterfactual generation. This work provides a scalable tool for fairness auditing of commercial AI hiring platforms, especially in black-box settings where training data and model internals are inaccessible. Our results highlight the importance of counterfactual approaches in improving ethical transparency in affective computing.
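The core of the audit described above, comparing model scores on originals versus attribute-flipped counterfactuals, can be sketched as follows. The scoring function and data are hypothetical stand-ins: the paper uses GAN-generated counterfactuals and a black-box multimodal personality model, while here a deliberately biased toy scorer makes the disparity visible.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box_score(x, group):
    # Hypothetical biased scorer: adds a fixed offset for one group,
    # standing in for a vendor's personality-prediction model.
    return x.mean(axis=1) + np.where(group == 1, 0.2, 0.0)

# Original applicants and their counterfactuals share all features
# except the protected attribute, which is flipped.
features = rng.normal(size=(100, 5))
group = rng.integers(0, 2, size=100)

orig = black_box_score(features, group)
counterfactual = black_box_score(features, 1 - group)

# Counterfactual disparity: how much the score shifts when only the
# protected attribute changes. Values near zero indicate fairness.
disparity = np.abs(orig - counterfactual).mean()
print(round(disparity, 3))  # 0.2
```

Because only model outputs are compared, this style of audit needs no access to training data or model internals, which is what makes it usable on black-box vendor systems.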

Authors:Angela Mastrianni, Mary Suhyun Kim, Travis M. Sullivan, Genevieve Jayne Sippel, Randall S. Burd, Krzysztof Z. Gajos, Aleksandra Sarcevic
Title: To Recommend or Not to Recommend: Designing and Evaluating AI-Enabled Decision Support for Time-Critical Medical Events
Abstract:
AI-enabled decision-support systems aim to help medical providers rapidly make decisions with limited information during medical emergencies. A critical challenge in developing these systems is supporting providers in interpreting the system output to make optimal treatment decisions. In this study, we designed and evaluated an AI-enabled decision-support system to aid providers in treating patients with traumatic injuries. We first conducted user research with physicians to identify and design information types and AI outputs for a decision-support display. We then conducted an online experiment with 35 medical providers from six health systems to evaluate two human-AI interaction strategies: (1) AI information synthesis and (2) AI information and recommendations. We found that providers were more likely to make correct decisions when AI information and recommendations were provided compared to receiving no AI support. We also identified two socio-technical barriers to providing AI recommendations during time-critical medical events: (1) an accuracy-time trade-off in providing recommendations and (2) polarizing perceptions of recommendations between providers. We discuss three implications for developing AI-enabled decision support used in time-critical events, contributing to the limited research on human-AI interaction in this context.

Authors:Jiwon Chun, Gefei Zhang, Meng Xia
Title: ConflictLens: LLM-Based Conflict Resolution Training in Romantic Relationship
Abstract:
Our poster presents ConflictLens, a three-stage simulation system powered by large language models (LLMs) and grounded in psychological theory, designed to help users reflect on and practice conflict resolution in romantic relationships. Users can upload real conflict scenarios to receive evaluation of behavioral patterns, reflect on conflicts by annotating their negative behaviors, and practice different conflict resolution strategies in AI-simulated duologues. Initial evaluation by three domain experts suggests that ConflictLens offers a realistic experience and effectively supports self-guided reflection and communication practice in romantic relationships.

Authors:Zeynep Engin, David Hand
Title: Toward Adaptive Categories: Dimensional Governance for Agentic AI
Abstract:
As AI systems evolve from static tools to dynamic agents, traditional categorical governance frameworks -- based on fixed risk tiers, levels of autonomy, or human oversight models -- are increasingly insufficient on their own. Systems built on foundation models, self-supervised learning, and multi-agent architectures increasingly blur the boundaries that categories were designed to police. In this Perspective, we make the case for dimensional governance: a framework that tracks how decision authority, process autonomy, and accountability (the 3As) distribute dynamically across human-AI relationships. A critical advantage of this approach is its ability to explicitly monitor system movement toward and across key governance thresholds, enabling preemptive adjustments before risks materialize. This dimensional approach provides the necessary foundation for more adaptive categorization, enabling thresholds and classifications that can evolve with emerging capabilities. While categories remain essential for decision-making, building them upon dimensional foundations allows for context-specific adaptability and stakeholder-responsive governance that static approaches cannot achieve. We outline key dimensions, critical trust thresholds, and practical examples illustrating where rigid categorical frameworks fail -- and where a dimensional mindset could offer a more resilient and future-proof path forward for both governance and innovation at the frontier of artificial intelligence.

Authors:Jagan K Balasubramanian, Daan M Pool, Yasemin Vardar
Title: Sliding Speed Influences Electrovibration-Induced Finger Friction Dynamics on Touchscreens
Abstract:
Electrovibration technology enables tactile texture rendering on capacitive touchscreens by modulating friction between the finger and the screen through electrostatic attraction forces, generated by applying an alternating voltage signal to the screen. Accurate signal calibration is essential for robust texture rendering but remains challenging due to variations in sliding speed, applied force, and individual skin mechanics, all of which unpredictably affect frictional behavior. Here, we investigate how exploration conditions affect electrovibration-induced finger friction on touchscreens and the role of skin mechanics in this process. Ten participants slid their index fingers across an electrovibration-enabled touchscreen at five sliding speeds ($20\sim100$ mm/s) and applied force levels ($0.2\sim0.6$ N). Contact forces and skin accelerations were measured while amplitude modulated voltage signals spanning the tactile frequency range were applied to the screen. We modeled the finger-touchscreen friction response as a first-order system and the skin mechanics as a mass-spring-damper system. Results showed that sliding speed influenced the friction response's cutoff frequency, along with the estimated finger moving mass and stiffness. For every $1$ mm/s increase in speed, the cutoff frequency, the finger moving mass, and stiffness increased by $13.8$ Hz, $3.23\times 10^{-5}$ kg, and $4.04$ N/m, respectively. Correlation analysis revealed that finger stiffness had a greater impact on the cutoff frequency than moving mass. Notably, we observed a substantial inter-participant variability in both finger-display interaction and skin mechanics parameters. Finally, we developed a speed-dependent friction model to support consistent and perceptually stable electrovibration-based haptic feedback across varying user conditions.
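The reported linear trends can be illustrated with a small sketch. Only the per-speed slopes (13.8 Hz, 3.23e-5 kg, and 4.04 N/m per mm/s) come from the abstract; the intercepts at the reference speed, and the function name, are hypothetical placeholders rather than the paper's fitted model.

```python
# Sketch of the speed-dependent parameter trends reported above.
# Slopes (per mm/s) are from the abstract; the intercepts at the
# reference speed v0 are HYPOTHETICAL placeholders for illustration.

def speed_dependent_params(v_mm_s, f_c0=100.0, m0=1.0e-3, k0=200.0, v0=20.0):
    """Linearly extrapolate cutoff frequency (Hz), finger moving mass (kg),
    and stiffness (N/m) as sliding speed increases from v0 (mm/s)."""
    dv = v_mm_s - v0
    f_c = f_c0 + 13.8 * dv       # +13.8 Hz per mm/s (reported)
    m = m0 + 3.23e-5 * dv        # +3.23e-5 kg per mm/s (reported)
    k = k0 + 4.04 * dv           # +4.04 N/m per mm/s (reported)
    return f_c, m, k

f_c, m, k = speed_dependent_params(60.0)  # a mid-range sliding speed
```

Such a linear extrapolation is one simple way to realize the speed-dependent friction model the authors describe; the study's actual calibration procedure may differ.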

Authors:Carlos R. Cunha, André Moreira, Sílvia Coelho, Vítor Mendonça, João Pedro Gomes
Title: Empowering the Teaching and Learning of Geometry in Basic Education by Combining Extended Reality and Machine Learning
Abstract:
Technology has helped to innovate the teaching-learning process. Today's students are more demanding when it comes to the environment they have at their disposal to learn, experiment, and develop critical thinking. Mathematics has persistently suffered from students' learning difficulties, whether due to lack of motivation, low abstraction ability, or a lack of new tools for teachers to bring innovation into the classroom and beyond it. While it is true that digitalization has entered schools, it often follows a process of digitally replicating approaches and materials that were previously only available on physical media. This work focuses on the use of Extended Reality for teaching mathematics, and particularly for teaching geometry, proposing a conceptual model that combines Extended Reality and Machine Learning. The proposed model was prototyped as a form of laboratory validation, contributing both a way to innovate how the geometry teaching-learning process is developed and the ability to obtain useful insights for teachers and students throughout the process.

Authors:Md Farhan Tasnim Oshim, Nigel Doering, Bashima Islam, Tsui-Wei Weng, Tauhidur Rahman
Title: Anti-Sensing: Defense against Unauthorized Radar-based Human Vital Sign Sensing with Physically Realizable Wearable Oscillators
Abstract:
Recent advancements in Ultra-Wideband (UWB) radar technology have enabled contactless, non-line-of-sight vital sign monitoring, making it a valuable tool for healthcare. However, UWB radar's ability to capture sensitive physiological data, even through walls, raises significant privacy concerns, particularly in human-robot interactions and autonomous systems that rely on radar for sensing human presence and physiological functions. In this paper, we present Anti-Sensing, a novel defense mechanism designed to prevent unauthorized radar-based sensing. Our approach introduces physically realizable perturbations, such as oscillatory motion from wearable devices, to disrupt radar sensing by mimicking natural cardiac motion, thereby misleading heart rate (HR) estimations. We develop a gradient-based algorithm to optimize the frequency and spatial amplitude of these oscillations for maximal disruption while ensuring physiological plausibility. Through both simulations and real-world experiments with radar data and neural network-based HR sensing models, we demonstrate the effectiveness of Anti-Sensing in significantly degrading model accuracy, offering a practical solution for privacy preservation.

Authors:Xinglin Sun, Caroline Claisse, Runhua Zhang, Xinyu Wu, Jialin Yuan, Qi Wang
Title: Conversations With The Stressed Body: Facilitating Stress Self-Disclosure Among Adolescent Girls Through An Embodied Approach
Abstract:
Adolescent girls face significant mental health challenges during their transition to adulthood, often experiencing heightened stress from various sources. While various interactive technologies for self-disclosure have been explored to support stress relief, little is known about how to encourage stress-related self-disclosure through an embodied approach. This study presents a co-design workshop centred on Embodied Probes, a series of artefacts and activities incorporating embodied methods and technologies. During the workshop, nine participants aged 15 to 18 engaged with their bodies, expressed bodily sensations through tangible means, and designed embodied prototypes tailored to their personal needs for stress perception and relief. The workshop revealed insights into somatic symptoms, sources, and coping strategies for stress among adolescent girls, as well as how embodied methods can support their stress self-disclosure. This paper contributes to the HCI community by offering design implications on leveraging embodied technologies to support self-disclosure for young women's mental well-being.

Authors:Önder Gürcan, Vanja Falck, Markus G. Rousseau, Larissa L. Lima
Title: Towards an LLM-powered Social Digital Twinning Platform
Abstract:
We present Social Digital Twinner, an innovative social simulation tool for exploring plausible effects of what-if scenarios in complex adaptive social systems. The architecture is composed of three seamlessly integrated parts: a data infrastructure featuring real-world data and a multi-dimensionally representative synthetic population of citizens, an LLM-enabled agent-based simulation engine, and a user interface that enables intuitive, natural language interactions with the simulation engine and the artificial agents (i.e., citizens). Social Digital Twinner facilitates real-time engagement and empowers stakeholders to collaboratively design, test, and refine intervention measures, promoting a data-driven and evidence-based approach to societal problem-solving. We demonstrate the tool's interactive capabilities by addressing the critical issue of youth school dropouts in Kragero, Norway, showcasing its ability to create and execute a dedicated social digital twin using natural language.

Authors:Agnik Saha, Victoria Churchill, Anny D. Rodriguez, Ugur Kursuncu, Muhammed Y. Idris
Title: Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI
Abstract:
Effective communication about breast and cervical cancers remains a persistent health challenge, with significant gaps in public understanding of cancer prevention, screening, and treatment, potentially leading to delayed diagnoses and inadequate treatments. This study evaluates the capabilities and limitations of Large Language Models (LLMs) in generating accurate, safe, and accessible cancer-related information to support patient understanding. We evaluated five general-purpose and three medical LLMs using a mixed-methods evaluation framework across linguistic quality, safety and trustworthiness, and communication accessibility and affectiveness. Our approach utilized quantitative metrics, qualitative expert ratings, and statistical analysis using Welch's ANOVA, Games-Howell, and Hedges' g. Our results show that general-purpose LLMs produced outputs of higher linguistic quality and affectiveness, while medical LLMs demonstrate greater communication accessibility. However, medical LLMs tend to exhibit higher levels of potential harm, toxicity, and bias, reducing their performance in safety and trustworthiness. Our findings indicate a duality between domain-specific knowledge and safety in health communications. The results highlight the need for intentional model design with targeted improvements, particularly in mitigating harm and bias, and improving safety and affectiveness. This study provides a comprehensive evaluation of LLMs for cancer communication, offering critical insights for improving AI-generated health content and informing future development of accurate, safe, and accessible digital health tools.

Authors:Carlos R. Cunha, Vítor Mendonça, André Moreira, João Pedro Gomes, Aida Carvalho
Title: Using Virtual Reality in Museums to Bridge the Gap Between Material Heritage and the Interpretation of Its Immaterial Context
Abstract:
Material heritage typically has a whole set of associated immaterial heritage, which is essential to pass on to the visitor as part of the cultural mission of destinations and those who manage them. In this sense, the interpretation of material heritage is a complex process that cannot be fully accomplished through the mere observation of physical artifacts. In this context, it becomes fundamental to provide visitors with a set of tools that allow them to correctly interpret the artifacts and come to fully understand the cultural dimension of the destinations and their heritage. Accordingly, virtual reality can leverage the creation of innovative and immersive solutions that allow visitors to understand and feel part of their own heritage and of the ancestral component that defines the sociocultural roots of destinations and their civilizational traditions. This article, after dissecting and substantiating the role of virtual reality in the interpretation of heritage, presents a conceptual model, based on the use of virtual reality, which was partly prototyped in the Portuguese Museum in the city of Miranda do Douro. This proposal is an ongoing contribution to the creation of innovative and immersive tools for the interpretation of heritage.

Authors:Adeline Fau, Mina Ghobrial, Philippe Seitier, Pierre Lagarrigue, Michel Galaup, Alain Daidié, Patrick Gilles
Title: Enhancing performance in bolt torque tightening using a connected torque wrench and augmented reality
Abstract:
Modern production rates and the increasing complexity of mechanical systems require efficient and effective manufacturing and assembly processes. The transition to Industry 4.0, supported by the deployment of innovative tools such as Augmented Reality (AR), equips the industry to tackle future challenges. Among critical processes, the assembly and tightening of bolted joints stand out due to their significant safety and economic implications across various industrial sectors. This study proposes an innovative tightening method designed to enhance the reliability of bolted assembly tightening through the use of Augmented Reality and connected tools. A 6-Degrees-of-Freedom (6-DoF) tracked connected torque wrench assists the operator during tightening, ensuring each screw is tightened to the correct torque. The effectiveness of this method is compared with the conventional tightening method using paper instructions. Participants in the study carried out tightening sequences on two simple parts with multiple screws. The study evaluates the impact of the proposed method on task performance and its acceptability to operators. The tracked connected torque wrench provides considerable assistance to the operators, including wrench control and automatic generation of tightening reports. The results suggest that the AR-based method has the potential to ensure reliable torque tightening of bolted joints.

Authors:Chenglong Wang, Yuhao Kang, Zhaoya Gong, Pengjun Zhao, Yu Feng, Wenjia Zhang, Ge Li
Title: CartoAgent: a multimodal large language model-powered multi-agent cartographic framework for map style transfer and evaluation
Abstract:
The rapid development of generative artificial intelligence (GenAI) presents new opportunities to advance the cartographic process. Previous studies have either overlooked the artistic aspects of maps or faced challenges in creating both accurate and informative maps. In this study, we propose CartoAgent, a novel multi-agent cartographic framework powered by multimodal large language models (MLLMs). This framework simulates three key stages in cartographic practice: preparation, map design, and evaluation. At each stage, different MLLMs act as agents with distinct roles to collaborate, discuss, and utilize tools for specific purposes. In particular, CartoAgent leverages MLLMs' visual aesthetic capability and world knowledge to generate maps that are both visually appealing and informative. By separating style from geographic data, it can focus on designing stylesheets without modifying the vector-based data, thereby ensuring geographic accuracy. We applied CartoAgent to a specific task centered on map restyling-namely, map style transfer and evaluation. The effectiveness of this framework was validated through extensive experiments and a human evaluation study. CartoAgent can be extended to support a variety of cartographic design decisions and inform future integrations of GenAI in cartography.

Authors:Xiaoyan Wei, Zijian Yue, Hsiang-Ting Chen
Title: SnapNCode: An Integrated Development Environment for Programming Physical Objects Interactions
Abstract:
Spatial computing technologies have the potential to revolutionize how we interact with the world around us. However, most modern integrated development environments (IDEs) have not fully adapted to this paradigm shift. For example, physical 3D objects in the real world are still represented as 2D text variables in code, creating a significant perceptual distance between these representations. In response to this challenge, we introduce SnapNCode, a novel IDE for spatial programming. SnapNCode enables programmers to capture various states of physical objects through live video streams from cameras and directly insert these visual representations into their code. Moreover, users can augment physical objects by attaching code snippets onto objects, which are opportunistically triggered when observed by cameras. We conducted a user study (N=12) to assess the usability of SnapNCode. Feedback from participants indicates that the system is easy to use and holds promise for casual daily use and integration into a broader range of workflows.

Authors:Xiaoyan Wei, Zebang Zhang, Zijian Yue, Hsiang-Ting Chen
Title: Context-AI Tunes: Context-Aware AI-Generated Music for Stress Reduction
Abstract:
Music plays a critical role in emotional regulation and stress relief; however, individuals often need different types of music tailored to their unique stress levels or surrounding environment. Choosing the right music can be challenging due to the overwhelming number of options and the time-consuming trial-and-error process. To address this, we propose Context-AI Tune (CAT), a system that generates personalized music based on environmental inputs and the user's self-assessed stress level. A 2x2 within-subject experiment (N=26) was conducted with two independent variables: AI (AI, NoAI) and Environment (Busy Hub, Quiet Library). CAT's effectiveness in reducing stress was evaluated using the Visual Analog Scale for Stress (VAS-S). Results show that CAT is more effective than manually chosen music in reducing stress by adapting to user context.

Authors:Jianlong Zhu, Manon Kempermann, Vikram Kamath Cannanure, Alexander Hartland, Rosa M. Navarrete, Giuseppe Carteny, Daniela Braun, Ingmar Weber
Title: Learn, Explore and Reflect by Chatting: Understanding the Value of an LLM-Based Voting Advice Application Chatbot
Abstract:
Voting advice applications (VAAs), which have become increasingly prominent in European elections, are seen as a successful tool for boosting electorates' political knowledge and engagement. However, VAAs' complex language and rigid presentation constrain their utility to less-sophisticated voters. While previous work enhanced VAAs' click-based interaction with scripted explanations, a conversational chatbot's potential for tailored discussion and deliberate political decision-making remains untapped. Our exploratory mixed-method study investigates how LLM-based chatbots can support voting preparation. We deployed a VAA chatbot to 331 users before Germany's 2024 European Parliament election, gathering insights from surveys, conversation logs, and 10 follow-up interviews. Participants found the VAA chatbot intuitive and informative, citing its simple language and flexible interaction. We further uncovered VAA chatbots' role as a catalyst for reflection and rationalization. Expanding on participants' desire for transparency, we provide design recommendations for building interactive and trustworthy VAA chatbots.

Authors:Sadia Afrin Mim, Fatemeh Vares, Andrew Meenly, Brittany Johnson
Title: What Makes a Fairness Tool Project Sustainable in Open Source?
Abstract:
As society becomes increasingly reliant on artificial intelligence, the need to mitigate risk and harm is paramount. In response, researchers and practitioners have developed tools to detect and reduce undesired bias, commonly referred to as fairness tools. Many of these tools are publicly available for free use and adaptation. While the growing availability of such tools is promising, little is known about the broader landscape beyond well-known examples like AI Fairness 360 and Fairlearn. Because fairness is an ongoing concern, these tools must be built for long-term sustainability. Using an existing set of fairness tools as a reference, we systematically searched GitHub and identified 50 related projects. We then analyzed various aspects of their repositories to assess community engagement and the extent of ongoing maintenance. Our findings show diverse forms of engagement with these tools, suggesting strong support for open-source development. However, we also found significant variation in how well these tools are maintained. Notably, 53 percent of fairness projects become inactive within the first three years. By examining sustainability in fairness tooling, we aim to promote more stability and growth in this critical area.

Authors:Yuchen Wu, Mingduo Zhao, John Canny
Title: Beyond Likes: How Normative Feedback Complements Engagement Signals on Social Media
Abstract:
Many online platforms incorporate engagement signals, such as likes, into their interface design to boost engagement. However, these signals can unintentionally elevate content that may not support normatively desirable behavior, especially when toxic content correlates strongly with popularity indicators. In this study, we propose structured prosocial feedback as a complementary signal, which highlights content quality based on normative criteria. We design and implement an LLM-based feedback system, which evaluates user comments based on principles from positive psychology, such as individual well-being. A pre-registered user study then examines how existing peer-based (popularity) and the new expert-based feedback interact to shape users' reposting behavior in a social media setting. Results show that peer feedback increases conformity to popularity cues, while expert feedback shifts choices toward normatively higher-quality content. This illustrates the added value of normative cues and underscores the potential benefits of incorporating such signals into platform feedback systems to foster healthier online environments.

Authors:Nisha Devasia, Adrian Rodriguez, Logan Tuttle, Julie Kientz
Title: Partnership through Play: Investigating How Long-Distance Couples Use Digital Games to Facilitate Intimacy
Abstract:
Long-distance relationships (LDRs) have become more common in the last few decades, primarily among young adults pursuing educational or employment opportunities. A common way for couples in LDRs to spend time together is by playing multiplayer video games, which are often a shared hobby and therefore a preferred joint activity. However, games are relatively understudied in the context of relational maintenance for LDRs. In this work, we used a mixed-methods approach to collect data on the experiences of 13 couples in LDRs who frequently play games together. We investigated different values around various game mechanics and modalities and found significant differences in couple play styles, and also detail how couples appropriate game mechanics to express affection to each other virtually. We also created prototypes and design implications based on couples' needs surrounding the lack of physical sensation and memorabilia storage in most popular games.

Authors:London Bielicke, Anna Zhang, Shruti Tyagi, Emery Berger, Adam Chlipala, Eunice Jun
Title: PLanet: Formalizing Assignment Procedures in the Design of Experiments
Abstract:
Carefully constructed experimental designs are essential for drawing valid, generalizable conclusions from scientific experiments. Unfortunately, experimental designs can be difficult to specify, communicate clearly, and relate to alternatives. In response, we introduce a grammar of composable operators for constructing experimental assignment procedures (e.g., Latin square). The PLanet DSL implements this grammar. Researchers specify assignment requirements, and PLanet compiles these into a constraint satisfaction problem over matrices that determines viable experimental plans. In an expressivity evaluation, we find that PLanet is more expressive than two existing experimental design libraries; its composability enables expression of both canonical and customized designs in HCI experiments. Case studies with three researchers reveal how PLanet helps them make complex design choices explicit, explore alternatives, and develop a deeper understanding of experimental design.
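The abstract names the Latin square as a canonical assignment procedure. As a minimal sketch (in plain Python, not PLanet's actual DSL), a cyclic construction illustrates the balance property such a procedure must satisfy: each condition appears exactly once per row (participant) and once per column (trial position).

```python
# Minimal cyclic Latin square: not PLanet's DSL, just an illustration of
# the balanced assignment structure a Latin square procedure produces.

def latin_square(conditions):
    """Return an n x n Latin square over the given conditions,
    shifting the condition order by one on each successive row."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]

square = latin_square(["A", "B", "C"])
# Every row and every column contains each of A, B, C exactly once.
```

A constraint-based system like the one the abstract describes would instead search over all matrices satisfying these row/column constraints, rather than constructing one fixed square.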

Authors:Weiqing Li, Yue Xu, Yuefeng Li, Yinghui Huang
Title: Display Content, Display Methods and Evaluation Methods of the HCI in Explainable Recommender Systems: A Survey
Abstract:
Explainable Recommender Systems (XRS) aim to provide users with understandable reasons for the recommendations generated by these systems, representing a crucial research direction in artificial intelligence (AI). Recent research has increasingly focused on the algorithms, display, and evaluation methodologies of XRS. However, current research and reviews primarily emphasize the algorithmic aspects, with fewer studies addressing the Human-Computer Interaction (HCI) layer of XRS. Additionally, existing reviews lack a unified taxonomy for XRS, and insufficient attention has been given to the emerging area of short video recommendations. In this study, we synthesize existing literature and surveys on XRS, presenting a unified framework for its research and development. The main contributions are as follows: 1) We adopt a lifecycle perspective to systematically summarize the technologies and methods used in XRS, addressing challenges posed by the diversity and complexity of algorithmic models and explanation techniques. 2) For the first time, we highlight the application of multimedia, particularly video-based explanations, along with its potential, technical pathways, and challenges in XRS. 3) We provide a structured overview of evaluation methods from both qualitative and quantitative dimensions. These findings provide valuable insights for the systematic design, progress, and testing of XRS.

Authors:Pawel Chodkiewicz, Pragya Verma, Grischa Liebel
Title: CoVoL: A Cooperative Vocabulary Learning Game for Children with Autism
Abstract:
Children with Autism commonly face difficulties in vocabulary acquisition, which can have an impact on their social communication. Using digital tools for vocabulary learning can prove beneficial for these children, as they can provide a predictable environment and effective individualized feedback. While existing work has explored the use of technology-assisted vocabulary learning for children with Autism, no study has incorporated turn-taking to facilitate learning and use of vocabulary similar to that used in real-world social contexts. To address this gap, we propose the design of a cooperative two-player vocabulary learning game, CoVoL. CoVoL allows children to engage in game-based vocabulary learning useful for real-world social communication scenarios. We discuss our first prototype and its evaluation. Additionally, we present planned features which are based on feedback obtained through ten interviews with researchers and therapists, as well as an evaluation plan for the final release of CoVoL.

Authors:Quentin Romero Lauro, Aakash Gautam, Yasmine Kotturi
Title: BizChat: Scaffolding AI-Powered Business Planning for Small Business Owners Across Digital Skill Levels
Abstract:
Generative AI can help small business owners automate tasks, increase efficiency, and improve their bottom line. However, despite the seemingly intuitive design of systems like ChatGPT, significant barriers remain for those less comfortable with technology. To address these disparities, prior work highlights accessory skills -- beyond prompt engineering -- that users must master to successfully adopt generative AI, including keyboard shortcuts, editing skills, file conversions, and browser literacy. Building on a design workshop series and 15 interviews with small businesses, we introduce BizChat, a large language model (LLM)-powered web application that helps business owners across digital skill levels write their business plan -- an essential but often neglected document. To do so, BizChat's interface embodies three design considerations inspired by the learning sciences: ensuring accessibility to users with less digital skill while maintaining extensibility for power users ("low-floor-high-ceiling"), providing in situ micro-learning to support entrepreneurial education ("just-in-time learning"), and framing interaction around business activities ("contextualized technology introduction"). We conclude with plans for a future BizChat deployment.

Authors:Nandini Doreswamy, Louise Horstmanshof
Title: A Comparison Between Human and Generative AI Decision-Making Attributes in Complex Health Services
Abstract:
A comparison between human and Generative AI decision-making attributes in complex health services is, at present, a knowledge gap in the literature. Humans may possess unique attributes beneficial to decision-making in complex health services such as health policy and health regulation, but are also susceptible to decision-making flaws. The objective is to explore whether humans have unique and/or helpful attributes that contribute to optimal decision-making in complex health services. This comparison may also shed light on whether humans are likely to compete, cooperate, or converge with Generative AI. The comparison is based on two published reviews: a scoping review of human attributes [1] and a rapid review of Generative AI attributes [2]. The analysis categorizes attributes by uniqueness and impact. The results are presented in tabular form, comparing the sets and subsets of human and Generative AI attributes. Human and Generative AI decision-making attributes have complementary strengths. Cooperation between these two entities seems more likely than pure competition. To maintain meaningful decision-making roles, humans could develop their unique attributes, with decision-making systems integrating both human and Generative AI contributions. These entities may also converge in the future.

Authors:Sian Lee, Haeseung Seo, Aiping Xiong, Dongwon Lee
Title: Partisan Fact-Checkers' Warnings Can Effectively Correct Individuals' Misbeliefs About Political Misinformation
Abstract:
Political misinformation, particularly harmful when it aligns with individuals' preexisting beliefs and political ideologies, has become widespread on social media platforms. In response, platforms like Facebook and X introduced warning messages leveraging fact-checking results from third-party fact-checkers to alert users against false content. However, concerns persist about the effectiveness of these fact-checks, especially when fact-checkers are perceived as politically biased. To address these concerns, this study presents findings from an online human-subject experiment (N=216) investigating how the political stances of fact-checkers influence their effectiveness in correcting misbeliefs about political misinformation. Our findings demonstrate that partisan fact-checkers can decrease the perceived accuracy of political misinformation and correct misbeliefs without triggering backfire effects. This correction is even more pronounced when the misinformation aligns with individuals' political ideologies. Notably, while previous research suggests that fact-checking warnings are less effective for conservatives than liberals, our results suggest that explicitly labeled partisan fact-checkers, positioned as political counterparts to conservatives, are particularly effective in reducing conservatives' misbeliefs toward pro-liberal misinformation.

Authors:Arnav Verma, Judith E. Fan
Title: Measuring and predicting variation in the difficulty of questions about data visualizations
Abstract:
Understanding what is communicated by data visualizations is a critical component of scientific literacy in the modern era. However, it remains unclear why some tasks involving data visualizations are more difficult than others. Here we administered a composite test composed of five widely used tests of data visualization literacy to a large sample of U.S. adults (N=503 participants). We found that items in the composite test spanned the full range of possible difficulty levels, and that our estimates of item-level difficulty were highly reliable. However, the type of data visualization shown and the type of task involved only explained a modest amount of variation in performance across items, relative to the reliability of the estimates we obtained. These results highlight the need for finer-grained ways of characterizing these items that predict the reliable variation in difficulty measured in this study, and that generalize to other tests of data visualization understanding.

Authors:Matthew Russell, Samuel Youkeles, William Xia, Kenny Zheng, Aman Shah, Robert J. K. Jacob
Title: Neural Signatures Within and Between Chess Puzzle Solving and Standard Cognitive Tasks for Brain-Computer Interfaces: A Low-Cost Electroencephalography Study
Abstract:
Consumer-grade electroencephalography (EEG) devices show promise for Brain-Computer Interface (BCI) applications, but their efficacy in detecting subtle cognitive states remains understudied. We developed a comprehensive study paradigm which incorporates a combination of established cognitive tasks (N-Back, Stroop, and Mental Rotation) and adds a novel ecological Chess puzzles task. We tested our paradigm with the MUSE 2, a low-cost consumer-grade EEG device. Using linear mixed-effects modeling, we demonstrate successful distinctions of within-task workload levels and cross-task cognitive states based on the spectral power data derived from the MUSE 2 device. With machine learning, we further show reliable predictive power to differentiate between workload levels in the N-Back task, and also achieve effective cross-task classification. These findings demonstrate that consumer-grade EEG devices like the MUSE 2 can be used to effectively differentiate between various levels of cognitive workload as well as among more nuanced task-based cognitive states, and that these tools can be leveraged for real-time adaptive BCI applications in practical settings.

Authors:David Leimstädtner, Fatima Halzl-Yürek, Claudia Spies, Claudia Müller-Birn
Title: Design Requirements for Patient-Centered Digital Health Applications: Supporting Patients' Values in Postoperative Delirium Prevention
Abstract:
Postoperative delirium (POD) is among the most common complications after surgeries for older adults and can entail long-term adverse health consequences. Active patient participation in POD prevention presents a central factor in reducing these risks. To support patient engagement through a digital health application, we use value sensitive design approaches to identify the requirements for a patient-centered digital health application supporting patient engagement in POD prevention. Through interviews with medical professionals and patient representatives, we construct a patient journey, which serves as the basis for twelve patient value journey interviews. In these interviews, patients from the high-risk group for POD revisit their recent experience of undergoing surgery to elicit barriers, needs, and values concerning POD prevention from a patient perspective. An analysis of the patient interviews derives four design requirements for a digital health application supporting patients regarding POD prevention: the adaptation of patient-centered communication, the provision of procedural transparency, fostering patient empowerment through consistent guidance, and explicitly addressing relatives as mediators and supporters for a patient after a POD occurrence.

Authors:Jannatun Naim, Jie Cao, Fareen Tasneem, Jennifer Jacobs, Brent Milne, James Martin, Tamara Sumner
Title: Towards Actionable Pedagogical Feedback: A Multi-Perspective Analysis of Mathematics Teaching and Tutoring Dialogue
Abstract:
Effective feedback is essential for refining instructional practices in mathematics education, and researchers often turn to advanced natural language processing (NLP) models to analyze classroom dialogues from multiple perspectives. However, utterance-level discourse analysis encounters two primary challenges: (1) multifunctionality, where a single utterance may serve multiple purposes that a single tag cannot capture, and (2) the exclusion of many utterances from domain-specific discourse move classifications, leading to their omission in feedback. To address these challenges, we propose a multi-perspective discourse analysis that integrates domain-specific talk moves with dialogue acts (using the flattened multi-functional SWBD-MASL schema with 43 tags) and discourse relations (applying Segmented Discourse Representation Theory with 16 relations). Our top-down analysis framework enables a comprehensive understanding of utterances that contain talk moves, as well as utterances that do not. This framework is applied to two mathematics education datasets: TalkMoves (teaching) and SAGA22 (tutoring). Through distributional unigram analysis, sequential talk move analysis, and multi-view deep dives, we discovered meaningful discourse patterns and revealed the vital role of utterances without talk moves, demonstrating that these utterances, far from being mere fillers, serve crucial functions in guiding, acknowledging, and structuring classroom discourse. These insights underscore the importance of incorporating discourse relations and dialogue acts into AI-assisted education systems to enhance feedback and create more responsive learning environments. Our framework may prove helpful not only for providing feedback to human educators, but also for aiding the development of AI agents that can effectively emulate the roles of both educators and students.

Authors:Mathyas Giudici, Samuele Scherini, Pascal Chaussumier, Stefano Ginocchio, Franca Garzotto
Title: Exploring Anthropomorphism in Conversational Agents for Environmental Sustainability
Abstract:
The paper investigates the integration of Large Language Models (LLMs) into Conversational Agents (CAs) to encourage a shift in consumption patterns from a demand-driven to a supply-based paradigm. Specifically, the research examines the role of anthropomorphic design in delivering environmentally conscious messages by comparing two CA designs: a personified agent representing an appliance and a traditional, non-personified assistant. A lab study (N=26) assessed the impact of these designs on interaction, perceived self-efficacy, and engagement. Results indicate that LLM-based CAs significantly enhance users' self-reported eco-friendly behaviors, with participants expressing greater confidence in managing energy consumption. While the anthropomorphic design did not notably affect self-efficacy, those interacting with the personified agent reported a stronger sense of connection with the system. These findings suggest that although anthropomorphic CAs may improve user engagement, both designs hold promise for fostering sustainable behaviors in home energy management.

Authors:Linxuan Huang, Dong-Fan Xie, Li Li, Zhengbing He
Title: A Survey on Data-Driven Modeling of Human Drivers' Lane-Changing Decisions
Abstract:
Lane-changing (LC) behavior, a critical yet complex driving maneuver, significantly influences driving safety and traffic dynamics. Traditional analytical LC decision (LCD) models, while effective in specific environments, often oversimplify behavioral heterogeneity and complex interactions, limiting their capacity to capture real LCD. Data-driven approaches address these gaps by leveraging rich empirical data and machine learning to decode latent decision-making patterns, enabling adaptive LCD modeling in dynamic environments. In light of the rapid development of artificial intelligence and the demand for data-driven models oriented towards connected vehicles and autonomous vehicles, this paper presents a comprehensive survey of data-driven LCD models, with a particular focus on human drivers' LC decision-making. It systematically reviews the modeling framework, covering data sources and preprocessing, model inputs and outputs, objectives, structures, and validation methods. This survey further discusses the opportunities and challenges faced by data-driven LCD models, including driving safety and uncertainty, as well as the integration and improvement of technical frameworks.

Authors:Somayeh Molaei, Lionel P. Robert, Nikola Banovic
Title: What Do People Want to Know About Artificial Intelligence (AI)? The Importance of Answering End-User Questions to Explain Autonomous Vehicle (AV) Decisions
Abstract:
Improving end-users' understanding of decisions made by autonomous vehicles (AVs) driven by artificial intelligence (AI) can improve utilization and acceptance of AVs. However, current explanation mechanisms primarily help AI researchers and engineers in debugging and monitoring their AI systems, and may not address the specific questions of end-users, such as passengers, about AVs in various scenarios. In this paper, we conducted two user studies to investigate questions that potential AV passengers might pose while riding in an AV and evaluate how well answers to those questions improve their understanding of AI-driven AV decisions. Our initial formative study identified a range of questions about AI in autonomous driving that existing explanation mechanisms do not readily address. Our second study demonstrated that interactive text-based explanations effectively improved participants' comprehension of AV decisions compared to simply observing AV decisions. These findings inform the design of interactions that motivate end-users to engage with and inquire about the reasoning behind AI-driven AV decisions.

Authors:Tongfei Bian, Mathieu Chollet, Tanaya Guha
Title: Robust Understanding of Human-Robot Social Interactions through Multimodal Distillation
Abstract:
The need for social robots and agents to interact with and assist humans is growing steadily. To be able to successfully interact with humans, they need to understand and analyse socially interactive scenes from their (robot's) perspective. Works that model social situations between humans and agents are few, and even the existing ones are often too computationally intensive to be suitable for deployment in real time or in real-world scenarios with limited available information. We propose a robust knowledge distillation framework that models social interactions through various multimodal cues, yet is robust against incomplete and noisy information during inference. Our teacher model is trained with multimodal input (body, face and hand gestures, gaze, raw images) and transfers knowledge to a student model that relies solely on body pose. Extensive experiments on two publicly available human-robot interaction datasets demonstrate that our student model achieves an average accuracy gain of 14.75\% over relevant baselines on multiple downstream social understanding tasks even with up to 51\% of its input being corrupted. The student model is highly efficient: it is $<1$\% the size of the teacher model in terms of parameters and uses $\sim 0.5$\textperthousand~of the teacher model's FLOPs. Our code will be made public upon publication.
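
The teacher-to-student transfer described here is typically driven by a soft-target distillation objective; the following is a generic sketch of such a loss (the temperature, blend weight, and function names are assumptions, not the paper's exact formulation):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax with the usual max-shift for stability."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of (i) KL divergence from the multimodal teacher's softened
    outputs to the pose-only student's, scaled by T^2, and (ii) ordinary
    cross-entropy against the hard labels."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)) * T * T
    idx = np.arange(len(labels))
    ce = -np.mean(np.log(softmax(student_logits)[idx, labels]))
    return alpha * kl + (1 - alpha) * ce
```

When student and teacher logits agree, the KL term vanishes and only the hard-label cross-entropy remains, which is what lets the compact pose-only student inherit the teacher's multimodal knowledge during training.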

Authors:Sora Kang, Joonhwan Lee
Title: Theatrical Language Processing: Exploring AI-Augmented Improvisational Acting and Scriptwriting with LLMs
Abstract:
The increasing convergence of artificial intelligence has opened new avenues, including its emerging role in enhancing creativity. It is reshaping traditional creative practices such as actor improvisation, which often struggles with predictable patterns, limited interaction, and a lack of engaging stimuli. In this paper, we introduce a new concept, Theatrical Language Processing (TLP), and an AI-driven creativity support tool, Scribble.ai, designed to augment actors' creative expression and spontaneity through interactive practice. We conducted a user study involving tests and interviews with fourteen participants. Our findings indicate that: (1) Actors expanded their creativity when faced with AI-produced irregular scenarios; (2) The AI's unpredictability heightened their problem-solving skills, specifically in interpreting unfamiliar situations; (3) However, AI often generated excessively detailed scripts, which limited interpretive freedom and hindered subtext exploration. Based on these findings, we discuss the new potential in enhancing creative expressions in film and theater studies through an AI-driven tool.

Authors:Nicky Mol, J. Micah Prendergast, David A. Abbink, Luka Peternel
Title: Fitts' List Revisited: An Empirical Study on Function Allocation in a Two-Agent Physical Human-Robot Collaborative Position/Force Task
Abstract:
In this letter, we investigate whether the classical function allocation holds for physical Human-Robot Collaboration, which is important for providing insights for Industry 5.0 to guide how to best augment rather than replace workers. This study empirically tests the applicability of Fitts' List within physical Human-Robot Collaboration by conducting a user study (N=26, within-subject design) to evaluate four distinct allocations of position/force control between human and robot in an abstract blending task. We hypothesize that the function allocation in which humans control the position achieves better performance and receives higher user ratings. When allocating position control to the human and force control to the robot, compared to the opposite case, we observed a significant improvement in preventing overblending. This allocation was also perceived better in terms of physical demand and overall system acceptance, while participants experienced greater autonomy, more engagement and less frustration. An interesting insight was that the supervisory role (when the robot controls both position and force) was rated second best in terms of subjective acceptance. Another surprising insight was that when position control was delegated to the robot, participants perceived much lower autonomy than when force control was delegated to the robot. These findings empirically support applying Fitts' principles to static function allocation for physical collaboration, while also revealing important nuanced user experience trade-offs, particularly regarding perceived autonomy when delegating position control.

Authors:Demetrius Hernandez, Jane Cleland-Huang
Title: Runtime Advocates: A Persona-Driven Framework for Requirements@Runtime Decision Support
Abstract:
Complex systems, such as small Uncrewed Aerial Systems (sUAS) swarms dispatched for emergency response, often require dynamic reconfiguration at runtime under the supervision of human operators. This introduces human-on-the-loop requirements, where evolving needs shape ongoing system functionality and behaviors. While traditional personas support upfront, static requirements elicitation, we propose a persona-based advocate framework for runtime requirements engineering to provide ethically informed, safety-driven, and regulatory-aware decision support. Our approach extends standard personas into event-driven personas. When triggered by events such as adverse environmental conditions, evolving mission state, or operational constraints, the framework updates the sUAS operator's view of the personas, ensuring relevance to current conditions. We create three key advocate personas, namely Safety Controller, Ethical Governor, and Regulatory Auditor, to manage trade-offs among risk, ethical considerations, and regulatory compliance. We perform a proof-of-concept validation in an emergency response scenario using sUAS, showing how our advocate personas provide context-aware guidance grounded in safety, regulatory, and ethical constraints. By evolving static, design-time personas into adaptive, event-driven advocates, the framework surfaces mission-critical runtime requirements in response to changing conditions. These requirements shape operator decisions in real time, aligning actions with the operational demands of the moment.

Authors:Madhav Sachdeva, Christopher Narayanan, Marvin Wiedenkeller, Jana Sedlakova, Jürgen Bernard
Title: A Design Space for the Critical Validation of LLM-Generated Tabular Data
Abstract:
LLM-generated tabular data is creating new opportunities for data-driven applications in academia, business, and society. To leverage benefits like missing value imputation, labeling, and enrichment with context-aware attributes, LLM-generated data needs a critical validation process. The number of pioneering approaches is increasing fast, opening a promising validation space that, so far, remains unstructured. We present a design space for the critical validation of LLM-generated tabular data with two dimensions: First, the Analysis Granularity dimension: from within-attribute (single-item and multi-item) to across-attribute perspectives (1 x 1, 1 x m, and n x n). Second, the Data Source dimension: differentiating between LLM-generated values, ground truth values, explanations, and their combinations. We discuss analysis tasks for each cross-cut of the two dimensions, map 19 existing validation approaches, and discuss the characteristics of two approaches in detail, demonstrating the descriptive power of the design space.

Authors:Yurina Mizuho, Yuta Sugiura
Title: Practice Support for Violin Bowing by Measuring Bow Pressure and Position
Abstract:
The violin is one of the most popular musical instruments. Various parameters of bowing motion, such as pressure, position, and speed, are crucial for producing a beautiful tone. However, mastering them is challenging and requires extensive practice. In this study, we aimed to support practice of bowing, focusing on bow pressure. First, we compared the bowing movements, specifically bow pressure, bow position, and bow speed, of eight experienced players with those of eight beginners. Next, we developed and evaluated a visual feedback system that displays bow pressure to support practice. We taught the identified differences to 14 beginners, dividing them into two groups: one practiced with an explanation, and the other with both an explanation and a feedback system. These two experiments found that clarifying the characteristics unique to experienced players can support practice.

Authors:Mats Ole Ellenberg, Katja Krug
Title: Improving Inclusivity for Emotion Recognition Based on Face Tracking
Abstract:
The limited expressiveness of virtual user representations in Mixed Reality and Virtual Reality can inhibit an integral part of communication: emotional expression. Emotion recognition based on face tracking is often used to compensate for this. However, emotional facial expressions are highly individual, which is why many approaches have difficulties recognizing unique variations of emotional expressions. We propose several strategies to improve face tracking systems for emotion recognition with and without user intervention for the Affective Interaction Workshop at CHI '25.

Authors:Stefania Druga, Amy J. Ko
Title: Scratch Copilot: Supporting Youth Creative Coding with AI
Abstract:
Creative coding platforms like Scratch have democratized programming for children, yet translating imaginative ideas into functional code remains a significant hurdle for many young learners. While AI copilots assist adult programmers, few tools target children in block-based environments. Building on prior research \cite{druga_how_2021,druga2023ai, druga2023scratch}, we present Cognimates Scratch Copilot: an AI-powered assistant integrated into a Scratch-like environment, providing real-time support for ideation, code generation, debugging, and asset creation. This paper details the system architecture and findings from an exploratory qualitative evaluation with 18 international children (ages 7--12). Our analysis reveals how the AI Copilot supported key creative coding processes, particularly aiding ideation and debugging. Crucially, it also highlights how children actively negotiated the use of AI, demonstrating strong agency by adapting or rejecting suggestions to maintain creative control. Interactions surfaced design tensions between providing helpful scaffolding and fostering independent problem-solving, as well as learning opportunities arising from navigating AI limitations and errors. Findings indicate Cognimates Scratch Copilot's potential to enhance creative self-efficacy and engagement. Based on these insights, we propose initial design guidelines for AI coding assistants that prioritize youth agency and critical interaction alongside supportive scaffolding.

Authors:Yixiong Hao, Ayush Panda, Stepan Shabalin, Sheikh Abdur Raheem Ali
Title: Patterns and Mechanisms of Contrastive Activation Engineering
Abstract:
Controlling the behavior of Large Language Models (LLMs) remains a significant challenge due to their inherent complexity and opacity. While techniques like fine-tuning can modify model behavior, they typically require extensive computational resources. Recent work has introduced a class of contrastive activation engineering (CAE) techniques as promising approaches for steering LLM outputs through targeted modifications to their internal representations. Applied at inference time with zero training cost, CAE has the potential to introduce a new paradigm of flexible, task-specific LLM behavior tuning. We analyze the performance of CAE in in-distribution and out-of-distribution settings, evaluate its drawbacks, and begin to develop comprehensive guidelines for its effective deployment. We find that (1) CAE is only reliably effective when applied to in-distribution contexts; (2) increasing the number of samples used to generate steering vectors has diminishing returns at around 80 samples; (3) steering vectors are susceptible to adversarial inputs that reverse the behavior being steered for; (4) steering vectors harm overall model perplexity; and (5) larger models are more resistant to steering-induced degradation.
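
A common CAE recipe builds a steering vector from the difference of mean activations on contrasting prompt sets and adds it to a layer's hidden states at inference time; a minimal sketch follows (array shapes, the scaling factor `alpha`, and the function names are illustrative assumptions, not this paper's exact procedure):

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Contrastive steering vector: difference of mean hidden-state
    activations over positive vs. negative example prompts, where both
    inputs have shape (n_samples, hidden_dim)."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(hidden, vec, alpha=1.0):
    """Inference-time intervention: shift a layer's activations along
    the steering direction; no gradient updates are needed."""
    return hidden + alpha * vec
```

The diminishing-returns finding above concerns `n_samples` in this recipe: past roughly 80 contrastive pairs, the mean-difference estimate of the direction stops improving much.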

Authors:Katherine Fennedy, Brian Hilburn, Thaivalappil N. M. Nadirsha, Sameer Alam, Khanh-Duy Le, Hua Li
Title: Do ATCOs Need Explanations, and Why? Towards ATCO-Centered Explainable AI for Conflict Resolution Advisories
Abstract:
Interest in explainable artificial intelligence (XAI) is surging. Prior research has primarily focused on systems' ability to generate explanations, often guided by researchers' intuitions rather than end-users' needs. Unfortunately, such approaches have not yielded favorable outcomes when compared to a black-box baseline (i.e., no explanation). To address this gap, this paper advocates a human-centered approach that shifts focus to air traffic controllers (ATCOs) by asking a fundamental yet overlooked question: Do ATCOs need explanations, and if so, why? Insights from air traffic management (ATM), human-computer interaction, and the social sciences were synthesized to provide a holistic understanding of XAI challenges and opportunities in ATM. Evaluating 11 ATM operational goals revealed a clear need for explanations when ATCOs aim to document decisions and rationales for future reference or report generation. Conversely, ATCOs are less likely to seek them when their conflict resolution approach aligns with the artificial intelligence (AI) advisory. While this is a preliminary study, the findings are expected to inspire broader and deeper inquiries into the design of ATCO-centric XAI systems, paving the way for more effective human-AI interaction in ATM.

Authors:Karina LaRubbio, Malcolm Grba, Diana Freed
Title: Navigating Privacy and Trust: AI Assistants as Social Support for Older Adults
Abstract:
AI assistants are increasingly integrated into older adults' daily lives, offering new opportunities for social support and accessibility while raising important questions about privacy, autonomy, and trust. As these systems become embedded in caregiving and social networks, older adults must navigate trade-offs between usability, data privacy, and personal agency across different interaction contexts. Although prior work has explored AI assistants' potential benefits, further research is needed to understand how perceived usefulness and risk shape adoption and engagement. This paper examines these dynamics and advocates for participatory design approaches that position older adults as active decision makers in shaping AI assistant functionality. By advancing a framework for privacy-aware, user-centered AI design, this work contributes to ongoing discussions on developing ethical and transparent AI systems that enhance well-being without compromising user control.

Authors:Olga Mironenko, Hadi Banaee, Amy Loutfi
Title: Evaluation of Coordination Strategies for Underground Automated Vehicle Fleets in Mixed Traffic
Abstract:
This study investigates the efficiency and safety outcomes of implementing different adaptive coordination models for automated vehicle (AV) fleets, managed by a centralized coordinator that dynamically responds to human-controlled vehicle behavior. The simulated scenarios replicate an underground mining environment characterized by narrow tunnels with limited connectivity. To address the unique challenges of such settings, we propose a novel metric - Path Overlap Density (POD) - to predict efficiency and potentially the safety performance of AV fleets. The study also explores the impact of map features on AV fleets performance. The results demonstrate that both AV fleet coordination strategies and underground tunnel network characteristics significantly influence overall system performance. While map features are critical for optimizing efficiency, adaptive coordination strategies are essential for ensuring safe operations.

Authors:Mathyas Giudici, Alessandro Sironi, Ismaele Villa, Samuele Scherini, Franca Garzotto
Title: Generating HomeAssistant Automations Using an LLM-based Chatbot
Abstract:
To combat climate change, individuals are encouraged to adopt sustainable habits, in particular within their households by optimizing electrical consumption. Conversational agents, such as Smart Home Assistants, hold promise as effective tools for promoting sustainable practices within households. Our research investigated the application of Large Language Models (LLMs) in enhancing smart home automation and promoting sustainable household practices, specifically using the HomeAssistant framework. In particular, it highlights the potential of GPT models in generating accurate automation routines. While the LLMs showed proficiency in understanding complex commands and creating valid JSON outputs, challenges such as syntax errors and message malformations were noted, indicating areas for further improvement. Still, despite minimal quantitative differences between "green" and "no green" prompts, qualitative feedback highlighted a positive shift towards sustainability in the routines generated with environmentally focused prompts. An empirical evaluation (N=56) then demonstrated that the system was well-received and found engaging by users compared to its traditional rule-based counterpart. Our findings highlight the role of LLMs in advancing smart home technologies and suggest further research to refine these models for broader, real-world applications to support sustainable living.
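
The syntax errors and message malformations reported here suggest a validation step between the LLM's output and HomeAssistant; a minimal standard-library sketch (the required-key subset is an illustrative assumption, not HomeAssistant's actual automation schema):

```python
import json

# Minimal subset of an automation routine's top-level keys
# (illustrative; the real HomeAssistant schema is richer).
REQUIRED_KEYS = {"trigger", "action"}

def validate_automation(llm_output):
    """Parse an LLM-generated automation routine and check that the
    minimal keys are present; returns (ok, error_message)."""
    try:
        routine = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        return False, "malformed JSON: " + exc.msg
    if not isinstance(routine, dict):
        return False, "expected a JSON object"
    missing = REQUIRED_KEYS - routine.keys()
    if missing:
        return False, "missing keys: " + ", ".join(sorted(missing))
    return True, ""
```

A gate like this would let the chatbot retry or repair a malformed routine before it reaches the automation engine, rather than surfacing the error to the user.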

Authors:Avraham Rahimov, Orel Zamler, Amos Azaria
Title: The Turing Test Is More Relevant Than Ever
Abstract:
The Turing Test, first proposed by Alan Turing in 1950, has historically served as a benchmark for evaluating artificial intelligence (AI). However, since the release of ELIZA in 1966, and particularly with recent advancements in large language models (LLMs), AI has been claimed to pass the Turing Test. Furthermore, critics argue that the Turing Test primarily assesses deceptive mimicry rather than genuine intelligence, prompting the continuous emergence of alternative benchmarks. This study argues against discarding the Turing Test, proposing instead the use of more refined versions of it, for example, by interacting simultaneously with both an AI and a human candidate to determine which is which, allowing a longer interaction duration, granting access to the Internet and other AIs, using experienced people as evaluators, etc. Through systematic experimentation using a web-based platform, we demonstrate that richer, contextually structured testing environments significantly enhance participants' ability to differentiate between AI and human interactions. Namely, we show that, while an off-the-shelf LLM can pass some version of a Turing Test, it fails to do so when faced with a more robust version. Our findings highlight that the Turing Test remains an important and effective method for evaluating AI, provided it continues to adapt as AI technology advances. Additionally, the structured data gathered from these improved interactions provides valuable insights into what humans expect from truly intelligent AI systems.

Authors:Jonathan Lynn, Rachel Y. Kim, Sicun Gao, Daniel Schneider, Sachin S. Pandya, Min Kyung Lee
Title: Regulating Algorithmic Management: A Multi-Stakeholder Study of Challenges in Aligning Software and the Law for Workplace Scheduling
Abstract:
Algorithmic management (AM)'s impact on worker well-being has led to calls for regulation. However, little is known about the effectiveness and challenges in real-world AM regulation across the regulatory process -- rule operationalization, software use, and enforcement. Our multi-stakeholder study addresses this gap within workplace scheduling, one of the few AM domains with implemented regulations. We interviewed 38 stakeholders across the regulatory process: regulators, defense attorneys, worker advocates, managers, and workers. Our findings suggest that the efficacy of AM regulation is influenced by: (i) institutional constraints that challenge efforts to encode law into AM software, (ii) on-the-ground use of AM software that shapes its ability to facilitate compliance, (iii) mismatches between software and regulatory contexts that hinder enforcement, and (iv) unique concerns that software introduces when used to regulate AM. These findings underscore the importance of a sociotechnical approach to AM regulation, which considers organizational and collaborative contexts alongside the inherent attributes of software. We offer future research directions and implications for technology policy and design.

Authors:Chan Chea Mean, Sameer Alam, Katherine Fennedy, Meng-Hsueh Hsieh, Shiwei Xin, Brian Hilburn
Title: Evaluating Input Modalities for Pilot-Centered Taxiway Navigation: Insights from a Wizard-of-Oz Simulation
Abstract:
Runway and taxiway incursions continue to challenge aviation safety, as pilots often experience disorientation from poor visibility in adverse conditions and cognitive workload in complex airport layouts. Current tools, such as airport moving maps on portable tablets, allow manual route planning but do not dynamically adapt to air traffic controllers' (ATCOs) clearances, limiting their effectiveness in high-stress scenarios. This study investigates the impact of different input modalities - paper-based, keyboard touch, map touch, and speech-to-text - on taxiway navigation performance, using a medium-fidelity flight simulator and a Wizard-of-Oz methodology to simulate ideal automation conditions. Contrary to common assumptions, recent studies indicate that paper-based methods outperform digital counterparts in accuracy and efficiency under certain conditions, highlighting critical limitations in current automation strategies. In response, our study investigates why manual methods may excel and how future automation can be optimized for pilot-centered operations. Employing a Wizard-of-Oz approach, we replicated the full taxiing process - from receiving ATCO clearances to executing maneuvers - and differentiated between readback and execution accuracy. Findings reveal that speech-based systems suffer from low pilot trust, necessitating hybrid solutions that integrate error correction and confidence indicators. These insights contribute to the development of future pilot-centered taxiway assistance systems that enhance situational awareness, minimize workload, and improve overall operational safety.

Authors:Kola Ayonrinde, Louis Jaburi
Title: Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii
Abstract:
Mechanistic Interpretability (MI) aims to understand neural networks through causal explanations. Though MI has many explanation-generating methods, progress has been limited by the lack of a universal approach to evaluating explanations. Here we analyse the fundamental question "What makes a good explanation?" We introduce a pluralist Explanatory Virtues Framework drawing on four perspectives from the Philosophy of Science - the Bayesian, Kuhnian, Deutschian, and Nomological - to systematically evaluate and improve explanations in MI. We find that Compact Proofs consider many explanatory virtues and are hence a promising approach. Fruitful research directions implied by our framework include (1) clearly defining explanatory simplicity, (2) focusing on unifying explanations and (3) deriving universal principles for neural networks. Improved MI methods enhance our ability to monitor, predict, and steer AI systems.

Authors:Federico Maria Cau, Lucio Davide Spano
Title: Exploring the Impact of Explainable AI and Cognitive Capabilities on Users' Decisions
Abstract:
Artificial Intelligence (AI) systems are increasingly used for decision-making across domains, raising debates over the information and explanations they should provide. Most research on Explainable AI (XAI) has focused on feature-based explanations, with less attention on alternative styles. Personality traits like the Need for Cognition (NFC) can also lead to different decision-making outcomes among low and high NFC individuals. We investigated how presenting AI information (prediction, confidence, and accuracy) and different explanation styles (example-based, feature-based, rule-based, and counterfactual) affect accuracy, reliance on AI, and cognitive load in a loan application scenario. We also examined low and high NFC individuals' differences in prioritizing XAI interface elements (loan attributes, AI information, and explanations), accuracy, and cognitive load. Our findings show that high AI confidence significantly increases reliance on AI while reducing cognitive load. Feature-based explanations did not enhance accuracy compared to other conditions. Although counterfactual explanations were less understandable, they enhanced overall accuracy, increasing reliance on AI and reducing cognitive load when AI predictions were correct. Both low and high NFC individuals prioritized explanations after loan attributes, leaving AI information as the least important. However, we found no significant differences between low and high NFC groups in accuracy or cognitive load, raising questions about the role of personality traits in AI-assisted decision-making. These findings highlight the need for user-centric personalization in XAI interfaces, incorporating diverse explanation styles and exploring multiple personality traits and other user characteristics to optimize human-AI collaboration.

Authors:Yujie Tao, Libby Ye, Jeremy N. Bailenson, Sean Follmer
Title: Audio Personas: Augmenting Social Perception via Body-Anchored Audio Cues
Abstract:
We introduce Audio Personas, enabling users to "decorate" themselves with body-anchored sounds in audio augmented reality. Like outfits, makeup, and fragrances, audio personas offer an alternative yet dynamic channel to augment face-to-face interactions. For instance, one can set their audio persona as rain sounds to reflect a bad mood, bee sounds to establish personal boundaries, or a playful "woosh" sound to mimic passing by someone like a breeze. To instantiate the concept, we implemented a headphone-based prototype with multi-user tracking and audio streaming. Our preregistered in-lab study with 64 participants showed that audio personas influenced how participants formed impressions. Individuals with positive audio personas were rated as more socially attractive, more likable, and less threatening than those with negative audio personas. Our study with audio designers revealed that audio personas were preferred in public and semi-public-private spaces for managing social impressions (e.g., personality) and signaling current states (e.g., emotions).

Authors:Stanislava Gardasevic, Manika Lamba, Jasmine S. Malone
Title: Co-Designing a Knowledge Graph Navigation Interface: A Participatory Approach
Abstract:
Navigating and visualizing multilayered knowledge graphs remains a challenging, unresolved problem in information systems design. Building on our earlier study, which engaged end users in both the design and population of a domain-specific knowledge graph, we now focus on translating their insights into actionable interface guidelines. In this paper, we synthesize recommendations drawn from a participatory workshop with doctoral students. We then demonstrate how these recommendations inform the design of a prototype interface. Finally, we found that a participatory iterative design approach can help designers in decision making, leading to interfaces that are both innovative and user-centric. By combining user-driven requirements with proven visualization techniques, this paper presents a coherent framework for guiding future development of knowledge-graph navigation tools.

Authors:Jeffrey Basoah, Jay L. Cunningham, Erica Adams, Alisha Bose, Aditi Jain, Kaustubh Yadav, Zhengyang Yang, Katharina Reinecke, Daniela Rosner
Title: Should AI Mimic People? Understanding AI-Supported Writing Technology Among Black Users
Abstract:
AI-supported writing technologies (AISWT) that provide grammatical suggestions, autocomplete sentences, or generate and rewrite text are now a regular feature integrated into many people's workflows. However, little is known about how people perceive the suggestions these tools provide. In this paper, we investigate how Black American users perceive AISWT, motivated by prior findings in natural language processing that highlight how the underlying large language models can contain racial biases. Using interviews and observational user studies with 13 Black American users of AISWT, we found a strong tradeoff between the perceived benefits of using AISWT to enhance their writing style and feeling like "it wasn't built for us". Specifically, participants reported AISWT's failure to recognize commonly used names and expressions in African American Vernacular English, experiencing its corrections as hurtful and alienating and fearing it might further minoritize their culture. We end with a reflection on the tension between AISWT that fail to include Black American culture and language, and AISWT that attempt to mimic it, with attention to accuracy, authenticity, and the production of social difference.

Authors:Feiyu Lu, Mengyu Chen, Hsiang Hsu, Pranav Deshpande, Cheng Yao Wang, Blair MacIntyre
Title: Adaptive 3D UI Placement in Mixed Reality Using Deep Reinforcement Learning
Abstract:
Mixed Reality (MR) could assist users' tasks by continuously integrating virtual content with their view of the physical environment. However, where and how to place this content to best support the users has been a challenging problem due to the dynamic nature of MR experiences. In contrast to prior work that investigates optimization-based methods, we are exploring how reinforcement learning (RL) could assist with continuous 3D content placement that is aware of users' poses and their surrounding environments. Through an initial exploration and preliminary evaluation, our results demonstrate the potential of RL to position content that maximizes the reward for users on the go. We further identify future directions for research that could harness the power of RL for personalized and optimized UI and content placement in MR.

Authors:Dabbrata Das, Argho Deb Das, Farhan Sadaf
Title: Real-Time Wayfinding Assistant for Blind and Low-Vision Users
Abstract:
Navigating unfamiliar places continues to be one of the most persistent and essential everyday obstacles for those who are blind or have low vision (BLV). Existing assistive technologies, such as GPS-based navigation systems, AI-powered smart glasses, and sonar-equipped canes, often face limitations in real-time obstacle avoidance, precise localization, and adaptability to dynamic surroundings. To investigate potential solutions, we introduced PathFinder, a novel map-less navigation system that explores different models for understanding 2D images, including Vision Language Models (VLMs) and Large Language Models (LLMs), and employs monocular depth estimation for free-path detection. Our approach integrates a Depth-First Search (DFS) algorithm on depth images to determine the longest obstacle-free path, ensuring optimal route selection while maintaining computational efficiency. We conducted comparative evaluations against existing AI-powered navigation methods and performed a usability study with BLV participants. The results demonstrate that PathFinder achieves a favorable balance between accuracy, computational efficiency, and real-time responsiveness. Notably, it reduces mean absolute error (MAE) and improves decision-making speed in outdoor navigation compared to AI-based alternatives. Participant feedback emphasizes the system's usability and effectiveness in outdoor settings, but also identifies issues in complicated indoor locations and low-light conditions. Usability testing revealed that 73% of participants understood how to use the app in about a minute, and 80% praised its balance of accuracy, quick response, and overall convenience.
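The core idea of DFS over a depth image for free-path detection can be sketched as follows. This is an illustrative simplification, not PathFinder's actual implementation: the depth threshold, grid discretization, and upward-only moves are assumptions made here for clarity.

```python
import numpy as np

def longest_free_path(depth, min_depth=1.0):
    """DFS over a depth map: starting from each bottom-row cell, follow
    upward/diagonal moves through 'free' cells (depth >= min_depth, i.e.
    far enough from the camera) and return the longest obstacle-free path.
    Hypothetical simplification of a PathFinder-style search."""
    h, w = depth.shape
    free = depth >= min_depth
    best = []

    def dfs(r, c, path, seen):
        nonlocal best
        if len(path) > len(best):
            best = path[:]
        for dr, dc in ((-1, -1), (-1, 0), (-1, 1)):  # move toward the horizon
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and free[nr, nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                path.append((nr, nc))
                dfs(nr, nc, path, seen)
                path.pop()
                seen.discard((nr, nc))

    for c in range(w):  # candidate starting points directly in front of the user
        if free[h - 1, c]:
            dfs(h - 1, c, [(h - 1, c)], {(h - 1, c)})
    return best
```

The longest path's direction (e.g., the column of its endpoint) would then be turned into a steering instruction for the user.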

Authors:Camille Harris, Clio Andris
Title: Mapping a Movement: Exploring a Proposed Police Training Facility in Atlanta and the Stop Cop City Movement through Online Maps
Abstract:
In 2021, the City of Atlanta and Atlanta Police Foundation launched plans to build a large police training facility in the South River Forest in unincorporated DeKalb County, GA. Residents of Atlanta and DeKalb County, environmental activists, police and prison abolitionists, and other activists and concerned individuals formed the movement in opposition to the facility, known as the Stop Cop City / Defend the Atlanta Forest movement. Social media and digital maps became common tools for communicating information about the facility and the movement. Here, we examine online maps about the facility and the opposition movement, originating from grassroots organizations, the City of Atlanta, news media outlets, the Atlanta Police Foundation, and individuals. We gather and examine 32 publicly available maps collected through the Google Search API, Twitter (now X), Instagram and reddit. Using a framework of critical cartography, we conduct a content analysis of these maps to identify the mapping technologies and techniques (data, cartographic elements, styles) used by different stakeholders and roles that maps and mapping technologies can play in social movements. We examine the extent to which these maps provide data to confirm or contradict concerns raised by grassroots organizations and local residents about the facility. We find that stakeholders and mapmakers use geospatial tools in different ways and likely have varied access to mapping technologies. We argue that documenting the use of maps to communicate information about a contentious project can help enumerate community positions and perspectives, and we advocate for accessible mapmaking tools. We conclude by discussing the implications of accessibility of mapping technology and posting maps to social media, and share example map images that extend the geographic information systems (GIS) techniques seen in the retrieved maps.

Authors:Zhaoyang Jacopo Hu, Haozheng Xu, Sion Kim, Yanan Li, Ferdinando Rodriguez y Baena, Etienne Burdet
Title: Confidence-based Intent Prediction for Teleoperation in Bimanual Robotic Suturing
Abstract:
Robotic-assisted procedures offer enhanced precision, but fully autonomous systems are limited by incomplete task knowledge, difficulties in modeling unstructured environments, and poor generalisation, while fully manual teleoperated systems face challenges such as delay, stability, and reduced sensory information. To address these, we developed an interactive control strategy that assists the human operator by predicting their motion plan at both high and low levels. At the high level, a surgeme recognition system is employed through a Transformer-based real-time gesture classification model to dynamically adapt to the operator's actions, while at the low level, a Confidence-based Intention Assimilation Controller adjusts robot actions based on user intent and shared control paradigms. The system is built around a robotic suturing task, supported by sensors that capture the kinematics of the robot and task dynamics. Experiments across users with varying skill levels demonstrated the effectiveness of the proposed approach, showing statistically significant improvements in task completion time and user satisfaction compared to traditional teleoperation.

Authors:Joshua Hatherley, Anders Søgaard, Angela Ballantyne, Ruben Pauwels
Title: Federated learning, ethics, and the double black box problem in medical AI
Abstract:
Federated learning (FL) is a machine learning approach that allows multiple devices or institutions to collaboratively train a model without sharing their local data with a third-party. FL is considered a promising way to address patient privacy concerns in medical artificial intelligence. The ethical risks of medical FL systems themselves, however, have thus far been underexamined. This paper aims to address this gap. We argue that medical FL presents a new variety of opacity -- federation opacity -- that, in turn, generates a distinctive double black box problem in healthcare AI. We highlight several instances in which the anticipated benefits of medical FL may be exaggerated, and conclude by highlighting key challenges that must be overcome to make FL ethically feasible in medicine.

Authors:Zafeiria Moumoulidou, Hamza Elhamdadi, Ke Yang, Subrata Mitra, Cindy Xiong Bearfield, Alexandra Meliou
Title: Perception-aware Sampling for Scatterplot Visualizations
Abstract:
Visualizing data is often a crucial first step in data analytics workflows, but growing data sizes pose challenges due to computational and visual perception limitations. As a result, data analysts commonly down-sample their data and work with subsets. Deriving representative samples, however, remains a challenge. This paper focuses on scatterplots, a widely-used visualization type, and introduces a novel sampling objective -- perception-awareness -- aiming to improve sample efficacy by targeting humans' perception of a visualization. We make the following contributions: (1) We propose perception-augmented databases and design PAwS: a novel perception-aware sampling method for scatterplots that leverages saliency maps -- a computer vision tool for predicting areas of attention focus in visualizations -- and models perception-awareness via saliency, density, and coverage objectives. (2) We design ApproPAwS: a fast, perception-aware method for approximate visualizations, which exploits the fact that small visual perturbations are often imperceptible to humans. (3) We introduce the concept of perceptual similarity as a metric for sample quality, and present a novel method that compares saliency maps to measure it. (4) Our extensive experimental evaluation shows that our methods consistently outperform prior art in producing samples with high perceptual similarity, while ApproPAwS achieves up to 100x speed-ups with minimal loss in visual fidelity. Our user study shows that PAwS is often preferred by humans, validating our quantitative findings.
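The abstract's notion of perceptual similarity, comparing saliency maps of the full data and the sample, can be illustrated with a toy overlap score. The histogram-intersection metric below is an assumption chosen for simplicity; the paper's actual comparison method is its own contribution and is not reproduced here.

```python
import numpy as np

def perceptual_similarity(sal_full, sal_sample):
    """Toy stand-in for a saliency-map comparison: normalise both maps to
    probability distributions and score their overlap via histogram
    intersection (1.0 = identical attention patterns, 0.0 = disjoint).
    Illustrative only; not the metric proposed in the paper."""
    a = sal_full.astype(float)
    a /= a.sum()
    b = sal_sample.astype(float)
    b /= b.sum()
    return float(np.minimum(a, b).sum())
```

A sampler could then be scored by how close the saliency map of its subsample's scatterplot is to that of the full-data scatterplot.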

Authors:Stefan Fabian, Oskar von Stryk
Title: Hector UI: A Flexible Human-Robot User Interface for (Semi-)Autonomous Rescue and Inspection Robots
Abstract:
The remote human operator's user interface (UI) is an important link to make the robot an efficient extension of the operator's perception and action. In rescue applications, several studies have investigated the design of operator interfaces based on observations during major robotics competitions or field deployments. Based on this research, guidelines for good interface design were empirically identified. The investigations on the UIs of teams participating in competitions are often based on external observations during UI application, which may miss some relevant requirements for UI flexibility. In this work, we present an open-source and flexibly configurable user interface based on established guidelines and its exemplary use for wheeled, tracked, and walking robots. We explain the design decisions and cover the insights we have gained during its highly successful applications in multiple robotics competitions and evaluations. The presented UI can also be adapted for other robots with little effort and is available as open source.

Authors:Sijia Xiao, Haodi Zou, Amy Mathews, Jingshu Rui, Coye Cheshire, Niloufar Salehi
Title: SnuggleSense: Empowering Online Harm Survivors Through a Structured Sensemaking Process
Abstract:
Online interpersonal harm, such as cyberbullying and sexual harassment, remains a pervasive issue on social media platforms. Traditional approaches, primarily content moderation, often overlook survivors' needs and agency. We introduce SnuggleSense, a system that empowers survivors through structured sensemaking. Inspired by restorative justice practices, SnuggleSense guides survivors through reflective questions, offers personalized recommendations from similar survivors, and visualizes plans using interactive sticky notes. A controlled experiment demonstrates that SnuggleSense significantly enhances sensemaking compared to an unstructured process of making sense of the harm. We argue that SnuggleSense fosters community awareness, cultivates a supportive survivor network, and promotes a restorative justice-oriented approach toward restoration and healing. We also discuss design insights, such as tailoring informational support and providing guidance while preserving survivors' agency.

Authors:Yoseph Berhanu Alebachew, Chris Brown
Title: Automatic Bias Detection in Source Code Review
Abstract:
Bias is an inherent threat to human decision-making, including in decisions made during software development. Extensive research has demonstrated the presence of biases at various stages of the software development life-cycle. Notably, code reviews are highly susceptible to prejudice-induced biases, and individuals are often unaware of these biases as they occur. Developing methods to automatically detect these biases is crucial for addressing the associated challenges. Recent advancements in visual data analytics have shown promising results in detecting potential biases by analyzing user interaction patterns. In this project, we propose a controlled experiment to extend this approach to detect potentially biased outcomes in code reviews by observing how reviewers interact with the code. We employ the "spotlight model of attention", a cognitive framework where a reviewer's gaze is tracked to determine their focus areas on the review screen. This focus, identified through gaze tracking, serves as an indicator of the reviewer's areas of interest or concern. We plan to analyze the sequence of gaze focus using advanced sequence modeling techniques, including Markov Models, Recurrent Neural Networks (RNNs), and Conditional Random Fields (CRF). These techniques will help us identify patterns that may suggest biased interactions. We anticipate that the ability to automatically detect potentially biased interactions in code reviews will significantly reduce unnecessary push-backs, enhance operational efficiency, and foster greater diversity and inclusion in software development. This approach not only helps in identifying biases but also in creating a more equitable development environment by mitigating these biases effectively.
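Of the sequence models named above, the Markov Model is the simplest to sketch: a first-order transition matrix over gaze focus regions. The region encoding and add-one smoothing below are assumptions for illustration, not details from the proposed study.

```python
import numpy as np

def fit_markov(seq, n_states):
    """Fit a first-order Markov chain over a sequence of gaze focus regions
    (encoded as integers 0..n_states-1). Returns a row-stochastic transition
    matrix; add-one smoothing keeps unseen transitions nonzero.
    Illustrative sketch of one sequence model mentioned in the abstract."""
    counts = np.ones((n_states, n_states))  # Laplace smoothing
    for a, b in zip(seq, seq[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)
```

Transition matrices fitted per reviewer could then be compared to flag interaction patterns (e.g., skipping certain code regions) that correlate with biased outcomes.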

Authors:Prashant Garg, Thiemo Fetzer
Title: Artificial Intelligence health advice accuracy varies across languages and contexts
Abstract:
Using basic health statements authorized by UK and EU registers and 9,100 journalist-vetted public-health assertions on topics such as abortion, COVID-19, and politics, drawn from sources ranging from peer-reviewed journals and government advisories to social media and news outlets across the political spectrum, we benchmark six leading large language models in 21 languages. Despite high accuracy on English-centric textbook claims, performance falls in multiple non-European languages and fluctuates by topic and source, highlighting the urgency of comprehensive multilingual, domain-aware validation before deploying AI in global health communication.

Authors:Darcy Kim, Aida Kalender, Sennay Ghebreab, Giovanni Sileno
Title: The Cloud Weaving Model for AI development
Abstract:
While analysing challenges in pilot projects developing AI with marginalized communities, we found it difficult to express them within commonly used paradigms. We therefore constructed an alternative conceptual framework to ground AI development in the social fabric -- the Cloud Weaving Model -- inspired (amongst others) by indigenous knowledge, motifs from nature, and Eastern traditions. This paper introduces and elaborates on the fundamental elements of the model (clouds, spiders, threads, spiderwebs, and weather) and their interpretation in an AI context. The framework is then applied to comprehend patterns observed in co-creation pilots approaching marginalized communities, highlighting neglected yet relevant dimensions for responsible AI development.

Authors:Tadashi Okoshi, Zexiong Gao, Tan Yi Zhen, Takumi Karasawa, Takeshi Miki, Wataru Sasaki, Rajesh K. Balan
Title: Cyberoception: Finding a Painlessly-Measurable New Sense in the Cyberworld Towards Emotion-Awareness in Computing
Abstract:
In affective computing, recognizing users' emotions accurately is the basis of affective human-computer interaction. Understanding users' interoception contributes to a better understanding of individually different emotional abilities, which is essential for achieving inter-individually accurate emotion estimation. However, existing interoception measurement methods, such as the heart rate discrimination task, have several limitations, including their dependence on a well-controlled laboratory environment and precision apparatus, making monitoring users' interoception challenging. This study aims to determine other forms of data that can explain users' interoceptive or similar states in their real-world lives and proposes a novel hypothetical concept, "cyberoception": a new sense (1) which has properties similar to interoception in terms of the correlation with other emotion-related abilities, and (2) which can be measured only by the sensors embedded inside commodity smartphone devices in users' daily lives. Results from a 10-day-long in-lab/in-the-wild hybrid experiment reveal a specific cyberoception type, "Turn On" (users' subjective sensory perception about the frequency of turning-on behavior on their smartphones), significantly related to participants' emotional valence. We anticipate that cyberoception will serve as a fundamental building block for developing more "emotion-aware", user-friendly applications and services.

Authors:Ali Arya, Anthony Scavarelli, Dan Hawes, Luciara Nardon
Title: VR-based Intervention for Perspective Change: A Case to Investigate Virtual Materiality
Abstract:
This paper addresses the concept of materiality in virtual environments, which we define as being composed of objects that can influence user experience actively. Such virtual materiality is closely related to its physical counterpart, which is discussed in theoretical frameworks such as sociomateriality and actor-network theory. They define phenomena in terms of the entanglement of human and non-human elements. We report on an early investigation of virtual materiality within the context of reflection and perspective change in nature-based virtual environments. We considered the case of university students reflecting on the planning and management of their theses and major projects. Inspired by nature's known positive cognitive and affective effects and repeated questioning processes, we established a virtual reflection intervention to demonstrate the environmental mechanisms and material characteristics relevant to virtual materiality. Our work is a preliminary step toward understanding virtual materiality and its implications for research and the design of virtual environments.

Authors:Wolfgang Büschel, Gabriela Molina León, Arnaud Prouzeau, Mahmood Jasim, Christophe Hurter, Maxime Cordeil, Matthew Brehmer
Title: The 2nd MERCADO Workshop at IEEE VIS 2025: Multimodal Experiences for Remote Communication Around Data Online
Abstract:
We propose a half-day workshop at IEEE VIS 2025 on addressing the emerging challenges in data-rich multimodal remote collaboration. We focus on synchronous, remote, and hybrid settings where people take part in tasks such as data analysis, decision-making, and presentation. With this workshop, we continue successful prior work from the first MERCADO workshop at VIS 2023 and a 2024 Shonan Seminar that followed. Based on the findings of the earlier events, we invite research and ideas related to four themes of challenges: Tools & Technologies, Individual Differences & Interpersonal Dynamics, AI-assisted Collaboration, and Evaluation. With this workshop, we aim to broaden the community, foster new collaborations, and develop a research agenda to address these challenges in future research. Our planned workshop format comprises a keynote, short presentations, a breakout group session, and discussions organized around the identified challenges.

Authors:Anjali Khurana, Xiaotian Su, April Yi Wang, Parmit K Chilana
Title: Do It For Me vs. Do It With Me: Investigating User Perceptions of Different Paradigms of Automation in Copilots for Feature-Rich Software
Abstract:
Large Language Model (LLM)-based in-application assistants, or copilots, can automate software tasks, but users often prefer learning by doing, raising questions about the optimal level of automation for an effective user experience. We investigated two automation paradigms by designing and implementing a fully automated copilot (AutoCopilot) and a semi-automated copilot (GuidedCopilot) that automates trivial steps while offering step-by-step visual guidance. In a user study (N=20) across data analysis and visual design tasks, GuidedCopilot outperformed AutoCopilot in user control, software utility, and learnability, especially for exploratory and creative tasks, while AutoCopilot saved time for simpler visual tasks. A follow-up design exploration (N=10) enhanced GuidedCopilot with task- and state-aware features, including in-context preview clips and adaptive instructions. Our findings highlight the critical role of user control and tailored guidance in designing the next generation of copilots that enhance productivity, support diverse skill levels, and foster deeper software engagement.

Authors:Thomas Kosch, Sebastian Feger
Title: Prompt-Hacking: The New p-Hacking?
Abstract:
As Large Language Models (LLMs) become increasingly embedded in empirical research workflows, their use as analytical tools for quantitative or qualitative data raises pressing concerns for scientific integrity. This opinion paper draws a parallel between "prompt-hacking", the strategic tweaking of prompts to elicit desirable outputs from LLMs, and the well-documented practice of "p-hacking" in statistical analysis. We argue that the inherent biases, non-determinism, and opacity of LLMs make them unsuitable for data analysis tasks demanding rigor, impartiality, and reproducibility. We emphasize how researchers may inadvertently, or even deliberately, adjust prompts to confirm hypotheses while undermining research validity. We advocate for a critical view of using LLMs in research, transparent prompt documentation, and clear standards for when LLM use is appropriate. We discuss whether LLMs can replace traditional analytical methods, and recommend that they be used only with caution, oversight, and justification.

Authors:Spencer Lin, Miru Jun, Basem Rizk, Karen Shieh, Scott Fisher, Sharon Mozgai
Title: Optimizing SIA Development: A Case Study in User-Centered Design for Estuary, a Multimodal Socially Interactive Agent Framework
Abstract:
This case study presents our user-centered design model for Socially Intelligent Agent (SIA) development frameworks through our experience developing Estuary, an open source multimodal framework for building low-latency real-time socially interactive agents. We leverage the Rapid Assessment Process (RAP) to collect the thoughts of leading researchers in the field of SIAs regarding the current state of the art for SIA development as well as their evaluation of how well Estuary may potentially address current research gaps. We achieve this through a series of end-user interviews conducted by a fellow researcher in the community. We hope that the findings of our work will not only assist the continued development of Estuary but also guide the development of other future frameworks and technologies for SIAs.

Authors:Nimisha Karnatak, Adrien Baranes, Rob Marchant, Huinan Zeng, Tríona Butler, Kristen Olson
Title: Expanding the Generative AI Design Space through Structured Prompting and Multimodal Interfaces
Abstract:
Text-based prompting remains the predominant interaction paradigm in generative AI, yet it often introduces friction for novice users such as small business owners (SBOs), who struggle to articulate creative goals in domain-specific contexts like advertising. Through a formative study with six SBOs in the United Kingdom, we identify three key challenges: difficulties in expressing brand intuition through prompts, limited opportunities for fine-grained adjustment and refinement during and after content generation, and the frequent production of generic content that lacks brand specificity. In response, we present ACAI (AI Co-Creation for Advertising and Inspiration), a multimodal generative AI tool designed to support novice designers by moving beyond traditional prompt interfaces. ACAI features a structured input system composed of three panels: Branding, Audience and Goals, and the Inspiration Board. These inputs allow users to convey brand-relevant context and visual preferences. This work contributes to HCI research on generative systems by showing how structured interfaces can foreground user-defined context, improve alignment, and enhance co-creative control in novice creative workflows.

Authors:Matt I. B. Oddo, Ryan Smith, Stephen Kobourov, Tamara Munzner
Title: Visualization Tasks for Unlabelled Graphs
Abstract:
We investigate tasks that can be accomplished with unlabelled graphs, where nodes do not have persistent or semantically meaningful labels. New techniques to visualize these graphs have been proposed, but more understanding of unlabelled graph tasks is required before they can be adequately evaluated. Some tasks apply to both labelled and unlabelled graphs, but many do not translate between these contexts. We propose a taxonomy of unlabelled graph abstract tasks, organized according to the Scope of the data at play, the Action intended by the user, and the Target data under consideration. We show the descriptive power of this task abstraction by connecting to concrete examples from previous frameworks, and connect these abstractions to real-world problems. To showcase the evaluative power of the taxonomy, we perform a preliminary assessment of 6 visualizations for each task. For each combination of task and visual encoding, we consider the effort required from viewers, the likelihood of task success, and how both factors vary between small-scale and large-scale graphs.

Authors:Mark Steyvers, Megan A. K. Peters
Title: Metacognition and Uncertainty Communication in Humans and Large Language Models
Abstract:
Metacognition--the capacity to monitor and evaluate one's own knowledge and performance--is foundational to human decision-making, learning, and communication. As large language models (LLMs) become increasingly embedded in both high-stakes and widespread low-stakes contexts, it is important to assess whether, how, and to what extent they exhibit metacognitive abilities. Here, we provide an overview of current knowledge of LLMs' metacognitive capacities, how they might be studied, and how they relate to our knowledge of metacognition in humans. We show that while humans and LLMs can sometimes appear quite aligned in their metacognitive capacities and behaviors, it is clear many differences remain; attending to these differences is important for enhancing human-AI collaboration. Finally, we discuss how endowing future LLMs with more sensitive and more calibrated metacognition may also help them develop new capacities such as more efficient learning, self-direction, and curiosity.
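One concrete handle on the "calibrated metacognition" discussed above is Expected Calibration Error (ECE), a standard metric for how well a model's stated confidence matches its actual accuracy. The equal-width binning below is a common convention, not something specified by this overview.

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Expected Calibration Error: bin predictions by stated confidence,
    then average |mean confidence - mean accuracy| over bins, weighted by
    bin size. 0.0 means perfectly calibrated. Standard metric, shown here
    to make 'calibration' concrete; binning scheme is a common choice."""
    conf = np.asarray(conf, float)
    correct = np.asarray(correct, float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece
```

A model that says "90% confident" and is right 90% of the time in that bin contributes nothing to the ECE; systematic over- or under-confidence shows up directly.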

Authors:Rodrigo Simões, Fernando Brito e Abreu, Adriano Lopes
Title: Platform for Geo-Temporal Visualization of Tourist Crowding (original title: Plataforma para visualização geo-temporal de apinhamento turístico)
Abstract:
Tourist crowding degrades the visitor experience and negatively impacts the environment and the local population, potentially making tourism in popular destinations unsustainable. This motivated us to develop, within the framework of the European RESETTING project on the digital transformation of tourism, a platform to visualize this crowding by exploring historical data, detecting patterns and trends, and predicting future events. The ultimate goal is to support short- and medium-term decision-making to mitigate the phenomenon. To this end, the platform takes into account the carrying capacity of the target sites when calculating crowding density. The integration of data from different sources is achieved with an extensible, connector-based architecture. Three scenarios for using the platform are described, relating to major annual crowding events. Two of them, in the municipality of Lisbon, are based on data from a mobile network provided by the LxDataLab initiative. The third, in Melbourne, Australia, uses public data from a network of movement sensors called the Pedestrian Counting System. An experiment to evaluate the usability of the proposed platform using NASA-TLX is also described.
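The abstract's core metric, crowding density normalised by each site's carrying capacity, can be sketched as follows. The class, function names, and level thresholds are illustrative assumptions, not the platform's actual API:

```python
# Hypothetical sketch: density is normalised by each site's carrying capacity,
# so the same head count can be "comfortable" at a large square and
# "overcrowded" at a narrow viewpoint.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    carrying_capacity: int  # max visitors the site can sustainably host

def crowding_index(site: Site, visitor_count: int) -> float:
    """Ratio of observed visitors to carrying capacity (1.0 = at capacity)."""
    return visitor_count / site.carrying_capacity

def crowding_level(index: float) -> str:
    if index < 0.5:
        return "low"
    if index < 1.0:
        return "moderate"
    return "overcrowded"

square = Site("Praça do Comércio", carrying_capacity=5000)
print(crowding_level(crowding_index(square, 6200)))  # overcrowded
```

The connector-based architecture would feed `visitor_count` from different sources (mobile-network data, pedestrian sensors) into the same index.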

Authors:Michal Robert Žák, Moritz Grosse-Wentrup
Title: Auditory Conversational BAI: A Feasibility Study
Abstract:
We introduce a novel auditory brain-computer interface (BCI) paradigm, Auditory Intention Decoding (AID), designed to enhance communication capabilities within the brain-AI interface (BAI) system EEGChat. AID enables users to select among multiple auditory options (intentions) by analyzing their brain responses, offering a pathway to construct a communication system that requires neither muscle movement nor syntactic formation. To evaluate the feasibility of this paradigm, we conducted a proof-of-concept study. The results demonstrated statistically significant decoding performance, validating the approach's potential. Despite these promising findings, further optimization is required to enhance system performance and realize the paradigm's practical application.

Authors:Chameera De Silva, Thilina Halloluwa, Dhaval Vyas
Title: A Multi-Layered Research Framework for Human-Centered AI: Defining the Path to Explainability and Trust
Abstract:
The integration of Artificial Intelligence (AI) into high-stakes domains such as healthcare, finance, and autonomous systems is often constrained by concerns over transparency, interpretability, and trust. While Human-Centered AI (HCAI) emphasizes alignment with human values, Explainable AI (XAI) enhances transparency by making AI decisions more understandable. However, the lack of a unified approach limits AI's effectiveness in critical decision-making scenarios. This paper presents a novel three-layered framework that bridges HCAI and XAI to establish a structured explainability paradigm. The framework comprises (1) a foundational AI model with built-in explainability mechanisms, (2) a human-centered explanation layer that tailors explanations based on cognitive load and user expertise, and (3) a dynamic feedback loop that refines explanations through real-time user interaction. The framework is evaluated across healthcare, finance, and software development, demonstrating its potential to enhance decision-making, regulatory compliance, and public trust. Our findings advance Human-Centered Explainable AI (HCXAI), fostering AI systems that are transparent, adaptable, and ethically aligned.

Authors:Wasim Abbas, Hafiz Syed Muhammad Bilal, Asim Abbas, Muhammad Afzal, Je-Hoon Lee
Title: Mobile-Driven Incentive Based Exercise for Blood Glucose Control in Type 2 Diabetes
Abstract:
We propose and create an incentive-based recommendation algorithm aimed at improving the lifestyle of diabetic patients. This algorithm is integrated into a real-world mobile application to provide personalized health recommendations. Initially, users enter data such as step count, calorie intake, gender, age, weight, height and blood glucose levels. Once the data is preprocessed, the app identifies personalized health and glucose management goals. The recommendation engine suggests exercise routines and dietary adjustments based on these goals. As users achieve their goals and follow these recommendations, they receive incentives, encouraging adherence and promoting positive health outcomes. Furthermore, the mobile application allows users to monitor their progress through descriptive analytics, which displays their daily activities and health metrics in graphical form. To evaluate the proposed methodology, the study was conducted with 10 participants with type 2 diabetes over three weeks. The participants were recruited through advertisements and health expert references. The application was installed on the participants' phones for use over the three weeks. A health expert also took part in the study, monitoring the participants' health records. To assess the algorithm's performance, we computed efficiency and proficiency. As a result, the algorithm showed proficiency and efficiency scores of 90% and 92%, respectively. Similarly, we assessed user experience with the application in terms of attractiveness, hedonic and pragmatic quality, involving 35 people in the study. As a result, it indicated an overall positive user response. The findings show a clear positive correlation between exercise and rewards, with noticeable improvements observed in user outcomes after exercise.

Authors:Massimiliano Nigro, Andrea Righini, Micol Spitale
Title: Exploring the Use of Social Robots to Prepare Children for Radiological Procedures: A Focus Group Study
Abstract:
When children are anxious or scared, it can be hard for them to stay still or follow instructions during medical procedures, making the process more challenging and affecting procedure results. This is particularly true for radiological procedures, where long scan times, confined spaces, and loud noises can cause children to move, significantly impacting scan quality. To this end, sometimes children are sedated, but doctors are constantly seeking alternative non-pharmacological solutions. This work aims to explore how social robots could assist in preparing children for radiological procedures. We have conducted a focus group discussion with five hospital stakeholders, namely radiographers, paediatricians, and clinical engineers, to explore (i) the context regarding children's preparation for radiological procedures, hence their needs and how children are currently prepared, and (ii) the potential role of social robots in this process. The discussion was transcribed and analysed using thematic analysis. Among our findings, we identified three potential roles for a social robot in this preparation process: offering infotainment in the waiting room, acting as a guide within the hospital, and assisting radiographers in preparing children for the procedure. We hope that insights from this study will inform the design of social robots for pediatric healthcare.

Authors:Zehan Li, Jinzhi Deng, Haibing Ma, Chi Zhang, Dan Xiao
Title: Translating Multimodal AI into Real-World Inspection: TEMAI Evaluation Framework and Pathways for Implementation
Abstract:
This paper introduces the Translational Evaluation of Multimodal AI for Inspection (TEMAI) framework, bridging multimodal AI capabilities with industrial inspection implementation. Adapting translational research principles from healthcare to industrial contexts, TEMAI establishes three core dimensions: Capability (technical feasibility), Adoption (organizational readiness), and Utility (value realization). The framework demonstrates that technical capability alone yields limited value without corresponding adoption mechanisms. TEMAI incorporates specialized metrics including the Value Density Coefficient and structured implementation pathways. Empirical validation through retail and photovoltaic inspection implementations revealed significant differences in value realization patterns despite similar capability reduction rates, confirming the framework's effectiveness across diverse industrial sectors while highlighting the importance of industry-specific adaptation strategies.

Authors:Ozan Balci, Stien Poncelet, Alex Binh Vinh Duc Nguyen, Andrew Vande Moere
Title: Manifesting Architectural Subspaces with Two Mobile Robotic Partitions to Facilitate Spontaneous Office Meetings
Abstract:
Although intended to foster spontaneous interactions among workers, a typical open-plan office layout cannot mitigate visual, acoustic, or privacy-related distractions that originate from unplanned meetings. As office workers often refrain from tackling these issues by manually demarcating or physically relocating to a more suitable subspace that is enclosed by movable partitions, we hypothesise that these subspaces could instead be robotically manifested. This study therefore evaluated the perceived impact of two mobile robotic partitions that were wizarded to jointly manifest an enclosed subspace, to: 1) either `mitigate' or `intervene' in the distractions caused by spontaneous face-to-face or remote meetings; or 2) either `gesturally' or `spatially' nudge a distraction-causing worker to relocate. Our findings suggest how robotic furniture should interact with office workers with and through transient space, and autonomously balance the distractions not only for each individual worker but also for multiple workers sharing the same workspace.

Authors:Aleksa Marusic, Sao Mai Nguyen, Adriana Tapus
Title: Skeleton-Based Transformer for Classification of Errors and Better Feedback in Low Back Pain Physical Rehabilitation Exercises
Abstract:
Physical rehabilitation exercises suggested by healthcare professionals can help recovery from various musculoskeletal disorders and prevent re-injury. However, patients' engagement tends to decrease over time without direct supervision, which is why there is a need for an automated monitoring system. In recent years, there has been great progress in quality assessment of physical rehabilitation exercises. Most existing methods only provide a binary classification of whether the performance is correct or incorrect, and a few provide a continuous score. This information is not sufficient for patients to improve their performance. In this work, we propose an algorithm for error classification of rehabilitation exercises, thus making the first step toward more detailed feedback to patients. We focus on skeleton-based exercise assessment, which utilizes human pose estimation to evaluate motion. Inspired by recent algorithms for quality assessment during rehabilitation exercises, we propose a Transformer-based model for the described classification. Our model is inspired by the HyperFormer method for human action recognition, and adapted to our problem and dataset. The evaluation is done on the KERAAL dataset, as it is the only medical dataset with clear error labels for the exercises, and our model significantly surpasses state-of-the-art methods. Furthermore, we bridge the gap towards better feedback to the patients by presenting a way to calculate the importance of joints for each exercise.

Authors:Louise Robert, Laurine Moniez, Quentin Luzurier, David Morquin
Title: Participatory Design of EHR Components: Crafting Novel Relational Spaces for IT Specialists and Hospital Staff to Cooperate
Abstract:
Introduced in the early 2010s, Electronic Health Records (EHRs) have become ubiquitous in hospitals. Despite clear benefits, they remain unpopular among healthcare professionals and present significant challenges. Positioned at the intersection of Health Information Systems studies, Computer Supported Collaborative Work (CSCW), Service Design, and Participatory Design (PD), our research investigates how involving users in the co-design of new EHR components within a dedicated hospital space can transform healthcare practices. Through participatory co-design methodologies, including ethnographic observation, collaborative workshops, and realistic simulations, we identify the material and interactional elements essential for rebalancing power dynamics between users and designers. This project contributes to rethinking traditional EHR design approaches, embedding design practice into systemic transformation to genuinely meet healthcare professionals' needs.

Authors:Jessica Szczuka, Lisa Mühl, Paula Ebner, Simon Dubé
Title: 10 Questions to Fall in Love with ChatGPT: An Experimental Study on Interpersonal Closeness with Large Language Models (LLMs)
Abstract:
Large language models (LLMs), like ChatGPT, are capable of generating affectionately nuanced text and can therefore shape online interactions, including dating. This study explores how individuals experience closeness and romantic interest in dating profiles, depending on whether they believe the profiles are human- or AI-generated. In a matchmaking scenario, 307 participants rated 10 responses to the Interpersonal Closeness Generating Task, unaware that all were LLM-generated. Surprisingly, perceived source (human or AI) had no significant impact on closeness or romantic interest. Instead, perceived quality and human-likeness of responses shaped reactions. The results challenge current theoretical frameworks for human-machine communication and raise critical questions about the importance of authenticity in affective online communication.

Authors:Yujie Huang, Audrey Crozet, Toinon Vigier, Alexandre Bruckert, Patrick Le Callet, Pierre Lebranchu
Title: Orientation and mobility test in virtual reality, a tool for quantitative assessment of functional vision: dataset and evaluation in healthy subjects
Abstract:
The purpose of this study was to develop and evaluate a novel virtual reality seated orientation and mobility (VR-S-O&M) test protocol designed to assess functional vision. This study aims to provide a dataset of healthy subjects using this protocol and preliminary analyses. We introduced a VR-based O&M test protocol featuring a novel seated displacement method, diverse lighting conditions, and varying course configurations within a virtual environment. Normally sighted participants (N=42) completed the test, which required them to navigate a path and destroy identified obstacles. We assessed basic performance metrics, including time duration, number of missed objects, and time before the first step, under different environmental conditions to verify ecological validity. Additionally, we analyzed participants' behaviors regarding missed objects, demonstrating the potential of integrating behavioral and interactive data for a more precise functional vision assessment. Our VR-S-O&M test protocol, along with the first O&M behavior dataset, presents significant opportunities for developing more refined performance metrics for assessing functional vision and enhancing the quality of life.

Authors:Ken Jen Lee, PiaoHong Wang, Zhicong Lu
Title: "Can't believe I'm crying over an anime girl": Public Parasocial Grieving and Coping Towards VTuber Graduation and Termination
Abstract:
Despite the significant increase in popularity of Virtual YouTubers (VTubers), research on the unique dynamics of viewer-VTuber parasocial relationships is nascent. This work investigates how English-speaking viewers grieved VTubers whose identities are no longer used, an interesting context as the nakanohito (i.e., the person behind the VTuber identity) is usually alive post-retirement and might "reincarnate" as another VTuber. We propose a typology for VTuber retirements and analyzed 13,655 Reddit posts and comments spanning nearly three years using mixed-methods. Findings include how viewers coped using methods similar to when losing loved ones, alongside novel coping methods reflecting different attachment styles. Although emotions like sadness, shock, concern, disapproval, confusion, and love decreased with time, regret and loyalty showed opposite trends. Furthermore, viewers' reactions situated a VTuber identity within a community of content creators and viewers. We also discuss design implications alongside implications on the VTuber ecosystem and future research directions.

Authors:Alison Crosby, MJ Johns, Katherine Isbister, Sri Kurniawan
Title: Utilizing Virtual Reality for Wildfire Evacuation Training
Abstract:
The risk of loss of lives and property damage has increased all around the world in recent years as wildfire seasons have become longer and fires have become larger. Knowing how to prepare and evacuate safely is critical, yet it may be daunting for those who have never experienced a wildfire threat before. This paper considers the potential for utilizing virtual reality (VR) technology to prepare people for an evacuation scenario. We discuss the unique affordances of VR for this type of work, as well as the initial steps in creating a training simulation. We also explore the next steps for what a tool like this may mean for the future of evacuation preparedness training.

Authors:Sabrina Haque, Christoph Csallner
Title: Early Accessibility: Automating Alt-Text Generation for UI Icons During App Development
Abstract:
Alt-text is essential for mobile app accessibility, yet UI icons often lack meaningful descriptions, limiting accessibility for screen reader users. Existing approaches either require extensive labeled datasets, struggle with partial UI contexts, or operate post-development, increasing technical debt. We first conduct a formative study to determine when and how developers prefer to generate icon alt-text. We then explore the ALTICON approach for generating alt-text for UI icons during development using two fine-tuned models: a text-only large language model that processes extracted UI metadata and a multi-modal model that jointly analyzes icon images and textual context. To improve accuracy, the method extracts relevant UI information from the DOM tree, retrieves in-icon text via OCR, and applies structured prompts for alt-text generation. Our empirical evaluation with the most closely related deep-learning and vision-language models shows that ALTICON generates alt-text that is of higher quality while not requiring a full-screen input.

Authors:Jiwon Chun, Yankun Zhao, Hanlin Chen, Meng Xia
Title: PlanGlow: Personalized Study Planning with an Explainable and Controllable LLM-Driven System
Abstract:
Personal development through self-directed learning is essential in today's fast-changing world, but many learners struggle to manage it effectively. While AI tools like large language models (LLMs) have the potential for personalized learning planning, they face issues such as transparency and hallucinated information. To address this, we propose PlanGlow, an LLM-based system that generates personalized, well-structured study plans with clear explanations and controllability through user-centered interactions. Through mixed methods, we surveyed 28 participants and interviewed 10 before development, followed by a within-subject experiment with 24 participants to evaluate PlanGlow's performance, usability, controllability, and explainability against two baseline systems: a GPT-4o-based system and Khan Academy's Khanmigo. Results demonstrate that PlanGlow significantly improves usability, explainability, and controllability. Additionally, two educational experts assessed and confirmed the quality of the generated study plans. These findings highlight PlanGlow's potential to enhance personalized learning and address key challenges in self-directed learning.

Authors:Elise Paradis, Ambar Murillo, Maulishree Pandey, Sarah D'Angelo, Matthew Hughes, Andrew Macvean, Ben Ferrari-Church
Title: Creating benchmarkable components to measure the quality of AI-enhanced developer tools
Abstract:
In the AI community, benchmarks to evaluate model quality are well established, but an equivalent approach to benchmarking products built upon generative AI models is still missing. This has had two consequences. First, it has made teams focus on model quality over the developer experience, while successful products combine both. Second, product teams have struggled to answer questions about their products in relation to their competitors. In this case study, we share: (1) our process to create robust, enterprise-grade and modular components to support the benchmarking of the developer experience (DX) dimensions of our team's AI for code offerings, and (2) the components we have created to do so, including demographics and attitudes towards AI surveys, a benchmarkable task, and task and feature surveys. By doing so, we hope to lower the barrier to the DX benchmarking of genAI-enhanced code products.

Authors:Naoto Nishida, Jun Rekimoto
Title: SUMART: SUMmARizing Translation from Wordy to Concise Expression
Abstract:
We propose SUMART, a method for summarizing and compressing the volume of verbose subtitle translations. SUMART is designed for understanding translated captions (e.g., interlingual conversations via subtitle translation, or watching movies with foreign-language audio and translated captions). SUMART is intended for users who want a big-picture and fast understanding of the conversation, audio, video content, and speech in a foreign language. During training data collection, when a speaker makes a verbose statement, SUMART employs a large language model on-site to compress the volume of subtitles. This compressed data is then stored in a database for fine-tuning purposes. Later, SUMART uses those pairs of non-compressed ASR results and compressed translated results to fine-tune the translation model to generate more concise translations for practical use. In practical applications, SUMART utilizes this trained model to produce concise translation results. Furthermore, as a practical application, we developed an application that allows conversations using subtitle translation in augmented reality spaces. As a pilot study, we conducted qualitative surveys using a SUMART prototype and a survey on the summarization model for SUMART. We envision the most effective use case of this system is where users need to consume a lot of information quickly (e.g., speech, lectures, podcasts, Q&A in conferences).

Authors:Naoto Nishida, Yoshio Ishiguro, Jun Rekimoto, Naomi Yamashita
Title: Dynamik: Syntactically-Driven Dynamic Font Sizing for Emphasis of Key Information
Abstract:
In today's globalized world, there are increasing opportunities for individuals to communicate using a common non-native language (lingua franca). Non-native speakers often have opportunities to listen to foreign languages, but may not comprehend them as fully as native speakers do. To aid real-time comprehension, live transcription of subtitles is frequently used in everyday life (e.g., during Zoom conversations, watching YouTube videos, or on social networking sites). However, simultaneously reading subtitles while listening can increase cognitive load. In this study, we propose Dynamik, a system that reduces cognitive load during reading by decreasing the size of less important words and enlarging important ones, thereby enhancing sentence contrast. Our results indicate that Dynamik can reduce certain aspects of cognitive load, specifically perceived performance and effort, and enhance the sense of comprehension, especially among users with low English proficiency. We further discuss our method's applicability to other languages, potential improvements, and further research directions.
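The core idea of shrinking less important words and enlarging important ones can be sketched as a simple mapping from importance scores to font sizes. The scores and point-size bounds here are illustrative assumptions; the paper derives importance from syntax:

```python
# Minimal sketch: scale each word's font size by an (externally supplied)
# importance score in [0, 1], raising sentence contrast.
def dynamik_sizes(words, importance, min_pt=12, max_pt=24):
    """Map importance in [0, 1] linearly onto [min_pt, max_pt] points."""
    assert len(words) == len(importance)
    return {w: round(min_pt + s * (max_pt - min_pt))
            for w, s in zip(words, importance)}

sizes = dynamik_sizes(["the", "deadline", "is", "tomorrow"],
                      [0.1, 0.9, 0.1, 1.0])
print(sizes)  # {'the': 13, 'deadline': 23, 'is': 13, 'tomorrow': 24}
```

A renderer would then emit each word at its computed size, so key content words visually dominate the subtitle line.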

Authors:Siyuan Kan, Huanyu Wu, Zhenyao Cui, Fan Huang, Xiaolong Xu, Dongrui Wu
Title: CMCRD: Cross-Modal Contrastive Representation Distillation for Emotion Recognition
Abstract:
Emotion recognition is an important component of affective computing, and also human-machine interaction. Unimodal emotion recognition is convenient, but the accuracy may not be high enough; in contrast, multi-modal emotion recognition may be more accurate, but it also increases the complexity and cost of the data collection system. This paper considers cross-modal emotion recognition, i.e., using both electroencephalography (EEG) and eye movement in training, but only EEG or eye movement in test. We propose cross-modal contrastive representation distillation (CMCRD), which uses a pre-trained eye movement classification model to assist the training of an EEG classification model, improving feature extraction from EEG signals, or vice versa. During test, only EEG signals (or eye movement signals) are acquired, eliminating the need for multi-modal data. CMCRD not only improves the emotion recognition accuracy, but also makes the system simpler and more practical. Experiments using three different neural network architectures on three multi-modal emotion recognition datasets demonstrated the effectiveness of CMCRD. Compared with the EEG-only model, it improved the average classification accuracy by about 6.2%.
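A contrastive representation distillation objective of the kind described can be sketched in NumPy: the EEG "student" embedding of sample i is pulled toward the eye-movement "teacher" embedding of the same sample and pushed away from other samples' teacher embeddings (an InfoNCE-style loss). This is a generic CRD-style loss under assumed conventions, not the exact CMCRD formulation:

```python
import numpy as np

def contrastive_distill_loss(student, teacher, tau=0.1):
    """student, teacher: (N, D) embeddings; matched pairs share a row index."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / tau                       # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched pairs on the diagonal

rng = np.random.default_rng(0)
t = rng.normal(size=(8, 16))                     # teacher (eye-movement) embeddings
aligned = contrastive_distill_loss(t + 0.01 * rng.normal(size=t.shape), t)
random_ = contrastive_distill_loss(rng.normal(size=t.shape), t)
print(aligned < random_)  # student aligned with the teacher gives lower loss
```

In training, this term would be added to the student's ordinary classification loss, so the EEG branch inherits structure from the eye-movement branch without needing eye-movement data at test time.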

Authors:Kai Lukoff, Xinqi Zhang
Title: Community Empowerment through Location-Based AR: The Thámien Ohlone AR Tour
Abstract:
Community empowerment is the process of enabling communities to increase control over their narratives, resources, and futures. In HCI and design, this social challenge centers on helping marginalized groups gain agency through technology and design interventions. For Indigenous communities in particular, empowerment means not only representation but sovereignty in how their stories are told and by whom. Location-based augmented reality (AR) offers a novel opportunity to address this challenge. By overlaying digital content onto physical places, AR can spatially anchor community narratives in the real world, allowing communities to re-tell the story of a place on their own terms. Such site-specific AR experiences have already been used to reveal hidden histories, re-imagine colonial monuments, and celebrate minority cultures. The affordances of XR - particularly AR's spatial interaction and immersive storytelling - make it a promising tool for cultural continuity and community activism. In this position paper, we focus on how these XR affordances can empower communities, using the Thámien Ohlone AR Tour as a case study. We outline why traditional digital interventions fall short of true empowerment, how AR's immersive qualities uniquely support Indigenous self-determination, insights from co-designing the Ohlone AR Tour, and future directions to scale such efforts responsibly.

Authors:Terrence Neumann, Maria De-Arteaga, Sina Fazelpour
Title: Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation
Abstract:
The emergent capabilities of large language models (LLMs) have prompted interest in using them as surrogates for human subjects in opinion surveys. However, prior evaluations of LLM-based opinion simulation have relied heavily on costly, domain-specific survey data, and mixed empirical results leave their reliability in question. To enable cost-effective, early-stage evaluation, we introduce a quality control assessment designed to test the viability of LLM-simulated opinions on Likert-scale tasks without requiring large-scale human data for validation. This assessment comprises two key tests: logical consistency and alignment with stakeholder expectations, offering a low-cost, domain-adaptable validation tool. We apply our quality control assessment to an opinion simulation task relevant to AI-assisted content moderation and fact-checking workflows -- a socially impactful use case -- and evaluate seven LLMs using a baseline prompt engineering method (backstory prompting), as well as fine-tuning and in-context learning variants. None of the models or methods pass the full assessment, revealing several failure modes. We conclude with a discussion of the risk management implications and release TopicMisinfo, a benchmark dataset with paired human and LLM annotations simulated by various models and approaches, to support future research.
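One form the "logical consistency" test could take on Likert-scale tasks: a simulated respondent who rates a statement highly should rate its direct negation low. The scoring rule and tolerance below are assumptions for illustration, not the paper's exact test:

```python
# Sketch: on a 1..scale_max Likert scale, a rating and the rating of the
# statement's negation should sum close to scale_max + 1 (e.g., 5 + 1 or
# 4 + 2 on a 5-point scale); agreeing with both a claim and its negation
# is logically inconsistent.
def consistency_check(rating: int, negation_rating: int,
                      scale_max: int = 5, tolerance: int = 1) -> bool:
    return abs((rating + negation_rating) - (scale_max + 1)) <= tolerance

def pass_rate(pairs):
    """Fraction of (claim, negation) rating pairs that are consistent."""
    return sum(consistency_check(r, n) for r, n in pairs) / len(pairs)

# (5, 5) fails: the simulated respondent agrees with both a claim and
# its negation.
print(pass_rate([(5, 1), (4, 2), (5, 5), (3, 3)]))  # 0.75
```

Such a check needs no human reference data, which is what makes it suitable for the early-stage, low-cost evaluation the paper argues for.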

Authors:Xue Yuan, Keren Shi, Ning Jiang, Jiayuan He
Title: PlugSelect: Pruning Channels with Plug-and-Play Flexibility for Electroencephalography-based Brain Computer Interface
Abstract:
Automatic minimization and optimization of the number of electrodes is essential for the practical application of electroencephalography (EEG)-based brain computer interfaces (BCIs). Previous methods typically require additional training costs or rely on prior knowledge assumptions. This study proposed a novel channel pruning model, plug-and-select (PlugSelect), applicable across a broad range of BCI paradigms with no additional training cost and plug-and-play functionality. It integrates gradients along the input path to globally infer the causal relationships between input channels and outputs, and ranks the contribution sequences to identify the most highly attributed channels. The results showed that for three BCI paradigms, i.e., auditory attention decoding (AAD), motor imagery (MI), and affective computation (AC), PlugSelect could reduce the number of channels by at least half while effectively maintaining decoding performance and improving efficiency. The outcome benefits the design of wearable EEG-based devices, facilitating the practical application of BCI technology.
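The "integrate gradients along the input path, then rank channels" idea can be sketched numerically with a zero baseline and a Riemann sum. The toy quadratic model and the keep-half rule below are illustrative assumptions; PlugSelect applies this to trained EEG decoders:

```python
import numpy as np

def integrated_gradients(f_grad, x, steps=50):
    """Midpoint-rule integrated gradients from a zero baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.mean([f_grad(a * x) for a in alphas], axis=0)
    return x * grads                               # per-channel attribution

def rank_channels(f_grad, X):
    """Sum |IG| over samples per channel; return channel indices, best first."""
    attr = np.sum([np.abs(integrated_gradients(f_grad, x)) for x in X], axis=0)
    return np.argsort(attr)[::-1]

# Toy model: output = sum(w * x^2), so grad = 2 * w * x; channels with
# larger |w| contribute more to the output.
w = np.array([0.1, 3.0, 0.5, 1.0])
f_grad = lambda x: 2 * w * x
X = np.random.default_rng(1).normal(size=(20, 4))  # 20 samples, 4 channels
order = rank_channels(f_grad, X)
kept = order[: len(w) // 2]                        # prune to half the channels
print(order)  # channel 1 (largest weight) should rank first
```

Because ranking only needs gradients of an already-trained model, no retraining is required, matching the paper's "no additional training cost" claim.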

Authors:Jiarui Guan, Ruishi Zou, Jiajun Zhang, Kimpan Xin, Bingsu He, Zhuhe Zhang, Chen Ye
Title: Designing Human-AI System for Legal Research: A Case Study of Precedent Search in Chinese Law
Abstract:
Recent advancements in AI technology have seen researchers and industry professionals actively exploring the application of AI tools in legal workflows. Despite this prevailing trend, legal practitioners found that AI tools had limited effectiveness in supporting everyday tasks, which can be partly attributed to their design. Typically, AI legal tools only offer end-to-end interaction: practitioners can only manipulate the input and output but have no control over the intermediate steps, raising concerns about AI tools' performance and ethical use. To design an effective AI legal tool, as a first step, we explore users' needs with one specific use case: precedent search. Through a qualitative study with five legal practitioners, we uncovered the precedent search workflow, the challenges they face using current systems, and their concerns and expectations regarding AI tools. We conclude our exploration with an initial prototype to reflect the design implications derived from our findings.

Authors:Ashutosh Chaubey, Xulang Guan, Mohammad Soleymani
Title: Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning
Abstract:
The human face plays a central role in social communication, necessitating the use of performant computer vision tools for human-centered applications. We propose Face-LLaVA, a multimodal large language model for face-centered, in-context learning, including facial expression and attribute recognition. Additionally, Face-LLaVA is able to generate natural language descriptions that can be used for reasoning. Leveraging existing visual databases, we first developed FaceInstruct-1M, a face-centered database for instruction tuning MLLMs for face processing. We then developed a novel face-specific visual encoder powered by Face-Region Guided Cross-Attention that integrates face geometry with local visual features. We evaluated the proposed method across nine different datasets and five different face processing tasks, including facial expression recognition, action unit detection, facial attribute detection, age estimation and deepfake detection. Face-LLaVA achieves superior results compared to existing open-source MLLMs and competitive performance compared to commercial solutions. Our model output also receives a higher reasoning rating by GPT under a zero-shot setting across all the tasks. Both our dataset and model will be released at https://face-llava.github.io to support future advancements in social AI and foundational vision-language research.

Authors:Leon Reicherts, Zelun Tony Zhang, Elisabeth von Oswald, Yuanting Liu, Yvonne Rogers, Mariam Hassib
Title: AI, Help Me Think—but for Myself: Assisting People in Complex Decision-Making by Providing Different Kinds of Cognitive Support
Abstract:
How can we design AI tools that effectively support human decision-making by complementing and enhancing users' reasoning processes? Common recommendation-centric approaches face challenges such as inappropriate reliance or a lack of integration with users' decision-making processes. Here, we explore an alternative interaction model in which the AI outputs build upon users' own decision-making rationales. We compare this approach, which we call ExtendAI, with a recommendation-based AI. Participants in our mixed-methods user study interacted with both AIs as part of an investment decision-making task. We found that the AIs had different impacts, with ExtendAI integrating better into the decision-making process and people's own thinking and leading to slightly better outcomes. RecommendAI was able to provide more novel insights while requiring less cognitive effort. We discuss the implications of these and other findings along with three tensions of AI-assisted decision-making which our study revealed.

Authors:Abhinav Pathak, Kalaichelvi Venkatesan, Tarek Taha, Rajkumar Muthusamy
Title: A Multi-Modal Interaction Framework for Efficient Human-Robot Collaborative Shelf Picking
Abstract:
The growing presence of service robots in human-centric environments, such as warehouses, demands seamless and intuitive human-robot collaboration. In this paper, we propose a collaborative shelf-picking framework that combines multimodal interaction, physics-based reasoning, and task division for enhanced human-robot teamwork. The framework enables the robot to recognize human pointing gestures, interpret verbal cues and voice commands, and communicate through visual and auditory feedback. Moreover, it is powered by a Large Language Model (LLM) that utilizes Chain of Thought (CoT) reasoning, a physics-based simulation engine for safely retrieving boxes from cluttered stacks on shelves, and a relationship graph for sub-task generation, extraction sequence planning, and decision making. Furthermore, we validate the framework through real-world shelf-picking experiments: 1) Gesture-Guided Box Extraction, 2) Collaborative Shelf Clearing, and 3) Collaborative Stability Assistance.

Authors:Joseph Lindley, Roger Whitham
Title: Towards Holistic Prompt Craft
Abstract:
We present an account of an ongoing practice-based Design Research programme that explores the interaction affordances of real-time AI image generators. Based on our experiences from three installations, we reflect on the design of PromptJ, a user interface built around the concept of a prompt mixer. Our first contribution is a series of strong concepts based on our reflections on designing and deploying PromptJ. We consolidate and abstract our strong concepts into the notion of Holistic Prompt Craft, which describes the importance of considering all relevant parameters concurrently. Finally, we present PromptTank, a prototype design which exemplifies the principles of Holistic Prompt Craft. Our contributions are articulated as strong concepts, or intermediate knowledge, intended to inform and inspire practitioners and researchers who are designing with image generation models or developing novel interaction paradigms for generative AI systems more generally.

Authors:Thijs Willems, Darion Jin Hotan, Jiawen Cheryl Tang, Norakmal Hakim bin Norhashim, King Wang Poon, Zi An Galvyn Goh, Radha Vinod
Title: Assessing employment and labour issues implicated by using AI
Abstract:
This chapter critiques the dominant reductionist approach in AI and work studies, which isolates tasks and skills as replaceable components. Instead, it advocates for a systemic perspective that emphasizes the interdependence of tasks, roles, and workplace contexts. Two complementary approaches are proposed: an ethnographic, context-rich method that highlights how AI reconfigures work environments and expertise; and a relational task-based analysis that bridges micro-level work descriptions with macro-level labor trends. The authors argue that effective AI impact assessments must go beyond predicting automation rates to include ethical, well-being, and expertise-related questions. Drawing on empirical case studies, they demonstrate how AI reshapes human-technology relations, professional roles, and tacit knowledge practices. The chapter concludes by calling for a human-centric, holistic framework that guides organizational and policy decisions, balancing technological possibilities with social desirability and sustainability of work.

Authors:Francesco Ricci, Amra Delić
Title: Widening the Role of Group Recommender Systems with CAJO
Abstract:
Group Recommender Systems (GRSs) have been studied and developed for more than twenty years. However, their application and usage have not grown. They can even be labeled as failures when compared to the very successful and common recommender systems (RSs) used on all the major e-commerce and social platforms. As a result, the RSs that we all use now are targeted only at individual users, aiming at choosing an item exclusively for themselves; no choice support is provided to groups trying to select a service, a product, an experience, or a person that serves all the group members equally well. In this opinion article we discuss why the success of group recommender systems is lagging, and we propose a research program built on the analysis and development of new forms of collaboration between humans and intelligent systems. We define a set of roles, named CAJO, that GRSs should play in order to become more useful tools for group decision making.

Authors:Jian Zhang, Wafa Johal, Jarrod Knibbe
Title: Illusion Spaces in VR: The Interplay Between Size and Taper Angle Perception in Grasping
Abstract:
Leveraging the integration of visual and proprioceptive cues, research has uncovered various perception thresholds in VR that can be exploited to support haptic feedback for grasping. While previous studies have explored individual dimensions, such as size, the combined effect of multiple geometric properties on perceptual illusions remains poorly understood. We present a two-alternative forced choice study investigating the perceptual interplay between object size and taper angle. We introduce an illusion space model, providing detailed insights into how physical and virtual object configurations affect human perception. Our insights reveal how, for example, as virtual sizes increase, users perceive that taper angles increase, and as virtual angles decrease, users overestimate sizes. We provide a mathematical model of the illusion space, and an associated tool, which can be used as a guide for the design of future VR haptic devices and for proxy object selections.

Authors:Shiran Dudy, Thulasi Tholeti, Resmi Ramachandranpillai, Muhammad Ali, Toby Jia-Jun Li, Ricardo Baeza-Yates
Title: Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Abstract:
Recent advancements in Large Language Models (LLMs) have made them a popular information-seeking tool among end users. However, the statistical training methods for LLMs have raised concerns about their representation of under-represented topics, potentially leading to biases that could influence real-world decisions and opportunities. These biases could have significant economic, social, and cultural impacts as LLMs become more prevalent, whether through direct interactions--such as when users engage with chatbots or automated assistants--or through their integration into third-party applications (as agents), where the models influence decision-making processes and functionalities behind the scenes. Our study examines the biases present in LLMs' recommendations of U.S. cities and towns across three domains: relocation, tourism, and starting a business. We explore two key research questions: (i) how similar LLMs' responses are, and (ii) how this similarity might favor areas with certain characteristics over others, introducing biases. We focus on the consistency of LLMs' responses and their tendency to over-represent or under-represent specific locations. Our findings point to consistent demographic biases in these recommendations, which could perpetuate a "rich-get-richer" effect that widens existing economic disparities.

Authors:Danial Amin, Joni Salminen, Farhan Ahmed, Sonja M. H. Tervola, Sankalp Sethi, Bernard J. Jansen
Title: How Is Generative AI Used for Persona Development?: A Systematic Review of 52 Research Articles
Abstract:
Although Generative AI (GenAI) has the potential for persona development, many challenges must be addressed. This research systematically reviews 52 articles from 2022-2024, with important findings. First, closed commercial models are frequently used in persona development, creating a monoculture. Second, GenAI is used in various stages of persona development (data collection, segmentation, enrichment, and evaluation). Third, similar to other quantitative persona development techniques, there are major gaps in persona evaluation for AI-generated personas. Fourth, human-AI collaboration models are underdeveloped, despite human oversight being crucial for maintaining ethical standards. These findings imply that realizing the full potential of AI-generated personas will require substantial efforts across academia and industry. To that end, we provide a list of research avenues to inspire future work.

Authors:Mohammad Golam Kibria, Lauren Kucirka, Javed Mostafa
Title: Usability Testing of an Explainable AI-enhanced Tool for Clinical Decision Support: Insights from the Reflexive Thematic Analysis
Abstract:
Artificial intelligence-augmented technology represents a considerable opportunity for improving healthcare delivery. Significant progress has been made to demonstrate the value of complex models in enhancing clinicians' efficiency in decision-making. However, the clinical adoption of such models is scarce due to multifaceted implementation issues, with the explainability of AI models being among them. One of the substantially documented areas of concern is unclear AI explainability, which negatively influences clinicians' considerations for accepting the complex model. With a usability study engaging 20 U.S.-based clinicians and following the qualitative reflexive thematic analysis, this study develops and presents a concrete framework and an operational definition of explainability. The framework can inform the required customizations and feature developments in AI tools to support clinicians' preferences and enhance their acceptance.

Authors:Jie Gao, Yue Xue, Xiaofei Xie, SoeMin Thant, Erika Lee
Title: Chain of Understanding: Supporting Code Understanding with Large Language Models
Abstract:
Code auditing demands a robust understanding of codebases - an especially challenging task for end-user developers with limited expertise. To address this, we conducted formative interviews with experienced auditors and identified a Chain-of-Understanding approach, in which Large Language Models (LLMs) guide developers through hierarchical code comprehension - from high-level overviews to specific functions and variables. Building on this, we incorporated the Chain-of-Understanding concept into CodeMap, a system offering interactive visualizations, stepwise guided analysis, and context-aware chatbot support. Through within-subject user studies with 10 participants of diverse backgrounds and 5 expert and 2 novice interviews, CodeMap proved effective in reducing the manual effort of prompt engineering while enhancing engagement with visualization, outperforming both standalone LLMs and traditional static visualization tools.

Authors:Damien Rudaz, Christian Licoppe
Title: Public speech recognition transcripts as a configuring parameter
Abstract:
Displaying a written transcript of what a human said (i.e. producing an "automatic speech recognition transcript") is a common feature of smartphone vocal assistants: the utterance produced by a human speaker (e.g. a question) is displayed on the screen while it is being verbally responded to by the vocal assistant. Although rare, this feature also exists on some "social" robots which transcribe human interactants' speech on a screen or a tablet. We argue that this informational configuration is pragmatically consequential for the interaction, both for human participants and for the embodied conversational agent. Based on a corpus of co-present interactions with a humanoid robot, we attempt to show that this transcript is a contextual feature which can heavily impact the actions ascribed by humans to the robot: that is, the way in which humans respond to the robot's behavior as constituting a specific type of action (rather than another) and as constituting an adequate response to their own previous turn.

Authors:Mark McGill, Joseph O'Hagan, Thomas Goodge, Graham Wilson, Mohamed Khamis, Veronika Krauß, Jan Gugenheimer
Title: Do We Need Responsible XR? Drawing on Responsible AI to Inform Ethical Research and Practice into XRAI / the Metaverse
Abstract:
This position paper for the CHI 2025 workshop "Everyday AR through AI-in-the-Loop" reflects on whether HCI as a field needs to define Responsible XR as a parallel to, and in conjunction with, Responsible AI. Such an effort would address the unique vulnerabilities posed by the mass adoption of wearable AI-enabled AR glasses and XR devices that could enact AI-driven human perceptual augmentation.

Authors:Katja Krug, Wolfgang Büschel, Mats Ole Ellenberg
Title: Managing Information Overload in Large-Scale Distributed Mixed-Reality Meetings
Abstract:
Large-scale distributed mixed-reality meetings involve many people and their audiovisual representations. These collaborative environments can introduce challenges such as sensory overload, cognitive strain, and social fatigue. In this paper, we discuss how the unique adaptability of Mixed Reality can be leveraged to weaken these stressors by managing information overload.

Authors:Kailas Dayanandan, Brejesh Lall
Title: Improving Clinical Imaging Systems using Cognition based Approaches
Abstract:
Clinical systems operate in safety-critical environments and are not intended to function autonomously; however, they are currently designed to replicate clinicians' diagnoses rather than assist them in the diagnostic process. To enable better supervision of system-generated diagnoses, we replicate radiologists' systematic approach to analyzing chest X-rays. This approach facilitates comprehensive analysis across all regions of clinical images and can reduce errors caused by inattentional blindness and under-reading. Our work addresses a critical research gap by identifying diseases that are difficult for clinicians to diagnose, using insights from human vision, enabling these systems to serve as an effective "second pair of eyes". These improvements make clinical imaging systems more complementary, combining the strengths of human and machine vision. Additionally, we leverage effective receptive fields in deep learning models to present machine-generated diagnoses with sufficient context, making it easier for clinicians to evaluate them.

Authors:Zelun Tony Zhang, Leon Reicherts
Title: Augmenting Human Cognition With Generative AI: Lessons From AI-Assisted Decision-Making
Abstract:
How can we use generative AI to design tools that augment rather than replace human cognition? In this position paper, we review our own research on AI-assisted decision-making for lessons to learn. We observe that in both AI-assisted decision-making and generative AI, a popular approach is to suggest AI-generated end-to-end solutions to users, which users can then accept, reject, or edit. Alternatively, AI tools could offer more incremental support to help users solve tasks themselves, which we call process-oriented support. We describe findings on the challenges of end-to-end solutions, and how process-oriented support can address them. We also discuss the applicability of these findings to generative AI based on a recent study in which we compared both approaches to assist users in a complex decision-making task with LLMs.

Authors:Rahul R. Divekar, Sophia Guerra, Lisette Gonzalez, Natasha Boos, Helen Zhou
Title: Exploring undercurrents of learning tensions in an LLM-enhanced landscape: A student-centered qualitative perspective on LLM vs Search
Abstract:
Large language models (LLMs) are transforming how students learn by providing readily available tools that can quickly augment or complete various learning activities with non-trivial performance. Similar paradigm shifts have occurred in the past with the introduction of search engines and Wikipedia, which replaced or supplemented traditional information sources such as libraries and books. This study investigates the potential for LLMs to represent the next shift in learning, focusing on their role in information discovery and synthesis compared to existing technologies, such as search engines. Using a within-subjects, counterbalanced design, participants learned new topics using a search engine (Google) and an LLM (ChatGPT). Post-task follow-up interviews explored students' reflections, preferences, pain points, and overall perceptions. We present analysis of their responses that show nuanced insights into when, why, and how students prefer LLMs over search engines, offering implications for educators, policymakers, and technology developers navigating the evolving educational landscape.

Authors:Wasura D. Wattearachchi, Erandi Lakshika, Kathryn Kasmarik, Michael Barlow
Title: Designing Effective Human-Swarm Interaction Interfaces: Insights from a User Study on Task Performance
Abstract:
In this paper, we present a systematic method of design for human-swarm interaction interfaces, combining theoretical insights with empirical evaluation. We first derived ten design principles from the existing literature, applied them to key information dimensions identified through goal-directed task analysis, and developed a tablet-based interface for a target search task. We then conducted a user study with 31 participants in which humans were required to guide a robotic swarm to a target in the presence of three types of hazards that pose a risk to the robots: Distributed, Moving, and Spreading. Performance was measured based on the proximity of the robots to the target and the number of deactivated robots at the end of the task. Results indicate that at least one robot was brought closer to the target in 98% of tasks, demonstrating the interface's success in fulfilling the primary objective of the task. In nearly 67% of tasks, more than 50% of the robots reached the target, with notably better performance in moving-hazard scenarios. The interface also appeared to help minimise robot deactivation: in nearly 94% of tasks, participants managed to keep more than 50% of the robots active, ensuring that most of the swarm remained operational. However, its effectiveness varied across hazards, with robot deactivation being lowest in distributed hazard scenarios, suggesting that the interface provided the most support in these conditions.

Authors:Mingyuan Zhong, Ruolin Chen, Xia Chen, James Fogarty, Jacob O. Wobbrock
Title: ScreenAudit: Detecting Screen Reader Accessibility Errors in Mobile Apps Using Large Language Models
Abstract:
Many mobile apps are inaccessible, thereby excluding people from their potential benefits. Existing rule-based accessibility checkers aim to mitigate these failures by identifying errors early during development but are constrained in the types of errors they can detect. We present ScreenAudit, an LLM-powered system designed to traverse mobile app screens, extract metadata and transcripts, and identify screen reader accessibility errors overlooked by existing checkers. We recruited six accessibility experts including one screen reader user to evaluate ScreenAudit's reports across 14 unique app screens. Our findings indicate that ScreenAudit achieves an average coverage of 69.2%, compared to only 31.3% with a widely-used accessibility checker. Expert feedback indicated that ScreenAudit delivered higher-quality feedback and addressed more aspects of screen reader accessibility compared to existing checkers, and that ScreenAudit would benefit app developers in real-world settings.

Authors:Jiqun Liu, Jamshed Karimnazarov, Ryen W. White
Title: Trapped by Expectations: Functional Fixedness in LLM-Enabled Chat Search
Abstract:
Functional fixedness, a cognitive bias that restricts users' interactions with a new system or tool to expected or familiar ways, limits the full potential of Large Language Model (LLM)-enabled chat search, especially in complex and exploratory tasks. To investigate its impact, we conducted a crowdsourcing study with 450 participants, each completing one of six decision-making tasks spanning public safety, diet and health management, sustainability, and AI ethics. Participants engaged in a multi-prompt conversation with ChatGPT to address the task, allowing us to compare pre-chat intent-based expectations with observed interactions. We found that: 1) Several aspects of pre-chat expectations are closely associated with users' prior experiences with ChatGPT, search engines, and virtual assistants; 2) Prior system experience shapes language use and prompting behavior. Frequent ChatGPT users reduced deictic terms and hedge words and frequently adjusted prompts. Users with rich search experience maintained structured, less-conversational queries with minimal modifications. Users of virtual assistants favored directive, command-like prompts, reinforcing functional fixedness; 3) When the system failed to meet expectations, participants generated more detailed prompts with increased linguistic diversity, reflecting adaptive shifts. These findings suggest that while preconceived expectations constrain early interactions, unmet expectations can motivate behavioral adaptation. With appropriate system support, this may promote broader exploration of LLM capabilities. This work also introduces a typology for user intents in chat search and highlights the importance of mitigating functional fixedness to support more creative and analytical use of LLMs.

Authors:Shuang Qiu, Zhongcai Pei, Chen Wang, Jing Zhang, Zhiyong Tang
Title: A novel gesture interaction control method for rehabilitation lower extremity exoskeleton
Abstract:
With the rapid development of Rehabilitation Lower Extremity Robotic Exoskeletons (RLEEX) technology, significant advancements have been made in Human-Robot Interaction (HRI) methods. These include traditional physical HRI methods that are easily recognizable and various bio-electrical signal-based HRI methods that can visualize and predict actions. However, most of these HRI methods are contact-based, facing challenges such as operational complexity, sensitivity to interference, risks associated with implantable devices, and, most importantly, limitations in comfort. These challenges render the interaction less intuitive and natural, which can negatively impact patient motivation for rehabilitation. To address these issues, this paper proposes a novel non-contact gesture interaction control method for RLEEX, based on RGB monocular camera depth estimation. This method integrates three key steps: detecting keypoints, recognizing gestures, and assessing distance, thereby applying gesture information and augmented reality triggering technology to control gait movements of RLEEX. Results indicate that this approach provides a feasible solution to the problems of poor comfort, low reliability, and high latency in HRI for RLEEX platforms. Specifically, it achieves a gesture-controlled exoskeleton motion accuracy of 94.11% and an average system response time of 0.615 seconds through non-contact HRI. The proposed non-contact HRI method represents a pioneering advancement in control interactions for RLEEX, paving the way for further exploration and development in this field.

Authors:Jennifer Sharp, Joshua Kelson, Daryl South, Anthony Saliba, Muhammad Ashad Kabir
Title: Virtual Reality and Artificial Intelligence as Psychological Countermeasures in Space and Other Isolated and Confined Environments: A Scoping Review
Abstract:
Spaceflight is an isolated and confined environment (ICE) that exposes astronauts to psychological hazards, such as stress, danger, and monotony. Virtual reality (VR) and artificial intelligence (AI) technologies can serve as psychological countermeasures as they can digitally simulate immersive environments, interactive companions, and therapeutic experiences. Our study employs a scoping literature review approach to identify what is currently known about the use and effectiveness of VR and AI-based interventions as psychological countermeasures to improve mood or emotional states in adults in space or other ICEs. Additionally, this review aimed to identify gaps in the knowledge base and whether a systematic review with meta-analysis was warranted. The review included studies where the intervention was used or intended for use in space or other ICEs. Our search strategy yielded 19 studies from 3390 records across seven major databases. All studies focused on VR-based interventions, with no eligible AI-based intervention studies found. VR interventions were found to be effective for relaxation and improving mood, emergency training, serving as an interactive communication platform, comparing interior designs, and enhancing exercise. There were improvements in measures of mood and emotion (e.g., anxiety and stress); however, user preferences varied, and some instances of cybersickness were reported. A systematic review with meta-analysis is not recommended due to the heterogeneity of results. There is significant scope for further research into the use of VR for a wider range of mood and emotion variables using standardised assessment instruments. Additionally, the potential application of AI as a psychological countermeasure warrants further investigation.

Authors:Maiko Minatoya, Tatsuya Daikoku, Yasuo Kuniyoshi
Title: Emotional Responses to Auditory Hierarchical Structures is Shaped by Bodily Sensations and Listeners' Sensory Traits
Abstract:
Emotional responses to auditory stimuli are a common part of everyday life. However, for some individuals, these responses can be distressing enough to interfere with daily functioning. Despite their prevalence, the mechanisms underlying auditory-induced emotion remain only partially understood. Prior research has identified contributing factors such as auditory features, listener traits, and bodily sensations. However, most studies have focused on acoustic features, leaving the role of syntactic structure largely unexplored. This study specifically investigates how hierarchical syntactic structures influence emotional experience, in conjunction with listener traits and bodily sensations. An online experiment was conducted with 715 participants, who listened to 26 sound sequences varying systematically in hierarchical syntactic complexity. Sequences were generated by combining three types of local pitch movement with three types of global pitch movement in ascending and descending pitch directions, resulting in nine complexity levels. Participants rated the valence and arousal of each sequence and indicated any bodily sensations on a body map. Measures of sensory processing patterns were also collected. Results showed that emotional valence was associated with the complex interplay of moderate syntactic complexity ("not too simple, not too complex"), sensory sensitivity, and upper torso sensations. These findings expand existing research by identifying syntactic features that shape auditory-induced emotional experience and highlight the link between bodily sensation and emotional response. They also suggest potential applications for incorporating syntactic design into therapeutic approaches to emotion regulation.

Authors:Sihang Zhao, Shoucong Carol Xiong, Bo Pang, Xiaoying Tang, Pinjia He
Title: Let AI Read First: Enhancing Reading Abilities for Individuals with Dyslexia through Artificial Intelligence
Abstract:
Dyslexia, a neurological condition affecting approximately 12% of the global population, presents significant challenges to reading ability and quality of life. Existing assistive technologies are limited by factors such as unsuitability for quiet environments, high costs, and the risk of distorting meaning or failing to provide real-time support. To address these issues, we introduce LARF (Let AI Read First), the first strategy that employs large language models to annotate text and enhance readability while preserving the original content. We evaluated LARF in a large-scale between-subjects experiment, involving 150 participants with dyslexia. The results show that LARF significantly improves reading performance and experience for individuals with dyslexia. Results also prove that LARF is particularly helpful for participants with more severe reading difficulties. Furthermore, this work discusses potential research directions opened up by LARF for the HCI community.

Authors:ByungMin Kim, DongHeun Han, HyeongYeop Kang
Title: Shaping the Future of VR Hand Interactions: Lessons Learned from Modern Methods
Abstract:
In virtual reality, it is widely assumed that increased realism in hand-object interactions enhances user immersion and overall experience. However, recent studies challenge this assumption, suggesting that faithfully replicating real-world physics and visuals is not always necessary for improved usability or immersion. This has led to ambiguity for developers when choosing optimal hand interaction methods for different applications. Currently, there is a lack of comprehensive research to resolve this issue. This study aims to fill this gap by evaluating three contemporary VR hand interaction methods (Attachment, Penetration, and Torque) across two distinct task scenarios: simple manipulation tasks and more complex, precision-driven tasks. By examining key technical features, we identify the strengths and limitations of each method and propose development guidelines for future advancements. Our findings reveal that while Attachment, with its simplified control mechanisms, is well-suited for commercial applications, Penetration and Torque show promise for next-generation interactions. The insights gained from our study provide practical guidance for developers and researchers seeking to balance realism, usability, and user satisfaction in VR environments.

Authors:Ningjing Tang, Megan Li, Amy Winecoff, Michael Madaio, Hoda Heidari, Hong Shen
Title: Navigating Uncertainties: Understanding How GenAI Developers Document Their Models on Open-Source Platforms
Abstract:
Model documentation plays a crucial role in promoting transparency and responsible development of AI systems. With the rise of Generative AI (GenAI), open-source platforms have increasingly become hubs for hosting and distributing these models, prompting platforms like Hugging Face to develop dedicated model documentation guidelines that align with responsible AI principles. Despite these growing efforts, there remains a lack of understanding of how developers document their GenAI models on open-source platforms. Through interviews with 13 GenAI developers active on open-source platforms, we provide empirical insights into their documentation practices and challenges. Our analysis reveals that despite existing resources, developers of GenAI models still face multiple layers of uncertainties in their model documentation: (1) uncertainties about what specific content should be included; (2) uncertainties about how to effectively report key components of their models; and (3) uncertainties in deciding who should take responsibilities for various aspects of model documentation. Based on our findings, we discuss the implications for policymakers, open-source platforms, and the research community to support meaningful, effective and actionable model documentation in the GenAI era, including cultivating better community norms, building robust evaluation infrastructures, and clarifying roles and responsibilities.

Authors:Mahdis Tajdari, Jason Forsyth, Sol Lim
Title: Navigating with Haptic Gloves: Investigating Strategies for Horizontal and Vertical Movement Guidance
Abstract:
Navigating peripersonal space requires reaching targets in both horizontal (e.g., desks) and vertical (e.g., shelves) layouts with high precision. We developed a haptic glove to aid peri-personal target navigation and investigated the effectiveness of different feedback delivery methods. Twenty-two participants completed target navigation tasks under various conditions, including scene layout (horizontal or vertical), guidance approach (two-tactor or worst-axis first), guidance metaphor (push or pull), and intensity mode (linear or zone) for conveying distance cues. Task completion time, hand trajectory distance, and the percentage of hand trajectory in a critical area were measured as performance outcomes, along with subjective feedback. Participants achieved significantly faster task completion times and covered less hand trajectory distance in the horizontal layout, worst-axis first approach, and pull metaphor conditions. Additionally, male participants demonstrated superior performance and reported lower levels of frustration compared to their female counterparts throughout the study. Intensity mode had no significant effect on the results. In summary, vibrating one tactor at a time (worst-axis first) and using the pull metaphor were the most effective methods of delivering vibrotactile feedback for peripersonal target navigation in both horizontal and vertical settings. Findings from this work can guide future development of haptic gloves for individuals with vision impairments, environments with visual limitations, and for accessibility and rehabilitation applications.

Authors:Jun Hu, Mengru Xue, Cheng Yao, Yuan Feng, Jiabao Li, Preben Hansen
Title: Workshop on Aesthetics of Connectivity for Empowerment at ACM Designing Interactive Systems 2024
Abstract:
Connectivity enabled by technologies such as the Internet of Things, Artificial Intelligence, Big Data, and Cloud Computing is rapidly transforming our interactions with the world and with each other. It reshapes social interactions, fostering collaboration, creativity, and unprecedented access to information and resources. However, this connected world and era demand innovative design approaches that harmonize technical functionality with human-centered values. We have run a series of workshops at different conferences, engaging participants in discussions of the related challenges and opportunities, from digital art [1] and aesthetics [2] to AI-driven creativity [3], and of their functional aspects in healthcare [1] and empowerment [2, 3]. We want to focus further on the intersection of these challenges, where we see opportunities: leveraging aesthetics and connectivity as catalysts for empowerment.

Authors:Heiko Renz, Maximilian Krämer, Frank Hoffmann, Torsten Bertram
Title: Next-Best-Trajectory Planning of Robot Manipulators for Effective Observation and Exploration
Abstract:
Visual observation of objects is essential for many robotic applications, such as object reconstruction and manipulation, navigation, and scene understanding. Machine learning algorithms constitute the state-of-the-art in many fields but require vast data sets, which are costly and time-intensive to collect. Automated strategies for observation and exploration are crucial to enhance the efficiency of data gathering. Therefore, a novel strategy utilizing the Next-Best-Trajectory principle is developed for a robot manipulator operating in dynamic environments. Local trajectories are generated to maximize the information gained from observations along the path while avoiding collisions. We employ a voxel map for environment modeling and utilize raycasting from perspectives around a point of interest to estimate the information gain. A global ergodic trajectory planner provides an optional reference trajectory to the local planner, improving exploration and helping to avoid local minima. To enhance computational efficiency, raycasting for estimating the information gain in the environment is executed in parallel on the graphics processing unit. Benchmark results confirm the efficiency of the parallelization, while real-world experiments demonstrate the strategy's effectiveness.
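The information-gain estimate above (raycasting through a voxel map around a point of interest) can be sketched in a toy form. This is an illustrative serial 2-D version, not the paper's GPU-parallel 3-D implementation; the cell labels and the simple stepping scheme are assumptions made for the sketch:

```python
UNKNOWN, FREE, OCCUPIED = 0, 1, 2

def ray_information_gain(grid, origin, direction, max_steps=50):
    """Count unknown cells a ray would newly observe.

    Steps through a 2-D occupancy grid (a toy stand-in for the 3-D
    voxel map) and counts UNKNOWN cells until the ray leaves the
    grid or hits an OCCUPIED cell, which blocks further view.
    """
    x, y = origin
    dx, dy = direction
    gain = 0
    for _ in range(max_steps):
        x, y = x + dx, y + dy
        ix, iy = int(round(x)), int(round(y))
        if not (0 <= ix < len(grid) and 0 <= iy < len(grid[0])):
            break  # ray left the mapped area
        if grid[ix][iy] == OCCUPIED:
            break  # obstacle blocks everything behind it
        if grid[ix][iy] == UNKNOWN:
            gain += 1
    return gain

# 5x5 map: everything unknown except one free cell and one wall cell.
grid = [[UNKNOWN] * 5 for _ in range(5)]
grid[0][0] = FREE
grid[3][0] = OCCUPIED
print(ray_information_gain(grid, (0, 0), (1, 0)))  # two unknown cells before the wall -> 2
```

Summing such gains over a bundle of ray directions from a candidate viewpoint yields a simple score for ranking local trajectories, which is the role raycasting plays in the strategy described above.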

Authors:Thomas Krämer, Francesco Chiossi, Thomas Kosch
Title: Evaluating Eye Tracking and Electroencephalography as Indicator for Selective Exposure During Online News Reading
Abstract:
Selective exposure in online news consumption reinforces filter bubbles, restricting access to diverse viewpoints. Interactive systems can counteract this bias by suggesting alternative perspectives, but they require real-time indicators to identify selective exposure. This workshop paper proposes the integration of physiological sensing, including Electroencephalography (EEG) and eye tracking, to measure selective exposure. We propose methods for examining news agreement and its relationship to theta band power in the parietal region, indicating a potential link between cortical activity and selective exposure. Our vision is interactive systems that detect selective exposure and provide alternative views in real time. We suggest that future news interfaces incorporate physiological signals to promote more balanced information consumption. This work joins the discussion on AI-enhanced methodology for bias detection.

Authors:Rifat Mehreen Amin, Oliver Hans Kühle, Daniel Buschek, Andreas Butz
Title: Composable Prompting Workspaces for Creative Writing: Exploration and Iteration Using Dynamic Widgets
Abstract:
Generative AI models offer many possibilities for text creation and transformation. Current graphical user interfaces (GUIs) for prompting them lack support for iterative exploration, as they do not represent prompts as actionable interface objects. We propose the concept of a composable prompting canvas for text exploration and iteration using dynamic widgets. Users generate widgets through system suggestions, prompting, or manually to capture task-relevant facets that affect the generated text. In a comparative study with a baseline (conversational UI), 18 participants worked on two writing tasks, creating diverse prompting environments with custom widgets and spatial layouts. They reported having more control over the generated text and preferred our system over the baseline. Our design significantly outperformed the baseline on the Creativity Support Index, and participants felt the results were worth the effort. This work highlights the need for GUIs that support user-driven customization and (re-)structuring to increase both the flexibility and efficiency of prompting.

Authors:Sora Kang, Kaiwen Yu, Xinyi Zhou, Joonhwan Lee
Title: Designing a User Interface for Generative Design in Augmented Reality: A Step Towards More Visualization and Feed-Forwarding
Abstract:
Generative design, an AI-assisted technology for optimizing design through algorithmic processes, is propelling advancements across numerous fields. As the use of immersive environments such as Augmented Reality (AR) continues to rise, integrating generative design into such platforms presents a potent opportunity for innovation. However, a vital challenge that impedes this integration is the current absence of an efficient and user-friendly interface for designers to operate within these environments effectively. To bridge this gap, we introduce a novel UI system for generative design software in AR, which automates the process of generating the potential design constraints based on the users' inputs. This system allows users to construct a virtual environment, edit objects and constraints, and export the final data in CSV format. The interface enhances the user's design experience by enabling more intuitive interactions and providing immediate visual feedback. Deriving from participatory design principles, this research proposes a significant leap forward in the realms of generative design and immersive environments.

Authors:Saelyne Yang, Anh Truong, Juho Kim, Dingzeyu Li
Title: VideoMix: Aggregating How-To Videos for Task-Oriented Learning
Abstract:
Tutorial videos are a valuable resource for people looking to learn new tasks. People often learn these skills by viewing multiple tutorial videos to get an overall understanding of a task by looking at different approaches to achieve the task. However, navigating through multiple videos can be time-consuming and mentally demanding as these videos are scattered and not easy to skim. We propose VideoMix, a system that helps users gain a holistic understanding of a how-to task by aggregating information from multiple videos on the task. Insights from our formative study (N=12) reveal that learners value understanding potential outcomes, required materials, alternative methods, and important details shared by different videos. Powered by a Vision-Language Model pipeline, VideoMix extracts and organizes this information, presenting concise textual summaries alongside relevant video clips, enabling users to quickly digest and navigate the content. A comparative user study (N=12) demonstrated that VideoMix enabled participants to gain a more comprehensive understanding of tasks with greater efficiency than a baseline video interface, where videos are viewed independently. Our findings highlight the potential of a task-oriented, multi-video approach where videos are organized around a shared goal, offering an enhanced alternative to conventional video-based learning.

Authors:Zhuojiang Cai, Jingkai Hong, Zhimin Wang, Feng Lu
Title: GazeSwipe: Enhancing Mobile Touchscreen Reachability through Seamless Gaze and Finger-Swipe Integration
Abstract:
Smartphones with large screens provide users with increased display and interaction space but pose challenges in reaching certain areas with the thumb when using the device with one hand. To address this, we introduce GazeSwipe, a multimodal interaction technique that combines eye gaze with finger-swipe gestures, enabling intuitive and low-friction reach on mobile touchscreens. Specifically, we design a gaze estimation method that eliminates the need for explicit gaze calibration. Our approach also avoids the use of additional eye-tracking hardware by leveraging the smartphone's built-in front-facing camera. Considering the potential decrease in gaze accuracy without dedicated eye trackers, we use finger-swipe gestures to compensate for any inaccuracies in gaze estimation. Additionally, we introduce a user-unaware auto-calibration method that improves gaze accuracy during interaction. Through extensive experiments on smartphones and tablets, we compare our technique with various methods for touchscreen reachability and evaluate the performance of our auto-calibration strategy. The results demonstrate that our method achieves high success rates and is preferred by users. The findings also validate the effectiveness of the auto-calibration strategy.

Authors:Jun Yuan, Kevin Miao, Heyin Oh, Isaac Walker, Zhouyang Xue, Tigran Katolikyan, Marco Cavallo
Title: VibE: A Visual Analytics Workflow for Semantic Error Analysis of CVML Models at Subgroup Level
Abstract:
Effective error analysis is critical for the successful development and deployment of CVML models. One approach to understanding model errors is to summarize the common characteristics of error samples. This can be particularly challenging in tasks that utilize unstructured, complex data such as images, where patterns are not always obvious. Another method is to analyze error distributions across pre-defined categories, which requires analysts to hypothesize about potential error causes in advance. Forming such hypotheses without access to explicit labels or annotations makes it difficult to isolate meaningful subgroups or patterns, however, as analysts must rely on manual inspection, prior expertise, or intuition. This lack of structured guidance can hinder a comprehensive understanding of where models fail. To address these challenges, we introduce VibE, a semantic error analysis workflow designed to identify where and why computer vision and machine learning (CVML) models fail at the subgroup level, even when labels or annotations are unavailable. VibE incorporates several core features to enhance error analysis: semantic subgroup generation, semantic summarization, candidate issue proposals, semantic concept search, and interactive subgroup analysis. By leveraging large foundation models (such as CLIP and GPT-4) alongside visual analytics, VibE enables developers to semantically interpret and analyze CVML model errors. This interactive workflow helps identify errors through subgroup discovery, supports hypothesis generation with auto-generated subgroup summaries and suggested issues, and allows hypothesis validation through semantic concept search and comparative analysis. Through three diverse CVML tasks and in-depth expert interviews, we demonstrate how VibE can assist error understanding and analysis.

Authors:Tobias Kauer, Marian Dörk, Benjamin Bach
Title: Towards Collective Storytelling: Investigating Audience Annotations in Data Visualizations
Abstract:
This work investigates personal perspectives in visualization annotations as devices for collective data-driven storytelling. Inspired by existing efforts in critical cartography, we show how people share personal memories in a visualization of COVID-19 data and how comments by other visualization readers influence the reading and understanding of visualizations. Analyzing interaction logs, reader surveys, visualization annotations, and interviews, we find that reader annotations help other viewers relate to other people's stories and reflect on their own experiences. Further, we found that annotations embedded directly into the visualization can serve as social traces guiding through a visualization and help readers contextualize their own stories. With that, they supersede the attention paid to data encodings and become the main focal point of the visualization.

Authors:Jichen Zhu, Pedro Sanches, Vasiliki Tsaknaki, Willem van der Maden, Irene Kaklopoulou
Title: The Centers and Margins of Modeling Humans in Well-being Technologies: A Decentering Approach
Abstract:
This paper critically examines the machine learning (ML) modeling of humans in three case studies of well-being technologies. Through a critical technical approach, it examines how these apps were experienced in daily life (technology in use) to surface breakdowns and to identify the assumptions about the "human" body entrenched in the ML models (technology design). To address these issues, this paper applies agential realism to decenter foundational assumptions, such as body regularity and health/illness binaries, and speculates more inclusive design and ML modeling paths that acknowledge irregularity, human-system entanglements, and uncertain transitions. This work is among the first to explore the implications of decentering theories in computational modeling of human bodies and well-being, offering insights for more inclusive technologies and speculations toward posthuman-centered ML modeling.

Authors:Fan Zhang, Molin Li, Xiaoyu Chang, Kexue Fu, Richard William Allen, RAY LC
Title: "Becoming My Own Audience": How Dancers React to Avatars Unlike Themselves in Motion Capture-Supported Live Improvisational Performance
Abstract:
The use of motion capture in live dance performances has created an emerging discipline enabling dancers to play different avatars on the digital stage. Unlike classical workflows, avatars enable performers to act as different characters in customized narratives, but research has yet to address how movement, improvisation, and perception change when dancers act as avatars. We created five avatars representing differing genders, shapes, and body limitations, and invited 15 dancers to improvise with each in practice and performance settings. Results show that dancers used avatars to distance themselves from their own habitual movements, exploring new ways of moving through differing physical constraints. Dancers explored using gender-stereotyped movements like powerful or feminine actions, experimenting with gender identity. However, focusing on avatars can coincide with a lack of continuity in improvisation. This work shows how emerging practices with performance technology enable dancers to improvise with new constraints, stepping outside the classical stage.

Authors:Andrew McNutt, Maggie K McCracken, Ishrat Jahan Eliza, Daniel Hajas, Jake Wagoner, Nate Lanza, Jack Wilburn, Sarah Creem-Regehr, Alexander Lex
Title: Accessible Text Descriptions for UpSet Plots
Abstract:
Data visualizations are typically not accessible to blind and low-vision (BLV) users. Automatically generating text descriptions offers an enticing mechanism for democratizing access to the information held in complex scientific charts, yet appropriate procedures for generating those texts remain elusive. Pursuing this issue, we study a single complex chart form: UpSet plots. UpSet plots are a common way to analyze set data, an area largely unexplored by prior accessibility literature. By analyzing the patterns present in real-world examples, we develop a system for automatically captioning any UpSet plot. We evaluated the utility of our captions via semi-structured interviews with BLV users (N=11) and found that they consider the captions informative. In extension studies, we find that sighted users can use our texts much as they use UpSet plots, and that our texts are better than those produced by naive LLM usage.

Authors:Silvia Cazacu, Stien Poncelet, Emma Feijtraij, Andrew Vande Moere
Title: The EnviroMapper Toolkit: an Input Physicalisation that Captures the Situated Experience of Environmental Comfort in Offices
Abstract:
The environmental comfort in offices is traditionally captured by surveying an entire workforce simultaneously, which yet fails to capture the situatedness of the different personal experiences. To address this limitation, we developed the EnviroMapper Toolkit, a data physicalisation toolkit that allows individual office workers to record their personal experiences of environmental comfort by mapping the actual moments and locations these occurred. By analysing two in-the-wild studies in existing open-plan office environments (N=14), we demonstrate how this toolkit acts like a situated input visualisation that can be interpreted by domain experts who were not present during its construction. This study therefore offers four key contributions: (1) the iterative design process of the physicalisation toolkit; (2) its preliminary deployment in two real-world office contexts; (3) the decoding of the resulting artefacts by domain experts; and (4) design considerations to support future input physicalisation and visualisation constructions that capture and synthesise data from multiple individuals.

Authors:J. M. Diederik Kruijssen, Nicholas Emmons
Title: Deterministic AI Agent Personality Expression through Standard Psychological Diagnostics
Abstract:
Artificial intelligence (AI) systems powered by large language models have become increasingly prevalent in modern society, enabling a wide range of applications through natural language interaction. As AI agents proliferate in our daily lives, their generic and uniform expressiveness presents a significant limitation to their appeal and adoption. Personality expression represents a key prerequisite for creating more human-like and distinctive AI systems. We show that AI models can express deterministic and consistent personalities when instructed using established psychological frameworks, with varying degrees of accuracy depending on model capabilities. We find that more advanced models like GPT-4o and o1 demonstrate the highest accuracy in expressing specified personalities across both Big Five and Myers-Briggs assessments, and further analysis suggests that personality expression emerges from a combination of intelligence and reasoning capabilities. Our results reveal that personality expression operates through holistic reasoning rather than question-by-question optimization, with response-scale metrics showing higher variance than test-scale metrics. Furthermore, we find that model fine-tuning affects communication style independently of personality expression accuracy. These findings establish a foundation for creating AI agents with diverse and consistent personalities, which could significantly enhance human-AI interaction across applications from education to healthcare, while additionally enabling a broader range of more unique AI agents. The ability to quantitatively assess and implement personality expression in AI systems opens new avenues for research into more relatable, trustworthy, and ethically designed AI.

Authors:Simon Suh, Jihyuk Bang, Ji Woo Han
Title: Developing Critical Thinking in Second Language Learners: Exploring Generative AI like ChatGPT as a Tool for Argumentative Essay Writing
Abstract:
This study employs the Paul-Elder Critical Thinking Model and Tan's argumentative writing framework to create a structured methodology. This methodology, ChatGPT Guideline for Critical Argumentative Writing (CGCAW) framework, integrates the models with ChatGPT's capabilities to guide L2 learners in utilizing ChatGPT to enhance their critical thinking skills. A quantitative experiment was conducted with 10 participants from a state university, divided into experimental and control groups. The experimental group utilized the CGCAW framework, while the control group used ChatGPT without specific guidelines. Participants wrote an argumentative essay within a 40-minute timeframe, and essays were evaluated by three assessors: ChatGPT, Grammarly, and a course instructor. Results indicated that the experimental group showed improvements in clarity, logical coherence, and use of evidence, demonstrating ChatGPT's potential to enhance specific aspects of argumentative writing. However, the control group performed better in overall language mechanics and articulation of main arguments, indicating areas where the CGCAW framework could be further refined. This study highlights the need for further research to optimize the use of AI tools like ChatGPT in L2 learning environments to enhance critical thinking and writing skills.

Authors:Mohammad Golam Kibria, Lauren Kucirka, Javed Mostafa
Title: Assessing AI Explainability: A Usability Study Using a Novel Framework Involving Clinicians
Abstract:
An AI design framework was developed based on three core principles, namely understandability, trust, and usability. The framework was conceptualized by synthesizing evidence from the literature and by consulting with experts. The initial version of the AI Explainability Framework was validated based on an in-depth expert engagement and review process. For evaluation purposes, an AI-anchored prototype, incorporating novel explainability features, was built and deployed online. The primary function of the prototype was to predict the postpartum depression risk using analytics models. The development of the prototype was carried out in an iterative fashion, based on a pilot-level formative evaluation, followed by refinements and summative evaluation. The System Explainability Scale (SES) metric was developed to measure the influence of the three dimensions of the AI Explainability Framework. For the summative stage, a comprehensive usability test was conducted involving 20 clinicians, and the SES metric was used to assess clinicians' satisfaction with the tool. On a 5-point rating system, the tool received high scores for the usability dimension, followed by trust and understandability. The average explainability score was 4.56. In terms of understandability, trust, and usability, the average score was 4.51, 4.53 and 4.71 respectively. Overall, the 13-item SES metric showed strong internal consistency with Cronbach's alpha of 0.84 and a positive correlation coefficient (Spearman's rho = 0.81, p<0.001) between the composite SES score and explainability. A major finding was that the framework, combined with the SES usability metric, provides a straightforward approach for developing AI-based healthcare tools that lower the challenges associated with explainability.
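The internal-consistency figure above (Cronbach's alpha of 0.84) follows the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of respondent totals). A minimal sketch, using made-up ratings rather than the study's data:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a k-item scale.

    `items` is a list of k lists, each holding one item's scores
    across the same n respondents (population variances are used).
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var_sum = sum(var(it) for it in items)
    totals = [sum(it[j] for it in items) for j in range(n)]
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Illustrative 3-item, 4-respondent example (hypothetical data):
scores = [[4, 5, 3, 4], [4, 4, 3, 5], [5, 5, 2, 4]]
print(round(cronbach_alpha(scores), 3))  # 0.818
```

Values above roughly 0.7 to 0.8 are conventionally read as acceptable internal consistency, which is why the reported 0.84 supports the 13-item SES as a coherent scale.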

Authors:Damien Rudaz, Christian Licoppe
Title: "Playing the robot's advocate": Bystanders' descriptions of a robot's conduct in public settings
Abstract:
Relying on a large corpus of natural interactions between visitors and a robot in a museum setting, we study a recurrent practice through which humans "worked" to maintain the robot as a competent participant: the description by bystanders, in a way that was made accessible to the main speaker, of the social action that the robot was taken to be accomplishing. Doing so, bystanders maintained the robot's (sometimes incongruous) behaviour as relevant to the activity at hand and preserved the robot itself as a competent participant. Relying on these data, we argue that ex ante definitions of a robot as "social" (i.e. before any interaction occurred) run the risk of naturalizing as self-evident the observable result from micro-sociological processes: namely, the interactional work of co-present humans through which the robot's conduct is reconfigured as contextually relevant.

Authors:Onno P Kampman, Michael Xing, Charmaine Lim, Ahmad Ishqi Jabir, Ryan Louie, Jimmy Lee, Robert JT Morris
Title: Conversational Self-Play for Discovering and Understanding Psychotherapy Approaches
Abstract:
This paper explores conversational self-play with LLMs as a scalable approach for analyzing and exploring psychotherapy approaches, evaluating how well AI-generated therapeutic dialogues align with established modalities.

Authors:Ruchik Mishra, Laksita Prasanna, Adair Adair, Dan O Popa
Title: Multimodal Sensing and Machine Learning to Compare Printed and Verbal Assembly Instructions Delivered by a Social Robot
Abstract:
In this paper, we compare a manual assembly task communicated to workers using both printed and robot-delivered instructions. The comparison was made using physiological signals (blood volume pulse (BVP) and electrodermal activity (EDA)) collected from individuals during an experimental study. In addition, we collected responses of individuals using the NASA Task Load Index (TLX) survey. Furthermore, we mapped the collected physiological signals to the participants' NASA TLX responses to predict their workload. For both classification problems, we compare the performance of Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models. Results show that our CNN-based approach using multimodal data (both BVP and EDA) gave better results than using just BVP (approx. 8.38% more) or just EDA (approx. 20.49% more). Our LSTM-based model also performed better with multimodal data (approx. 8.38% more than just BVP and 6.70% more than just EDA). Overall, CNNs outperformed LSTMs by 7.72% at classifying physiological signals for printed vs. robot-delivered instruction. The CNN-based model was also able to give better classification results (approximately 17.83% more on average across all NASA TLX responses) within a few minutes of training, compared to the LSTM-based models.
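At the core of the CNN models compared above are 1-D convolutions over the BVP/EDA time series. As a minimal illustration (not the authors' architecture; the signal and kernel below are made up), a valid-mode 1-D convolution in plain Python looks like:

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation): slide the
    kernel over the signal and take a weighted sum at each offset.
    This is the basic feature-extraction step a 1-D CNN layer
    applies to a physiological time series."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Smoothing a toy BVP trace with a 3-tap averaging kernel (illustrative):
bvp = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
print(conv1d(bvp, [1 / 3, 1 / 3, 1 / 3]))
```

A trained CNN learns many such kernels per layer instead of fixing one by hand, then classifies workload from the resulting feature maps.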

Authors:Enzo Sinacola, Arnault Pachot, Thierry Petit
Title: LLMs, Virtual Users, and Bias: Predicting Any Survey Question Without Human Data
Abstract:
Large Language Models (LLMs) offer a promising alternative to traditional survey methods, potentially enhancing efficiency and reducing costs. In this study, we use LLMs to create virtual populations that answer survey questions, enabling us to predict outcomes comparable to human responses. We evaluate several LLMs, including GPT-4o, GPT-3.5, Claude 3.5 Sonnet, and versions of the Llama and Mistral models, comparing their performance to that of a traditional Random Forests algorithm using demographic data from the World Values Survey (WVS). LLMs demonstrate competitive performance overall, with the significant advantage of requiring no additional training data. However, they exhibit biases when predicting responses for certain religious and population groups, underperforming in these areas. On the other hand, Random Forests demonstrate stronger performance than LLMs when trained with sufficient data. We observe that removing censorship mechanisms from LLMs significantly improves predictive accuracy, particularly for underrepresented demographic segments where censored models struggle. These findings highlight the importance of addressing biases and reconsidering censorship approaches in LLMs to enhance their reliability and fairness in public opinion research.

Authors:Sareh Ahmadi, Michelle Rockwell, Megan Stuart, Nicki Rohani, Allison Tegge, Xuan Wang, Jeffrey Stein, Edward A. Fox
Title: AI-Facilitated Episodic Future Thinking For Adults with Obesity
Abstract:
Episodic Future Thinking (EFT) involves vividly imagining personal future events and experiences in detail. It has shown promise as an intervention to reduce delay discounting (the tendency to devalue delayed rewards in favor of immediate gratification) and to promote behavior change in a range of maladaptive health behaviors. We present EFTeacher, an AI chatbot powered by the GPT-4-Turbo large language model, designed to generate EFT cues for users with lifestyle-related conditions. To evaluate the feasibility and usability of EFTeacher, we conducted a mixed-methods study that included usability assessments, user evaluations based on content characteristics questionnaires, and semi-structured interviews. Qualitative findings indicate that participants perceived EFTeacher as communicative and supportive through an engaging dialogue. The chatbot facilitated imaginative thinking and reflection on future goals. Participants appreciated its adaptability and personalization features, though some noted challenges such as repetitive dialogue and verbose responses. Our findings underscore the potential of large language model-based chatbots in EFT interventions targeting maladaptive health behaviors.
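Delay discounting, the quantity EFT interventions aim to reduce, is commonly modeled with Mazur's hyperbolic form V = A / (1 + kD); the abstract does not state which model this paper uses, so the sketch below is illustrative rather than a description of the authors' method:

```python
def discounted_value(amount, delay_days, k):
    """Hyperbolic discounting (Mazur, 1987): V = A / (1 + k * D).

    A higher discount rate k means steeper devaluation of delayed
    rewards, i.e. a stronger pull toward immediate gratification,
    which is the tendency EFT cues aim to weaken.
    """
    return amount / (1 + k * delay_days)

# A steep discounter (k=0.05) values $100 delivered in 30 days at:
print(round(discounted_value(100, 30, 0.05), 2))  # 40.0
```

Fitting k to a participant's choices before and after an intervention is a standard way to quantify whether delay discounting decreased.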

Authors:Ramira van der Meulen, Rineke Verbrugge, Max van Duijn
Title: Towards properly implementing Theory of Mind in AI systems: An account of four misconceptions
Abstract:
The search for effective collaboration between humans and computer systems is one of the biggest challenges in Artificial Intelligence. One of the more effective mechanisms that humans use to coordinate with one another is theory of mind (ToM). ToM can be described as the ability to "take someone else's perspective and make estimations of their beliefs, desires and intentions, in order to make sense of their behaviour and attitudes towards the world". If leveraged properly, this skill can be very useful in Human-AI collaboration. This raises the question of how to implement ToM when building an AI system. Humans and AI systems work quite differently, and ToM is a multifaceted concept, each facet rooted in different research traditions across the cognitive and developmental sciences. We observe that researchers from artificial intelligence and the computing sciences, ourselves included, often have difficulties finding their way in the ToM literature. In this paper, we identify four common misconceptions around ToM that we believe should be taken into account when developing an AI system. We have hyperbolised these misconceptions for the sake of the argument, but add nuance in their discussion. The misconceptions we discuss are: (1) "Humans Use a ToM Module, So AI Systems Should As Well". (2) "Every Social Interaction Requires (Advanced) ToM". (3) "All ToM is the Same". (4) "Current Systems Already Have ToM". After discussing each misconception, we end its section by providing tentative guidelines on how it can be overcome.

Authors:Petr Parshakov, Iuliia Naidenova, Sofia Paklina, Nikita Matkin, Cornel Nesseler
Title: Users Favor LLM-Generated Content -- Until They Know It's AI
Abstract:
In this paper, we investigate how individuals evaluate human- and large language model (LLM)-generated responses to popular questions when the source of the content is either concealed or disclosed. Through a controlled field experiment, participants were presented with a set of questions, each accompanied by a response generated by either a human or an AI. In a randomized design, half of the participants were informed of the response's origin while the other half remained unaware. Our findings indicate that, overall, participants tend to prefer AI-generated responses. However, when the AI origin is revealed, this preference diminishes significantly, suggesting that evaluative judgments are influenced by the disclosure of the response's provenance rather than solely by its quality. These results underscore a bias against AI-generated content, highlighting the societal challenge of improving the perception of AI work in contexts where quality assessments should be paramount.

Authors:Iago Alves Brito, Julia Soares Dollis, Fernanda Bufon Färber, Pedro Schindler Freire Brasil Ribeiro, Rafael Teixeira Sousa, Arlindo Rodrigues Galvão Filho
Title: Integrating Personality into Digital Humans: A Review of LLM-Driven Approaches for Virtual Reality
Abstract:
The integration of large language models (LLMs) into virtual reality (VR) environments has opened new pathways for creating more immersive and interactive digital humans. By leveraging the generative capabilities of LLMs alongside multimodal outputs such as facial expressions and gestures, virtual agents can simulate human-like personalities and emotions, fostering richer and more engaging user experiences. This paper provides a comprehensive review of methods for enabling digital humans to adopt nuanced personality traits, exploring approaches such as zero-shot, few-shot, and fine-tuning. Additionally, it highlights the challenges of integrating LLM-driven personality traits into VR, including computational demands, latency issues, and the lack of standardized evaluation frameworks for multimodal interactions. By addressing these gaps, this work lays a foundation for advancing applications in education, therapy, and gaming, while fostering interdisciplinary collaboration to redefine human-computer interaction in VR.

Authors:Hangyeol Kang, Thiago Freitas dos Santos, Maher Ben Moussa, Nadia Magnenat-Thalmann
Title: Mitigating the Uncanny Valley Effect in Hyper-Realistic Robots: A Student-Centered Study on LLM-Driven Conversations
Abstract:
The uncanny valley effect poses a significant challenge in the development and acceptance of hyper-realistic social robots. This study investigates whether advanced conversational capabilities powered by large language models (LLMs) can mitigate this effect in highly anthropomorphic robots. We conducted a user study with 80 participants interacting with Nadine, a hyper-realistic humanoid robot equipped with LLM-driven communication skills. Through pre- and post-interaction surveys, we assessed changes in perceptions of uncanniness, conversational quality, and overall user experience. Our findings reveal that LLM-enhanced interactions significantly reduce feelings of eeriness while fostering more natural and engaging conversations. Additionally, we identify key factors influencing user acceptance, including conversational naturalness, human-likeness, and interestingness. Based on these insights, we propose design recommendations to enhance the appeal and acceptability of hyper-realistic robots in social contexts. This research contributes to the growing field of human-robot interaction by offering empirical evidence on the potential of LLMs to bridge the uncanny valley, with implications for the future development of social robots.

Authors:Brett Puppart, Paul-Henry Paltmann, Jaan Aru
Title: Haunted House: A text-based game for comparing the flexibility of mental models in humans and LLMs
Abstract:
This study introduces "Haunted House," a novel text-based game designed to compare the performance of humans and large language models (LLMs) in model-based reasoning. Players must escape from a house containing nine rooms in a 3x3 grid layout while avoiding the ghost. They are guided by verbal clues that they receive each time they move. In Study 1, the results from 98 human participants revealed a success rate of 31.6%, significantly outperforming the seven state-of-the-art LLMs tested. Out of 140 attempts across the seven LLMs, only one attempt, by Claude 3 Opus, resulted in a pass. Preliminary results suggested that GPT o3-mini-high might perform better, though still not at the human level. Further analysis of 29 human participants' moves in Study 2 indicated that LLMs frequently made random and illogical moves, while humans exhibited such errors less frequently. Our findings suggest that current LLMs encounter difficulties in tasks that demand active model-based reasoning, offering inspiration for future benchmarks.

Authors:Young-Ho Bae, Casey C. Bennett
Title: Multimodal Transformer Models for Turn-taking Prediction: Effects on Conversational Dynamics of Human-Agent Interaction during Cooperative Gameplay
Abstract:
This study investigates multimodal turn-taking prediction within human-agent interactions (HAI), particularly focusing on cooperative gaming environments. It comprises model development and a subsequent user study, aiming to refine our understanding and improve conversational dynamics in spoken dialogue systems (SDSs). For the modeling phase, we introduce a novel transformer-based deep learning (DL) model that simultaneously integrates multiple modalities (text, vision, audio, and contextual in-game data) to predict turn-taking events in real-time. Our model employs a Crossmodal Transformer architecture to effectively fuse information from these diverse modalities, enabling more comprehensive turn-taking predictions. The model demonstrates superior performance compared to baseline models, achieving 87.3% accuracy and 83.0% macro F1 score. A human user study was then conducted to empirically evaluate the turn-taking DL model in an interactive scenario with a virtual avatar while playing the game "Don't Starve Together", comparing a control condition without turn-taking prediction (n=20) to an experimental condition with our model deployed (n=40). Both conditions included a mix of English and Korean speakers, since turn-taking cues are known to vary by culture. We then analyzed the interaction quality, examining aspects such as utterance counts, interruption frequency, and participant perceptions of the avatar. Results from the user study suggest that our multimodal turn-taking model not only enhances the fluidity and naturalness of human-agent conversations, but also maintains a balanced conversational dynamic without significantly altering dialogue frequency. The study provides in-depth insights into the influence of turn-taking abilities on user perceptions and interaction quality, underscoring the potential for more contextually adaptive and responsive conversational agents.

Authors:Longdi Xian, Junhao Xu
Title: Exploring the Panorama of Anxiety Levels: A Multi-Scenario Study Based on Human-Centric Anxiety Level Detection and Personalized Guidance
Abstract:
More and more people are experiencing pressure from work, life, and education. These pressures often lead to an anxious state of mind, or even the early symptoms of suicidal ideation. With the advancement of artificial intelligence (AI) technology, large language models have become one of the most prominent technologies. They are often used for detecting psychological disorders. However, current studies primarily provide categorization results without offering interpretable explanations for these results. To address this gap, this study adopts a person-centered perspective and focuses on GPT-generated multi-scenario simulated conversations. These simulated conversations were selected as data samples for the study. Various transformer-based encoder models were utilized to develop a classification model capable of identifying different levels of anxiety. Additionally, a knowledge base focusing on anxiety was constructed using LangChain and GPT-4. When analyzing classification results, this knowledge base was able to provide explanations and reasons most relevant to the interlocutor's anxiety situation. The study demonstrates that the proposed model achieves over 94% accuracy in categorical prediction, and the advice provided is highly personalized and relevant.

Authors:Ana Tanevska, Katie Winkle, Ginevra Castellano
Title: "I don't like things where I do not have control": Participants' Experience of Trustworthy Interaction with Autonomous Vehicles
Abstract:
With the rapid advancement of autonomous vehicle (AV) technology, AVs are progressively seen as interactive agents with some level of autonomy, as well as some context-dependent social features. This introduces new challenges and questions, already relevant in other areas of human-robot interaction (HRI) - namely, if an AV is perceived as a social agent by the human with whom it is interacting, how are the various facets of its design and behaviour impacting its human partner? And how can we foster a successful human-agent interaction (HAI) between the AV and the human, maximizing the human's comfort, acceptance, and trust in the AV? In this work, we attempt to understand the various factors that could influence naïve participants' acceptance and trust when interacting with an AV in the role of a driver. Through a large-scale online study, we investigate the effect of the AV's autonomy on the human driver, as well as explore which parameters of the interaction have the highest impact on the user's sense of trust in the AV. Finally, we analyze our preliminary findings from the user study within existing guidelines on Trustworthy HAI/HRI.

Authors:Loukas Triantafyllopoulos, Dimitris Kalles
Title: From Divergence to Consensus: Evaluating the Role of Large Language Models in Facilitating Agreement through Adaptive Strategies
Abstract:
Achieving consensus in group decision-making often involves overcoming significant challenges, particularly in reconciling diverse perspectives and mitigating biases that hinder agreement. Traditional methods relying on human facilitators are often constrained by scalability and efficiency, especially in large-scale, fast-paced discussions. To address these challenges, this study proposes a novel framework employing large language models (LLMs) as automated facilitators within a custom-built multi-user chat system. Leveraging cosine similarity as a core metric, this approach evaluates the ability of three state-of-the-art LLMs (ChatGPT 4.0, Mistral Large 2, and AI21 Jamba Instruct) to synthesize consensus proposals that align with participants' viewpoints. Unlike conventional techniques, the system integrates adaptive facilitation strategies, including clarifying misunderstandings, summarizing discussions, and proposing compromises, enabling the LLMs to iteratively refine consensus proposals based on user feedback. Experimental results demonstrate the superiority of ChatGPT 4.0, which achieves higher alignment with participant opinions, requiring fewer iterations to reach consensus compared to its counterparts. Moreover, analysis reveals the nuanced performance of the models across various sustainability-focused discussion topics, such as climate action, quality education, good health and well-being, and access to clean water and sanitation. These findings highlight the transformative potential of LLM-driven facilitation for improving collective decision-making processes and underscore the importance of advancing evaluation metrics and cross-cultural adaptability in future research.
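The abstract names cosine similarity as the core metric for scoring how well a consensus proposal aligns with participant viewpoints. A minimal sketch of that idea, assuming viewpoints and proposals have already been mapped to embedding vectors (the function names and the mean-aggregation choice are illustrative, not taken from the paper):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def alignment_score(proposal_vec, viewpoint_vecs):
    # Mean cosine similarity between a consensus proposal embedding and
    # each participant's viewpoint embedding; higher means closer alignment.
    return float(np.mean([cosine_similarity(proposal_vec, v)
                          for v in viewpoint_vecs]))
```

Under this sketch, a facilitator loop would regenerate or refine the proposal until `alignment_score` crosses a chosen threshold.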

Authors:Piero A. Bonatti, John Domingue, Anna Lisa Gentile, Andreas Harth, Olaf Hartig, Aidan Hogan, Katja Hose, Ernesto Jimenez-Ruiz, Deborah L. McGuinness, Chang Sun, Ruben Verborgh, Jesse Wright
Title: Towards Computer-Using Personal Agents
Abstract:
Computer-Using Agents (CUA) enable users to automate increasingly-complex tasks using graphical interfaces such as browsers. As many potential tasks require personal data, we propose Computer-Using Personal Agents (CUPAs) that have access to an external repository of the user's personal data. Compared with CUAs, CUPAs offer users better control of their personal data, the potential to automate more tasks involving personal data, better interoperability with external sources of data, and better capabilities to coordinate with other CUPAs in order to solve collaborative tasks involving the personal data of multiple users.

Authors:Scott T Steinmetz, Asmeret Naugle, Paul Schutte, Matt Sweitzer, Alex Washburne, Lisa Linville, Daniel Krofcheck, Michal Kucer, Samuel Myren
Title: The Trust Calibration Maturity Model for Characterizing and Communicating Trustworthiness of AI Systems
Abstract:
Recent proliferation of powerful AI systems has created a strong need for capabilities that help users to calibrate trust in those systems. As AI systems grow in scale, information required to evaluate their trustworthiness becomes less accessible, presenting a growing risk of using these systems inappropriately. We propose the Trust Calibration Maturity Model (TCMM) to characterize and communicate information about AI system trustworthiness. The TCMM incorporates five dimensions of analytic maturity: Performance Characterization, Bias & Robustness Quantification, Transparency, Safety & Security, and Usability. The TCMM can be presented along with system performance information to (1) help a user to appropriately calibrate trust, (2) establish requirements and track progress, and (3) identify research needs. Here, we discuss the TCMM and demonstrate it on two target tasks: using ChatGPT for high consequence nuclear science determinations, and using PhaseNet (an ensemble of seismic models) for categorizing sources of seismic events.

Authors:Mina Ghobrial, Philippe Seitier, Pierre Lagarrigue, Michel Galaup, Patrick Gilles
Title: Effectiveness of machining equipment user guides: A comparative study of augmented reality and traditional media
Abstract:
In the rapidly evolving landscape of manufacturing and material forming, innovative strategies are imperative for maintaining a competitive edge. Augmented Reality (AR) has emerged as a groundbreaking technology, offering new dimensions in how information is displayed and interacted with. It holds particular promise for instructional guides for complex machinery, potentially enhancing traditional methods of knowledge transfer and operator training. Material forming, a key discipline within mechanical engineering, requires high precision and skill, making it an ideal candidate for the integration of advanced instructional technologies like AR. This study aims to explore the efficiency of three distinct types of user manuals (video, paper, and augmented reality) on performance and acceptability in a material forming workshop environment. The focus will be on how AR can be specifically applied to improve task execution and understanding in material forming operations. Participants are mechanical engineering students specializing in material forming. They will engage in a series of standardized tasks related to machining processes. Performance will be gauged by metrics like task completion time and error rates, while task load will be assessed via the NASA Task Load Index (NASA-TLX) [1]. Acceptability of each manual type will be evaluated using the System Usability Scale (SUS) [2]. By comparing these various instructional formats, this research seeks to shed light on the most effective mediums for enhancing both operator performance and experience.

Authors:Keon Ju M. Lee, Philippe Pasquier, Jun Yuri
Title: Revival: Collaborative Artistic Creation through Human-AI Interactions in Musical Creativity
Abstract:
Revival is an innovative live audiovisual performance and music improvisation by our artist collective K-Phi-A, blending human and AI musicianship to create electronic music with audio-reactive visuals. The performance features real-time co-creative improvisation between a percussionist, an electronic music artist, and AI musical agents. Trained on works by deceased composers and the collective's own compositions, these agents dynamically respond to human input and emulate complex musical styles. An AI-driven visual synthesizer, guided by a human VJ, produces visuals that evolve with the musical landscape. Revival showcases the potential of AI and human collaboration in improvisational artistic creation.

Authors:Prudhvi Naayini, Praveen Kumar Myakala, Chiranjeevi Bura, Anil Kumar Jonnalagadda, Srikanth Kamatala
Title: AI-Powered Assistive Technologies for Visual Impairment
Abstract:
Artificial Intelligence (AI) is revolutionizing assistive technologies. It offers innovative solutions to enhance the quality of life for individuals with visual impairments. This review examines the development, applications, and impact of AI-powered tools in key domains, such as computer vision, natural language processing (NLP), and wearable devices. Specific advancements include object recognition for identifying everyday items, scene description for understanding surroundings, and NLP-driven text-to-speech systems for accessing digital information. Assistive technologies like smart glasses, smartphone applications, and AI-enabled navigation aids are discussed, demonstrating their ability to support independent travel, facilitate social interaction, and increase access to education and employment opportunities. The integration of deep learning models, multimodal interfaces, and real-time data processing has transformed the functionality and usability of these tools, fostering inclusivity and empowerment. This article also addresses critical challenges, including ethical considerations, affordability, and adaptability in diverse environments. Future directions highlight the need for interdisciplinary collaboration to refine these technologies, ensuring equitable access and sustainable innovation. By providing a comprehensive overview, this review underscores AI's transformative potential in promoting independence, enhancing accessibility, and fostering social inclusion for visually impaired individuals.

Authors:Benedikt Holm, Arnar Óskarsson, Björn Elvar Þorleifsson, Hörður Þór Hafsteinsson, Sigríður Sigurðardóttir, Heiður Grétarsdóttir, Kenan Hoelke, Gabriel Marc Marie Jouan, Thomas Penzel, Erna Sif Arnardottir, María Óskarsdóttir
Title: World of ScoreCraft: Novel Multi Scorer Experiment on the Impact of a Decision Support System in Sleep Staging
Abstract:
Manual scoring of polysomnography (PSG) is a time-intensive task, prone to inter-scorer variability that can impact diagnostic reliability. This study investigates the integration of decision support systems (DSS) into PSG scoring workflows, focusing on their effects on accuracy, scoring time, and potential biases toward recommendations from artificial intelligence (AI) compared to human-generated recommendations. Using a novel online scoring platform, we conducted a repeated-measures study with sleep technologists, who scored traditional and self-applied PSGs. Participants were occasionally presented with recommendations labeled as either human- or AI-generated. We found that traditional PSGs tended to be scored slightly more accurately than self-applied PSGs, but this difference was not statistically significant. Correct recommendations significantly improved scoring accuracy for both PSG types, while incorrect recommendations reduced accuracy. No significant bias was observed toward or against AI-generated recommendations compared to human-generated recommendations. These findings highlight the potential of AI to enhance PSG scoring reliability. However, ensuring the accuracy of AI outputs is critical to maximizing its benefits. Future research should explore the long-term impacts of DSS on scoring workflows and strategies for integrating AI in clinical practice.

Authors:Cristina Fiani, Pejman Saeghe, Mark McGill, Mohamed Khamis
Title: Exploring the Perspectives of Social VR-Aware Non-Parent Adults and Parents on Children's Use of Social Virtual Reality
Abstract:
Social Virtual Reality (VR), where people meet in virtual spaces via 3D avatars, is used by children and adults alike. Children experience new forms of harassment in social VR where it is often inaccessible to parental oversight. To date, there is limited understanding of how parents and non-parent adults within the child social VR ecosystem perceive the appropriateness of social VR for different age groups and the measures in place to safeguard children. We present results of a mixed-methods questionnaire (N=149 adults, including 79 parents) focusing on encounters with children in social VR and perspectives towards children's use of social VR. We draw novel insights on the frequency of social VR use by children under 13 and current use of, and future aspirations for, child protection interventions. Compared to non-parent adults, parents familiar with social VR propose lower minimum ages and are more likely to allow social VR without supervision. Adult users experience immaturity from children in social VR, while children face abuse, encounter age-inappropriate behaviours and self-disclose to adults. We present directions to enhance the safety of social VR through pre-planned controls, real-time oversight, post-event insight and the need for evidence-based guidelines to support parents and platforms around age-appropriate interventions.

Authors:Lillian Maria Eagan, Jacob Young, Jesse Bering, Tobias Langlotz
Title: Virtual Voyages: Evaluating the Role of Real-Time and Narrated Virtual Tours in Shaping User Experience and Memories
Abstract:
Immersive technologies are capable of transporting people to distant or inaccessible environments that they might not otherwise visit. Practitioners and researchers alike are discovering new ways to replicate and enhance existing tourism experiences using virtual reality, yet few controlled experiments have studied how users perceive virtual tours of real-world locations. In this paper we present an initial exploration of a new system for virtual tourism, measuring the effects of real-time experiences and storytelling on presence, place attachment, and user memories of the destination. Our results suggest that narrative plays an important role in inducing presence within and attachment to the destination, while livestreaming can further increase place attachment while providing flexible, tailored experiences. We discuss the design and evaluation of our system, including feedback from our tourism partners, and provide insights into current limitations and further opportunities for virtual tourism.

Authors:Wasura D. Wattearachchi, Erandi Lakshika, Kathryn Kasmarik, Michael Barlow
Title: A Study on Human-Swarm Interaction: A Framework for Assessing Situation Awareness and Task Performance
Abstract:
This paper introduces a framework for human-swarm interaction studies that measures situation awareness in dynamic environments. A tablet-based interface was developed for a user study by implementing the concepts introduced in the framework, where operators guided a robotic swarm in a single-target search task, marking hazardous cells unknown to the swarm. Both subjective and objective situation awareness measures were used, with task performance evaluated based on how close the robots were to the target. The framework enabled a structured investigation of the role of situation awareness in human-swarm interaction, leading to several key findings: task performance improved across attempts, showing that the interface was learnable; the centroid of active robot positions proved to be a useful task performance metric for assessing situation awareness; perception and projection played a key role in task performance, highlighting their importance in interface design; and objective situation awareness influenced both subjective situation awareness and task performance, emphasizing the need for interfaces that foreground objective situation awareness. These findings validate our framework as a structured approach for integrating situation awareness concepts into human-swarm interaction studies, offering a systematic way to assess situation awareness and task performance. The framework can be applied to other swarming studies to evaluate interface learnability, identify meaningful task performance metrics, and refine interface designs to enhance situation awareness, ultimately improving human-swarm interaction in dynamic environments.

Authors:Tiago Vasconcelos Afonso, Florian Heinrichs
Title: Consumer-grade EEG-based Eye Tracking
Abstract:
Electroencephalography-based eye tracking (EEG-ET) leverages eye movement artifacts in EEG signals as an alternative to camera-based tracking. While EEG-ET offers advantages such as robustness in low-light conditions and better integration with brain-computer interfaces, its development lags behind traditional methods, particularly in consumer-grade settings. To support research in this area, we present a dataset comprising simultaneous EEG and eye-tracking recordings from 113 participants across 116 sessions, amounting to 11 hours and 45 minutes of recordings. Data was collected using a consumer-grade EEG headset and webcam-based eye tracking, capturing eye movements under four experimental paradigms with varying complexity. The dataset enables the evaluation of EEG-ET methods across different gaze conditions and serves as a benchmark for assessing feasibility with affordable hardware. Data preprocessing includes handling of missing values and filtering to enhance usability. In addition to the dataset, code for data preprocessing and analysis is available to support reproducibility and further research.
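The abstract mentions that preprocessing includes handling of missing values and filtering. A minimal illustrative sketch of two such steps, not the authors' released pipeline (the forward-fill strategy and the moving-average window are assumptions chosen for simplicity):

```python
import numpy as np

def fill_missing(signal):
    # Replace NaN samples by carrying the last valid value forward;
    # leading NaNs fall back to the first valid value.
    x = np.asarray(signal, dtype=float).copy()
    valid = ~np.isnan(x)
    idx = np.where(valid, np.arange(len(x)), -1)
    idx = np.maximum.accumulate(idx)          # index of last valid sample
    first = x[valid][0]
    return np.where(idx >= 0, x[np.maximum(idx, 0)], first)

def moving_average(x, window=5):
    # Simple low-pass smoothing to suppress high-frequency noise.
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")
```

In practice an EEG pipeline would typically use a proper bandpass filter rather than a moving average; this only illustrates the shape of the preprocessing stage.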

Authors:Jing Li, Pinhao Wang, Emilia Barakova, Jun Hu
Title: Petting Pen for Stress Awareness and Management in Children
Abstract:
We found that children in elementary school often experience stress during task performance. Limited coping skills and a lack of stress awareness restrict children's ability to manage their stress. Many designs and studies have proposed different stress detection and intervention solutions, yet they often overlook the potential of enhancing everyday objects and actively sensing stress-related behavioral data during human-product interaction. Therefore, we propose Petting pen, an interactive robotic object that helps children manage their stress during task performance. It detects and validates stress and further intervenes in stress during a process of natural writing and relaxation interactions. The design is an iteration based on our previous research results of a stress-aware pen, enhanced with tactile needs, robotic interaction, and integration of behavioral and bio-sensing capabilities. Petting pen aims to bridge the gap between robots and everyday objects in mental health applications for children.

Authors:Zejia Zhang, Bo Yang, Xinxing Chen, Weizhuang Shi, Haoyuan Wang, Wei Luo, Jian Huang
Title: MindEye-OmniAssist: A Gaze-Driven LLM-Enhanced Assistive Robot System for Implicit Intention Recognition and Task Execution
Abstract:
Gaze-based control is a promising approach to effective human-robot interaction in assistive robotic systems. However, current gaze-based assistive systems mainly help users with basic grasping actions, offering limited support. Moreover, their restricted intent recognition capability constrains the assistive system's ability to provide diverse assistance functions. In this paper, we propose an open implicit intention recognition framework powered by a Large Language Model (LLM) and a Vision Foundation Model (VFM), which can process gaze input and recognize user intents that are not confined to predefined or specific scenarios. Furthermore, we implement a gaze-driven LLM-enhanced assistive robot system (MindEye-OmniAssist) that recognizes the user's intentions through gaze and assists in completing tasks. To achieve this, the system utilizes an open-vocabulary object detector, an intention recognition network, and an LLM to infer the user's full intentions. By integrating eye movement feedback and the LLM, it generates action sequences to assist the user in completing tasks. Real-world experiments have been conducted for assistive tasks, and the system achieved an overall success rate of 41/55 across various undefined tasks. Preliminary results show that the proposed method holds the potential to provide a more user-friendly human-computer interaction interface and to significantly enhance the versatility and effectiveness of assistive systems by supporting more complex and diverse tasks.

Authors:Matt Gottsacker, Mengyu Chen, David Saffo, Feiyu Lu, Benjamin Lee, Blair MacIntyre
Title: Examining the Effects of Immersive and Non-Immersive Presenter Modalities on Engagement and Social Interaction in Co-located Augmented Presentations
Abstract:
Head-worn augmented reality (AR) allows audiences to be immersed and engaged in stories told by live presenters. While presenters may also be in AR to have the same level of immersion and awareness as their audience, this symmetric presentation style may diminish important social cues such as eye contact. In this work, we examine the effects this (a)symmetry has on engagement, group awareness, and social interaction in co-located one-on-one augmented presentations. We developed a presentation system incorporating 2D/3D content that audiences can view and interact with in AR, with presenters controlling and delivering the presentation in either a symmetric style in AR, or an asymmetric style with a handheld tablet. We conducted a within- and between-subjects evaluation with 12 participant pairs to examine the differences between these symmetric and asymmetric presentation modalities. From our findings, we extracted four themes and derived strategies and guidelines for designers interested in augmented presentations.

Authors:Sebin Lee, Yeonho Cho, Jungjin Lee
Title: Concert Interaction Translation: Augmenting VR Live Concert Experience using Chat-Driven Artificial Collective Reactions
Abstract:
Computer-mediated concerts can be enjoyed on various devices, from desktop and mobile to VR devices, often supporting multiple devices simultaneously. However, due to the limited accessibility of VR devices, relatively small audiences tend to congregate in VR venues, resulting in diminished unique social experiences. To address this gap and enrich VR concert experiences, we present a novel approach that leverages non-VR user interaction data, specifically chat from audiences watching the same content on a live-streaming platform. Based on an analysis of audience reactions in offline concerts, we designed and prototyped a concert interaction translation system that extracts the level of engagement and emotions from chats and translates them into collective movements, cheers, and singalongs of virtual audience avatars in a VR venue. Our user study (n=48) demonstrates that our system, which combines both movement and audio reactions, significantly enhances the sense of immersion and co-presence compared to the previous method.

Authors:Tinghui Li, Pamuditha Somarathne, Zhanna Sarsenbayeva, Anusha Withana
Title: TA-GNN: Physics Inspired Time-Agnostic Graph Neural Network for Finger Motion Prediction
Abstract:
Continuous prediction of finger joint movement using historical joint positions/rotations is vital in a multitude of applications, especially related to virtual reality, computer graphics, robotics, and rehabilitation. However, finger motions are highly articulated with multiple degrees of freedom, making them significantly harder to model and predict. To address this challenge, we propose a physics-inspired time-agnostic graph neural network (TA-GNN) to accurately predict human finger motions. The proposed encoder comprises a kinematic feature extractor that generates filtered velocity and acceleration, and a physics-based encoder that follows linear kinematics. The model is designed to be prediction-time-agnostic so that it can seamlessly provide continuous predictions. The graph-based decoder for learning the topological motion between finger joints is designed to address the higher-degree articulation of fingers. We show the superiority of our model's performance in a virtual reality context. This novel approach enhances finger tracking without additional sensors, enabling predictive interactions such as haptic re-targeting and improving predictive rendering quality.
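The encoder described above derives velocity and acceleration from joint positions and applies linear kinematics. A minimal sketch of those two ingredients, assuming simple central finite differences and a constant-acceleration model (the paper's actual filtering and network layers are not shown):

```python
import numpy as np

def kinematic_features(positions, dt=1.0):
    # Estimate velocity and acceleration from a time series of joint
    # positions via central finite differences.
    p = np.asarray(positions, dtype=float)
    v = np.gradient(p, dt, axis=0)   # first derivative: velocity
    a = np.gradient(v, dt, axis=0)   # second derivative: acceleration
    return v, a

def predict_next(p_last, v_last, a_last, dt=1.0):
    # Linear (constant-acceleration) kinematics: p + v*dt + 0.5*a*dt^2.
    return p_last + v_last * dt + 0.5 * a_last * dt * dt
```

In TA-GNN these features would feed the physics-based encoder rather than being used for prediction directly; the sketch only shows the kinematic groundwork.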

Authors:Silvia Cazacu, Georgia Panagiotidou, Therese Steenberghen, Andrew Vande Moere
Title: Disentangling the Power Dynamics in Participatory Data Physicalisation
Abstract:
Participatory data physicalisation (PDP) is recognised for its potential to support data-driven decisions among stakeholders who collaboratively construct physical elements into commonly insightful visualisations. Like all participatory processes, PDP is however influenced by underlying power dynamics that might lead to issues regarding extractive participation, marginalisation, or exclusion, among others. We first identified the decisions behind these power dynamics by developing an ontology that synthesises critical theoretical insights from both visualisation and participatory design research, which were then systematically applied unto a representative corpus of 23 PDP artefacts. By revealing how shared decisions are guided by different agendas, this paper presents three contributions: 1) a cross-disciplinary ontology that facilitates the systematic analysis of existing and novel PDP artefacts and processes; which leads to 2) six PDP agendas that reflect the key power dynamics in current PDP practice, revealing the diversity of orientations towards stakeholder participation in PDP practice; and 3) a set of critical considerations that should guide how power dynamics can be balanced, such as by reflecting on how issues are represented, data is contextualised, participants express their meanings, and how participants can dissent with flexible artefact construction. Consequently, this study advances a feminist research agenda by guiding researchers and practitioners in openly reflecting on and sharing responsibilities in data physicalisation and participatory data visualisation.

Authors:Shiva Sinaei, Daisuke Iwai, Kousuke Sato
Title: Artificial Blur Effect for Optical See-through Near-Eye Displays
Abstract:
Saliency modulation has significant potential for various applications. In our pursuit of implementing saliency modulation for optical see-through near-eye displays, we decided to introduce a blur effect to reduce the sharpness of specific areas while preserving the sharpness of others. In this study, we used a digital micromirror device (DMD) to separate the incoming light from a scene into sharp and blurred areas. To achieve this, we integrated an electrically tunable lens (ETL), which operates in its zero optical power mode when the reflected light from the DMD represents the sharp area (i.e., the blur area is masked). Conversely, when the reflected light indicates the blur area, the ETL adjusts to non-zero optical powers. Importantly, these modulations occur at a speed that surpasses the critical flicker frequency threshold of the human eye. Furthermore, we proposed an algorithm to mitigate the artifacts around the border area between the sharp and blur areas that are caused by the magnification of the ETL. We have also developed a prototype system to demonstrate the feasibility of our method.

Authors:Bhawana Chhaglani, Alan Seefeldt
Title: NeckCheck: Predicting Neck Strain using Head Tracker Sensors
Abstract:
Tech neck, a growing musculoskeletal concern caused by prolonged poor posture during device use, has significant health implications. This study investigates the relationship between head posture and muscular activity in the upper trapezius muscle to predict muscle strain by leveraging data from EMG sensors and head trackers. We train a regression model to predict EMG envelope readings using head movement data. We conduct preliminary experiments involving various postures to explore the correlation between these modalities and assess the feasibility of predicting muscle strain using head-worn sensors. We discuss the key research challenges in sensing and predicting muscle fatigue. The results highlight the potential of this approach in real-time ergonomic feedback systems, contributing to the prevention and management of tech neck.
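As a hedged illustration of the regression step (not the authors' model, features, or data), a simple least-squares fit of an EMG envelope value against head-pitch angle looks like this; the numbers are synthetic.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    intercept = my - slope * mx
    return slope, intercept

# Synthetic data: EMG envelope rising with forward head tilt (degrees).
pitch = [0, 10, 20, 30, 40]
emg = [0.10, 0.18, 0.26, 0.34, 0.42]  # arbitrary units

slope, intercept = fit_linear(pitch, emg)

def predict(deg):
    return slope * deg + intercept
```

The study presumably uses richer head-movement features than a single pitch angle; this only shows the shape of the mapping from head posture to EMG envelope.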

Authors:Ran Zhou, Jianru Ding, Chenfeng Gao, Wanli Qian, Benjamin Erickson, Madeline Balaam, Daniel Leithinger, Ken Nakagaki
Title: Shape-Kit: A Design Toolkit for Crafting On-Body Expressive Haptics
Abstract:
Driven by the vision of everyday haptics, the HCI community is advocating for "design touch first" and investigating "how to touch well." However, a gap remains between the exploratory nature of haptic design and technical reproducibility. We present Shape-Kit, a hybrid design toolkit embodying our "crafting haptics" metaphor, where hand touch is transduced into dynamic pin-based sensations that can be freely explored across the body. An ad-hoc tracking module captures and digitizes these patterns. Our study with 14 designers and artists demonstrates how Shape-Kit facilitates sensorial exploration for expressive haptic design. We analyze how designers collaboratively ideate, prototype, iterate, and compose touch experiences and show the subtlety and richness of touch that can be achieved through diverse crafting methods with Shape-Kit. Reflecting on the findings, our work contributes key insights into haptic toolkit design and touch design practices centered on the "crafting haptics" metaphor. We discuss in-depth how Shape-Kit's simplicity, though remaining constrained, enables focused crafting for deeper exploration, while its collaborative nature fosters shared sense-making of touch experiences.

Authors:Ritik Batra, Narjes Pourjafarian, Samantha Chang, Margaret Tsai, Jacob Revelo, Cindy Hsin-Liu Kao
Title: texTENG: Fabricating Wearable Textile-Based Triboelectric Nanogenerators
Abstract:
Recently, there has been a surge of interest in sustainable energy sources, particularly for wearable computing. Triboelectric nanogenerators (TENGs) have shown promise in converting human motion into electric power. Textile-based TENGs, valued for their flexibility and breathability, offer an ideal form factor for wearables. However, uptake in maker communities has been slow due to commercially unavailable materials, complex fabrication processes, and structures incompatible with human motion. This paper introduces texTENG, a textile-based framework simplifying the fabrication of power harvesting and self-powered sensing applications. By leveraging accessible materials and familiar tools, texTENG bridges the gap between advanced TENG research and wearable applications. We explore a design menu for creating multidimensional TENG structures using braiding, weaving, and knitting. Technical evaluations and example applications highlight the performance and feasibility of these designs, offering DIY-friendly pathways for fabricating textile-based TENGs and promoting sustainable prototyping practices within the HCI and maker communities.

Authors:Haocheng Ren, Muzhe Wu, Gregory Croisdale, Anhong Guo, Xu Wang
Title: Rubikon: Intelligent Tutoring for Rubik's Cube Learning Through AR-enabled Physical Task Reconfiguration
Abstract:
Learning to solve a Rubik's Cube requires the learners to repeatedly practice a skill component, e.g., identifying a misplaced square and putting it back. However, for 3D physical tasks such as this, generating sufficient repeated practice opportunities for learners can be challenging, in part because it is difficult for novices to reconfigure the physical object to specific states. We propose Rubikon, an intelligent tutoring system for learning to solve the Rubik's Cube. Rubikon reduces the necessity for repeated manual configurations of the Rubik's Cube without compromising the tactile experience of handling a physical cube. The foundational design of Rubikon is an AR setup, where learners manipulate a physical cube while seeing an AR-rendered cube on a display. Rubikon automatically generates configurations of the Rubik's Cube to target learners' weaknesses and help them exercise diverse knowledge components. In a between-subjects experiment, we showed that Rubikon learners scored 25% higher on a post-test compared to baselines.

Authors:Azhar Ali Khaked, Nobuyuki Oishi, Daniel Roggen, Paula Lago
Title: In Shift and In Variance: Assessing the Robustness of HAR Deep Learning Models against Variability
Abstract:
Human Activity Recognition (HAR) using wearable inertial measurement unit (IMU) sensors can revolutionize healthcare by enabling continual health monitoring, disease prediction, and routine recognition. Despite the high accuracy of Deep Learning (DL) HAR models, their robustness to real-world variabilities remains untested, as they have primarily been trained and tested on limited lab-confined data. In this study, we isolate subject, device, position, and orientation variability to determine their effect on DL HAR models and assess the robustness of these models in real-world conditions. We evaluated the DL HAR models using the HARVAR and REALDISP datasets, providing a comprehensive discussion on the impact of variability on data distribution shifts and changes in model performance. Our experiments measured shifts in data distribution using Maximum Mean Discrepancy (MMD) and observed DL model performance drops due to variability. We find that the studied variabilities affect DL HAR models differently, and that there is an inverse relationship between data distribution shifts and model performance. The compounding effect of variability was analyzed, and the implications of variabilities in real-world scenarios were highlighted. MMD proved an effective metric for calculating data distribution shifts and explained the drop in performance due to variabilities in the HARVAR and REALDISP datasets. Combining our understanding of variability with evaluating its effects will facilitate the development of more robust DL HAR models and optimal training techniques, allowing future models to be assessed not only on their maximum F1 score but also on their ability to generalize effectively.
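Maximum Mean Discrepancy, the shift metric the study relies on, compares two samples through kernel mean embeddings. A minimal sketch for 1-D features with an RBF kernel, using the biased estimator (a generic illustration, not the authors' implementation):

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel between two scalar feature values."""
    return math.exp(-gamma * (x - y) ** 2)

def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    k_xx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy

same = mmd2([0.0, 0.1, 0.2], [0.0, 0.1, 0.2])   # identical samples -> ~0
shift = mmd2([0.0, 0.1, 0.2], [2.0, 2.1, 2.2])  # shifted samples -> large
```

Larger MMD between, say, a source subject's and a target subject's feature distributions corresponds to the data distribution shifts the paper links to performance drops.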

Authors:Yağmur Kocaman, Taylan U. Bulut, Oğuzhan Özcan
Title: Mobile Food Printing in Professional Kitchens: An inquiry of potential use cases with novice chefs
Abstract:
The knowledge transfer from 3D printing technology paved the way for unlocking the innovative potential of 3D Food Printing (3DFP) technology. However, this technology-oriented approach neglects user-derived issues that could be addressed with advancements in 3DFP technology. To explore potential new features and application areas for 3DFP technology, we created the Mobile Food Printer (MFP) prototype. We collected insights from novice chefs for MFP in the restaurant context through four online focus group sessions (N=12). Our results revealed how MFP can be applied in current kitchen routines (preparation, serving, and eating) and introduce novel dining experiences. We discuss our learnings under two themes: 1) dealing with the kitchen rush and 2) streamlining workflows in the kitchen. The opportunities we present in this study act as a starting point for HCI and HFI researchers and encourage them to implement mobility in 3DFP with a user-oriented lens. We further provide a ground for future research to uncover potentials for advancing 3DFP technology.

Authors:Jie Li, Anusha Withana, Alexandra Diening, Kai Kunze, Masahiko Inami
Title: Beyond Human: Cognitive and Physical Augmentation through AI, Robotics, and XR -- Opportunities and Risks
Abstract:
As human augmentation technologies evolve, the convergence of AI, robotics, and extended reality (XR) is redefining human potential -- enhancing cognition, perception, and physical abilities. However, these advancements also introduce ethical dilemmas, security risks, and concerns over loss of control. This workshop explores both the transformative potential and the unintended consequences of augmentation technologies. Bringing together experts from HCI, neuroscience, robotics, and ethics, we will examine real-world applications, emerging risks, and governance strategies for responsible augmentation. The session will feature keynote talks and interactive discussions, addressing topics such as AI-enhanced cognition, wearable robotics, neural interfaces, and XR-driven augmentation. By fostering multidisciplinary dialogue, this workshop aims to generate actionable insights for responsible innovation, proposing ethical frameworks to balance human empowerment with risk mitigation. We invite researchers, practitioners, and industry leaders to contribute their perspectives and help shape the future of human augmentation.

Authors:Sean Dallas, Hongjiao Qiang, Motaz AbuHijleh, Wonse Jo, Kayla Riegner, Jon Smereka, Lionel Robert, Wing-Yue Louie, Dawn M. Tilbury
Title: Training Human-Robot Teams by Improving Transparency Through a Virtual Spectator Interface
Abstract:
After-action reviews (AARs) are professional discussions that help operators and teams enhance their task performance by analyzing completed missions with peers and professionals. Previous studies that compared different formats of AARs have mainly focused on human teams. However, the inclusion of robotic teammates brings along new challenges in understanding teammate intent and communication. Traditional AAR between human teammates may not be satisfactory for human-robot teams. To address this limitation, we propose a new training review (TR) tool, called the Virtual Spectator Interface (VSI), to enhance human-robot team performance and situational awareness (SA) in a simulated search mission. The proposed VSI primarily utilizes visual feedback to review subjects' behavior. To examine the effectiveness of VSI, we took elements from AAR to conduct our own TR, and designed a 1 x 3 between-subjects experiment with experimental conditions: TR with (1) VSI, (2) screen recording, and (3) non-technology (only verbal descriptions). The results of our experiments demonstrated that the VSI did not result in significantly better team performance than other conditions. However, the TR with VSI led to more improvement in the subjects' SA than the other conditions.

Authors:Yihao Zhou, Tanusree Sharma
Title: Honey Trap or Romantic Utopia: A Case Study of Final Fantasy XIV Players PII Disclosure in Intimate Partner-Seeking Posts
Abstract:
Massively multiplayer online games (MMOGs) can foster social interaction and relationship formation, but they pose specific privacy and safety challenges, especially in the context of mediating intimate interpersonal connections. To explore the potential risks, we conducted a case study on Final Fantasy XIV (FFXIV) players' intimate partner-seeking posts on social media. We analyzed 1,288 posts from a public Weibo account using Latent Dirichlet Allocation (LDA) topic modeling and thematic analysis. Our findings reveal that players disclose sensitive personal information and share vulnerabilities to establish trust but face difficulties in managing identity and privacy across multiple platforms. We also found that players' expectations regarding intimate partners are diverse, and that mismatched expectations may lead to issues like privacy leakage or emotional exploitation. Based on our findings, we propose design implications for reducing privacy and safety risks and fostering healthier social interactions in virtual worlds.
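The LDA step can be sketched with scikit-learn; the authors' exact toolchain, preprocessing, and topic count are not specified here, and the toy posts below are invented placeholders, not data from the study.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Invented stand-ins for partner-seeking posts (the real corpus is 1,288 Weibo posts).
posts = [
    "looking for a partner to raid with every night",
    "seeking someone kind for long term companionship",
    "raid static needs one more healer for savage",
    "hoping to meet a caring partner in game",
]

# Bag-of-words counts, then LDA over a small, assumed number of topics.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-post topic distribution, rows sum to 1
```

In practice Chinese-language posts would also require segmentation before vectorization, a step omitted in this sketch.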

Authors:Rushiraj Gadhvi, Soham Petkar, Priyansh Desai, Shreyas Ramachandran, Siddharth Siddharth
Title: AdaptAI: A Personalized Solution to Sense Your Stress, Fix Your Mess, and Boost Productivity
Abstract:
Personalization is a critical yet often overlooked factor in boosting productivity and wellbeing in knowledge-intensive workplaces to better address individual preferences. Existing tools typically offer uniform guidance, whether auto-generating email responses or prompting break reminders, without accounting for individual behavioral patterns or stress triggers. We introduce AdaptAI, a multimodal AI solution combining egocentric vision and audio, heart and motion activities, and the agentic workflow of Large Language Models (LLMs) to deliver highly personalized productivity support and context-aware well-being interventions. AdaptAI not only automates peripheral tasks (e.g. drafting succinct document summaries, replying to emails, etc.) but also continuously monitors the user's unique physiological and situational indicators to dynamically tailor interventions, such as micro-break suggestions or exercise prompts, at the exact point of need. In a preliminary study with 15 participants, AdaptAI demonstrated significant improvements in task throughput and user satisfaction by anticipating user stressors and streamlining daily workflows.

Authors:Jiaying Fu, Xiruo Wang, Zhouyi Li, Kate Vi, Chuyan Xu, Yuqian Sun
Title: "I Like Your Story!": A Co-Creative Story-Crafting Game with a Persona-Driven Character Based on Generative AI
Abstract:
While generative AI is advancing writing support tools, creative writing is often seen as the exclusive domain of skilled writers. This paper introduces "1001 Nights", a co-creative story-crafting game that transforms writing into a playful and rewarding activity. In this game, the AI agent takes on the role of a "moody" king with distinct storytelling preferences, not merely assisting but actively influencing the narrative. Players engage with the king agent through strategic storytelling, guiding him to mention weapon-related keywords, which materialize as battle equipment. The king agent provides dynamic feedback, expressing satisfaction or displeasure, prompting players to adjust their approach. By combining storytelling, game mechanics, and AI-driven responses, our system motivates creativity through playful constraints. Inspired by Oulipo's literary techniques, this approach demonstrates how AI-powered game experiences can make creative writing more accessible and engaging, encouraging players to explore their creative potential.

Authors:Shankar Gangisetty, Abdul Wasi, Shyam Nandan Rai, C. V. Jawahar, Sajay Raj, Manish Prajapati, Ayesha Choudhary, Aaryadev Chandra, Dev Chandan, Shireen Chand, Suvaditya Mukherjee
Title: ICPR 2024 Competition on Rider Intention Prediction
Abstract:
The recent surge in the vehicle market has led to an alarming increase in road accidents. This underscores the critical importance of enhancing road safety measures, particularly for vulnerable road users like motorcyclists. Hence, we introduce the rider intention prediction (RIP) competition that aims to address challenges in rider safety by proactively predicting maneuvers before they occur, thereby strengthening rider safety. This capability enables riders to react to potentially incorrect maneuvers flagged by advanced driver assistance systems (ADAS). We collect a new dataset, namely, the rider action anticipation dataset (RAAD), for the competition, consisting of two tasks: single-view RIP and multi-view RIP. The dataset incorporates a spectrum of traffic conditions and challenging navigational maneuvers on roads with varying lighting conditions. For the competition, we received seventy-five registrations and five team submissions for inference; we compared the methods of the top three performing teams on both RIP tasks: one state-space model (Mamba2) and two learning-based approaches (SVM and CNN-LSTM). The results indicate that the state-space model outperformed the other methods across the entire dataset, providing a balanced performance across maneuver classes. The SVM-based RIP method showed the second-best performance when using random sampling and SMOTE. However, the CNN-LSTM method underperformed, primarily due to class imbalance issues, particularly struggling with minority classes. This paper details the proposed RAAD dataset and provides a summary of the submissions for the RIP 2024 competition.

Authors:Yiheng Yu, Sheng Liu, Yuan Feng, Min Xu, Zhelun Jin, Xuhua Yang
Title: OLMD: Orientation-aware Long-term Motion Decoupling for Continuous Sign Language Recognition
Abstract:
The primary challenge in continuous sign language recognition (CSLR) mainly stems from the presence of multi-orientational and long-term motions. However, current research overlooks these crucial aspects, significantly impacting accuracy. To tackle these issues, we propose a novel CSLR framework: Orientation-aware Long-term Motion Decoupling (OLMD), which efficiently aggregates long-term motions and decouples multi-orientational signals into easily interpretable components. Specifically, our innovative Long-term Motion Aggregation (LMA) module filters out static redundancy while adaptively capturing abundant features of long-term motions. We further enhance orientation awareness by decoupling complex movements into horizontal and vertical components, allowing for motion purification in both orientations. Additionally, two coupling mechanisms are proposed: stage and cross-stage coupling, which together enrich multi-scale features and improve the generalization capabilities of the model. Experimentally, OLMD shows SOTA performance on three large-scale datasets: PHOENIX14, PHOENIX14-T, and CSL-Daily. Notably, we improved the word error rate (WER) on PHOENIX14 by an absolute 1.6% compared to the previous SOTA.
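Word error rate, the metric behind the reported 1.6% absolute improvement, is the word-level edit distance (substitutions + insertions + deletions) divided by the reference length. A generic sketch of its computation, not OLMD code:

```python
def wer(reference, hypothesis):
    """Word error rate via Levenshtein distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

For sign language glosses the "words" are gloss tokens, but the computation is identical.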

Authors:Maxim Lisnic, Vidya Setlur, Nicole Sultanum
Title: Plume: Scaffolding Text Composition in Dashboards
Abstract:
Text in dashboards plays multiple critical roles, including providing context, offering insights, guiding interactions, and summarizing key information. Despite its importance, most dashboarding tools focus on visualizations and offer limited support for text authoring. To address this gap, we developed Plume, a system to help authors craft effective dashboard text. Through a formative review of exemplar dashboards, we created a typology of text parameters and articulated the relationship between visual placement and semantic connections, which informed Plume's design. Plume employs large language models (LLMs) to generate contextually appropriate content and provides guidelines for writing clear, readable text. A preliminary evaluation with 12 dashboard authors explored how assisted text authoring integrates into workflows, revealing strengths and limitations of LLM-generated text and the value of our human-in-the-loop approach. Our findings suggest opportunities to improve dashboard authoring tools by better supporting the diverse roles that text plays in conveying insights.

Authors:Wenqi Li, Zhenyi Tang, Pengyi Zhang, Jun Wang
Title: Collaborative Data Behaviors in Digital Humanities Research Teams
Abstract:
The development of digital humanities necessitates scholars to adopt more data-intensive methods and engage in multidisciplinary collaborations. Understanding their collaborative data behaviors becomes essential for providing more curated data, tailored tools, and a collaborative research environment. This study explores how interdisciplinary researchers collaborate on data activities by conducting focus group interviews with 19 digital humanities research groups. Through inductive coding, the study identified seven primary and supportive data activities and found that different collaborative modes are adopted in various data activities. The collaborative modes include humanities-driven, technically-driven, and balanced, depending on how team members naturally adjusted their responsibilities based on their expertise. These findings establish a preliminary framework for examining collaborative data behavior and interdisciplinary collaboration in digital humanities.

Authors:Nimisha Karnatak, Adrien Baranes, Rob Marchant, Triona Butler, Kristen Olson
Title: ACAI for SBOs: AI Co-creation for Advertising and Inspiration for Small Business Owners
Abstract:
Small business owners (SBOs) often lack the resources and design experience needed to produce high-quality advertisements. To address this, we developed ACAI (AI Co-Creation for Advertising and Inspiration), a GenAI-powered multimodal advertisement creation tool, and conducted a user study with 16 SBOs in London to explore their perceptions of and interactions with ACAI in advertisement creation. Our findings reveal that structured inputs enhance user agency and control while improving AI outputs by facilitating better brand alignment, enhancing AI transparency, and offering scaffolding that assists novice designers, such as SBOs, in formulating prompts. We also found that ACAI's multimodal interface bridges the design skill gap for SBOs with a clear advertisement vision, but who lack the design jargon necessary for effective prompting. Building on our findings, we propose three capabilities: contextual intelligence, adaptive interactions, and data management, with corresponding design recommendations to advance the co-creative attributes of AI-mediated design tools.

Authors:Rajan Das Gupta, Md. Tanzib Hosain, M. F. Mridha, Salah Uddin Ahmed
Title: Multimodal Programming in Computer Science with Interactive Assistance Powered by Large Language Model
Abstract:
LLM chatbot interfaces allow students to get instant, interactive assistance with homework, but doing so carelessly may not advance educational objectives. In this study, an interactive homework help system based on DeepSeek R1 is developed and first deployed for students enrolled in a large introductory computer science programming course. In addition to an assist button in a well-known code editor, our assistant also has a feedback option in our command-line automatic evaluator. It wraps student work in a personalized prompt that advances our educational objectives without offering answers straight away. We have discovered that our assistant can recognize students' conceptual difficulties and provide ideas, plans, and template code in pedagogically appropriate ways. However, among other mistakes, it occasionally incorrectly labels correct student code as incorrect or encourages students to use correct-but-lesson-inappropriate approaches, which can lead to long and frustrating journeys for the students. After discussing many development and deployment issues, we provide our conclusions and future actions.

Authors:Federico Mazzoni, Riccardo Guidotti, Alessio Malizia
Title: A Frank System for Co-Evolutionary Hybrid Decision-Making
Abstract:
We introduce Frank, a human-in-the-loop system for co-evolutionary hybrid decision-making aiding the user to label records from an un-labeled dataset. Frank employs incremental learning to ``evolve'' in parallel with the user's decisions, by training an interpretable machine learning model on the records labeled by the user. Furthermore, Frank advances state-of-the-art approaches by offering inconsistency controls, explanations, fairness checks, and bad-faith safeguards simultaneously. We evaluate our proposal by simulating the users' behavior with various levels of expertise and reliance on Frank's suggestions. The experiments show that Frank's intervention leads to improvements in the accuracy and the fairness of the decisions.

Authors:Ricardo E. Gonzalez Penuela, Ruiying Hu, Sharon Lin, Tanisha Shende, Shiri Azenkot
Title: Towards Understanding the Use of MLLM-Enabled Applications for Visual Interpretation by Blind and Low Vision People
Abstract:
Blind and Low Vision (BLV) people have adopted AI-powered visual interpretation applications to address their daily needs. While these applications have been helpful, prior work has found that users remain unsatisfied by their frequent errors. Recently, multimodal large language models (MLLMs) have been integrated into visual interpretation applications, and they show promise for more descriptive visual interpretations. However, it is still unknown how this advancement has changed people's use of these applications. To address this gap, we conducted a two-week diary study in which 20 BLV people used an MLLM-enabled visual interpretation application we developed, and we collected 553 entries. In this paper, we report a preliminary analysis of 60 diary entries from 6 participants. We found that participants considered the application's visual interpretations trustworthy (mean 3.75 out of 5) and satisfying (mean 4.15 out of 5). Moreover, participants trusted our application in high-stakes scenarios, such as receiving medical dosage advice. We discuss our plan to complete our analysis to inform the design of future MLLM-enabled visual interpretation systems.

Authors:Leon Pietschmann, Michel Schimpf, Zhu-Tian Chen, Hanspeter Pfister, Thomas Bohné
Title: Enhancing User Performance and Human Factors through Visual Guidance in AR Assembly Tasks
Abstract:
This study investigates the influence of Visual Guidance (VG) on user performance and human factors within Augmented Reality (AR) via a between-subjects experiment. VG is a crucial component in AR applications, serving as a bridge between digital information and real-world interactions. Unlike prior research, which often produced inconsistent outcomes, our study focuses on varying types of supportive visualisations rather than interaction methods. Our findings reveal a 31% reduction in task completion time, offset by a significant rise in errors, highlighting a compelling trade-off between speed and accuracy. Furthermore, we assess the detrimental effects of occlusion as part of our experimental design. In addition to examining other variables such as cognitive load, motivation, and usability, we identify specific directions and offer actionable insights for future research. Overall, our results underscore the promise of VG for enhancing user performance in AR, while emphasizing the importance of further investigating the underlying human factors.

Authors:Jinwook Kim, Sangmin Park, Qiushi Zhou, Mar Gonzalez-Franco, Jeongmi Lee, Ken Pfeuffer
Title: PinchCatcher: Enabling Multi-selection for Gaze+Pinch
Abstract:
This paper investigates multi-selection in XR interfaces based on eye and hand interaction. We propose enabling multi-selection using different variations of techniques that combine gaze with a semi-pinch gesture, allowing users to select multiple objects, while on the way to a full-pinch. While our exploration is based on the semi-pinch mode for activating a quasi-mode, we explore four methods for confirming subselections in multi-selection mode, varying in effort and complexity: dwell-time (SemiDwell), swipe (SemiSwipe), tilt (SemiTilt), and non-dominant hand input (SemiNDH), and compare them to a baseline technique. In the user study, we evaluate their effectiveness in reducing task completion time, errors, and effort. The results indicate the strengths and weaknesses of each technique, with SemiSwipe and SemiDwell as the most preferred methods by participants. We also demonstrate their utility in file managing and RTS gaming application scenarios. This study provides valuable insights to advance 3D input systems in XR.

Authors:Chase McDonald, Cleotilde Gonzalez
Title: Controllable Complementarity: Subjective Preferences in Human-AI Collaboration
Abstract:
Research on human-AI collaboration often prioritizes objective performance. However, understanding human subjective preferences is essential to improving human-AI complementarity and human experiences. We investigate human preferences for controllability in a shared workspace task with AI partners using Behavior Shaping (BS), a reinforcement learning algorithm that allows humans explicit control over AI behavior. In one experiment, we validate the robustness of BS in producing effective AI policies relative to self-play policies, when controls are hidden. In another experiment, we enable human control, showing that participants perceive AI partners as more effective and enjoyable when they can directly dictate AI behavior. Our findings highlight the need to design AI that prioritizes both task performance and subjective human preferences. By aligning AI behavior with human preferences, we demonstrate how human-AI complementarity can extend beyond objective outcomes to include subjective preferences.

Authors:Jiaying "Lizzy" Liu, Yiheng Su, Praneel Seth
Title: Can Large Language Models Grasp Concepts in Visual Content? A Case Study on YouTube Shorts about Depression
Abstract:
Large language models (LLMs) are increasingly used to assist computational social science research. While prior efforts have focused on text, the potential of leveraging multimodal LLMs (MLLMs) for online video studies remains underexplored. We conduct one of the first case studies on MLLM-assisted video content analysis, comparing AI's interpretations to human understanding of abstract concepts. We leverage LLaVA-1.6 Mistral 7B to interpret four abstract concepts regarding video-mediated self-disclosure, analyzing 725 keyframes from 142 depression-related YouTube short videos. We performed a qualitative analysis of the MLLM's self-generated explanations and found that the degree of operationalization can influence the MLLM's interpretations. Interestingly, greater detail does not necessarily increase human-AI alignment. We also identify other factors affecting AI alignment with human understanding, such as concept complexity and the versatility of video genres. Our exploratory study highlights the need to customize prompts for specific concepts and calls for researchers to incorporate more human-centered evaluations when working with AI systems in a multimodal context.

Authors:Jesan Ahammed Ovi, Gabe Fierro, C. Estelle Smith
Title: Assessing Student Adoption of Generative Artificial Intelligence across Engineering Education from 2023 to 2024
Abstract:
Generative Artificial Intelligence (GenAI) tools and models have the potential to re-shape educational needs, norms, practices, and policies in all sectors of engineering education. Empirical data, rather than anecdata and assumptions, on how engineering students have adopted GenAI is essential to developing a foundational understanding of students' GenAI-related behaviors and needs during academic training. This data will also help formulate effective responses to GenAI by both academic institutions and industrial employers. We collected two representative survey samples at the Colorado School of Mines, a small engineering-focused R-1 university in the USA, in May 2023 ($n_1=601$) and September 2024 ($n_2=862$) to address research questions related to (RQ1) how GenAI has been adopted by engineering students, including motivational and demographic factors contributing to GenAI use, (RQ2) students' ethical concerns about GenAI, and (RQ3) students' perceived benefits vs. harms for themselves, science, and society. Analysis revealed a statistically significant rise in GenAI adoption rates from 2023 to 2024. Students predominantly leverage GenAI tools to deepen understanding, enhance work quality, and stay informed about emerging technologies. Although most students assess their own usage of GenAI as ethical and beneficial, they nonetheless expressed significant concerns regarding GenAI and its impacts on society. We collected student estimates of ``P(doom)'' and discovered a bimodal distribution. Thus, we show that the student body at Mines is polarized with respect to future impacts of GenAI on the engineering workforce and society, despite being increasingly willing to explore GenAI over time. We discuss implications of these findings for future research and for integrating GenAI in engineering education.
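A significant rise in adoption between two independent samples of these sizes would typically be checked with a two-proportion z-test. The sketch below uses the survey's sample sizes but hypothetical adoption counts, since the abstract does not report the actual rates.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion under the null hypothesis
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# Hypothetical adoption counts (placeholders, NOT the survey's figures),
# with the reported sample sizes n1=601 (2023) and n2=862 (2024).
z = two_proportion_z(300, 601, 560, 862)
significant = z > 1.96  # two-sided alpha = 0.05 threshold for an increase
```

With these illustrative counts the increase clears the 5% significance threshold; the paper's own analysis may use a different test.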

Authors:Soya Park, J. D. Zamfirescu-Pereira, Chinmay Kulkarni
Title: Model Behavior Specification by Leveraging LLM Self-Playing and Self-Improving
Abstract:
Training AI models is challenging, particularly when crafting behavior instructions. Traditional methods rely on machines (supervised learning) or manual pattern discovery, which yields models that are either uninterpretable or costly in time to build. While Large Language Models (LLMs) simplify instruction writing through natural language, articulating intended model behavior remains difficult. We introduce Visionary Tuning, a human-in-the-loop self-play process followed by automatic self-refinement to improve behavior specification. Our system helps users clarify desired behavior through self-play and generates prompts through self-improvement. Our first evaluation is a user study conducted on a system implementation of Visionary Tuning in the context of chatbot behavior. The system plays against itself by simulating user interactions to identify patterns and creates effective prompts based on those patterns. In a within-subject study (N=12), participants pinpointed more patterns through self-play and crafted better prompts. Surprisingly, participants' perceived level of success in specifying model behavior varied. Follow-up crowd studies (N=60) confirmed that the chatbot adhered to instructions without sacrificing quality. Our second evaluation is a case study on a real-world implementation using a movie rating dataset with Visionary Tuning, demonstrating its effectiveness and robustness in modeling a critic's preferences across the spectrum of low to highly rated movies. Together, these results suggest how AI can improve the design process of interactive AI systems, and that the benefits of such tools may be non-obvious to end-users. We reflect on these findings and suggest future directions.

Authors:Joongi Shin, Ankit Khatri, Michael A. Hedderich, Andrés Lucero, Antti Oulasvirta
Title: Facilitating Asynchronous Idea Generation and Selection with Chatbots
Abstract:
People can generate high-quality ideas by building on each other's ideas. By enabling individuals to contribute their ideas at a time and in a manner comfortable to them (i.e., asynchronous ideation), they can engage deeply in ideation and improve idea quality. However, running asynchronous ideation faces a practical constraint: whereas trained human facilitators are needed to guide effective idea exchange, they cannot be continuously available to engage with individuals joining at varying hours. In this paper, we ask how chatbots can be designed to facilitate asynchronous ideation. For this, we adopted the guidelines found in the literature on human facilitators and designed two chatbots: one provides a structured ideation process, and another adapts the ideation process to individuals' ideation performance. We invited 48 participants to generate and select ideas by interacting with one of our chatbots and invited an expert facilitator to review our chatbots. We found that both chatbots can guide users to build on each other's ideas and converge them into a few satisfying ideas. However, we also found limitations in the chatbots' social interaction with collaborators, which only human facilitators can provide. Accordingly, we conclude that chatbots can be promising facilitators of asynchronous ideation, but hybrid facilitation with human facilitators would be needed to address the social aspects of collaborative ideation.

Authors:Mashrur Rashik, Shilpa Sweth, Nishtha Agrawal, Saiyyam Kochar, Kara M Smith, Fateme Rajabiyazdi, Vidya Setlur, Narges Mahyar, Ali Sarvghad
Title: AI-Enabled Conversational Journaling for Advancing Parkinson's Disease Symptom Tracking
Abstract:
Journaling plays a crucial role in managing chronic conditions by allowing patients to document symptoms and medication intake, providing essential data for long-term care. While valuable, traditional journaling methods often rely on static, self-directed entries, lacking interactive feedback and real-time guidance. This gap can result in incomplete or imprecise information, limiting its usefulness for effective treatment. To address this gap, we introduce PATRIKA, an AI-enabled prototype designed specifically for people with Parkinson's disease (PwPD). The system incorporates cooperative conversation principles, clinical interview simulations, and personalization to create a more effective and user-friendly journaling experience. Through two user studies with PwPD and iterative refinement of PATRIKA, we demonstrate conversational journaling's significant potential in patient engagement and collecting clinically valuable information. Our results showed that by generating probing questions, PATRIKA turned journaling into a bi-directional interaction. Additionally, we offer insights for designing journaling systems for healthcare and future directions for promoting sustained journaling.

Authors:Madhuka Thisuri De Silva, Jim Smiley, Sarah Goodwin, Leona M Holloway, Matthew Butler
Title: Sensing Movement: Contemporary Dance Workshops with People who are Blind or have Low Vision and Dance Teachers
Abstract:
Dance teachers rely primarily on verbal instructions and visual demonstrations to convey key dance concepts and movement. These techniques, however, have limitations in supporting students who are blind or have low vision (BLV). This work explores the role technology can play in supporting instruction for BLV students, as well as improvisation with their instructor. Through a series of design workshops with dance instructors and BLV students, ideas were generated by physically engaging with probes featuring diverse modalities, including tactile objects, a body-tracked sound and musical probe, and a body-tracked controller with vibrational feedback. Implications for the design of supporting technologies were discovered for four contemporary dance learning goals: learning a phrase; improvising; collaborating through movement; and awareness of body and movement qualities. We discuss the potential of numerous multi-sensory methods and artefacts, and present design considerations for technologies to support meaningful dance instruction and participation.

Authors:Wei-Hao Chen, Weixi Tong, Amanda Case, Tianyi Zhang
Title: Dango: A Mixed-Initiative Data Wrangling System using Large Language Model
Abstract:
Data wrangling is a time-consuming and challenging task in a data science pipeline. While many tools have been proposed to automate or facilitate data wrangling, they often misinterpret user intent, especially in complex tasks. We propose Dango, a mixed-initiative multi-agent system for data wrangling. Compared to existing tools, Dango enhances user communication of intent by allowing users to demonstrate on multiple tables and use natural language prompts in a conversation interface, enabling users to clarify their intent by answering LLM-posed multiple-choice clarification questions, and providing multiple forms of feedback such as step-by-step natural language explanations and data provenance to help users evaluate the data wrangling scripts. We conducted a within-subjects user study with 38 participants and demonstrated that Dango's features can significantly improve intent clarification, accuracy, and efficiency in data wrangling. Furthermore, we demonstrated the generalizability of Dango by applying it to a broader set of data wrangling tasks.

Authors:Suleiman Saka, Sanchari Das
Title: "Watch My Health, Not My Data": Understanding Perceptions, Barriers, Emotional Impact, & Coping Strategies Pertaining to IoT Privacy and Security in Health Monitoring for Older Adults
Abstract:
The proliferation of "Internet of Things (IoT)" provides older adults with critical support for "health monitoring" and independent living, yet significant concerns about security and privacy persist. In this paper, we report on these issues through a two-phase user study, including a survey (N = 22) and semi-structured interviews (n = 9) with adults aged 65+. We found that while 81.82% of our participants are aware of security features like "two-factor authentication (2FA)" and encryption, 63.64% express serious concerns about unauthorized access to sensitive health data. Only 13.64% feel confident in existing protections, citing confusion over "data sharing policies" and frustration with "complex security settings" which lead to distrust and anxiety. To cope, our participants adopt various strategies, such as relying on family or professional support and limiting feature usage leading to disengagement. Thus, we recommend "adaptive security mechanisms," simplified interfaces, and real-time transparency notifications to foster trust and ensure "privacy and security by design" in IoT health systems for older adults.

Authors:Sojeong Yun, Youn-kyung Lim
Title: "What If Smart Homes Could See Our Homes?": Exploring DIY Smart Home Building Experiences with VLM-Based Camera Sensors
Abstract:
The advancement of Vision-Language Model (VLM) camera sensors, which enable autonomous understanding of household situations without user intervention, has the potential to completely transform the DIY smart home building experience. Will this simplify or complicate the DIY smart home process? Additionally, what features do users want to create using these sensors? To explore this, we conducted a three-week diary-based experience prototyping study with 12 participants. Participants recorded their daily activities, used GPT to analyze the images, and manually customized and tested smart home features based on the analysis. The study revealed three key findings: (1) participants' expectations for VLM camera-based smart homes, (2) the impact of VLM camera sensor characteristics on the DIY process, and (3) users' concerns. Through the findings of this study, we propose design implications to support the DIY smart home building process with VLM camera sensors, and discuss living with intelligence.

Authors:Kaisei Fukaya, Damon Daylamani-Zad, Harry Agius
Title: Heuristics for AI-driven Graphical Asset Generation Tools in Game Design and Development Pipelines: A User-Centred Approach
Abstract:
Graphical assets play an important role in the design and development of games. AI-driven generative tools have the potential to aid in creating graphical assets, thus improving game design and development pipelines. However, there is little research addressing how generative methods can fit into the wider pipeline, and there are no guidelines or heuristics for creating such tools. To address this gap, we conducted a user study with 16 game designers and developers to examine their behaviour and interaction with generative tools for graphical assets. The findings highlight that the early design stage is preferred by all participants. Designers and developers are inclined to use such tools to create large numbers of variations at the cost of quality, as they can improve the quality of the artefacts once they generate a suitable asset. The results also strongly raised the need for better integration of such tools into existing design and development environments, and for the outputs to be in common data formats, manipulable, and able to integrate smoothly into existing environments. The study also highlights the need for further emphasis on users' needs in order to incorporate these tools effectively into existing pipelines. Informed by these results, we provide a set of heuristics for creating tools that meet the expectations and needs of game designers and developers.

Authors:Jiwan Kim, Mingyu Han, Ian Oakley
Title: BudsID: Mobile-Ready and Expressive Finger Identification Input for Earbuds
Abstract:
Wireless earbuds are an appealing platform for wearable computing on-the-go. However, their small size and out-of-view location mean they support only a limited set of inputs. We propose finger identification input on earbuds as a novel technique to resolve these problems. This technique involves associating touches by different fingers with different responses. To enable it on earbuds, we adapted prior work on smartwatches to develop a wireless earbud featuring a magnetometer that detects fields from a magnetic ring. A first study reveals participants achieve rapid, precise earbud touches with different fingers, even while mobile (time: 0.98s, errors: 5.6%). Furthermore, touching fingers can be accurately classified (96.9%). A second study shows strong performance with a more expressive technique involving multi-finger double-taps (inter-touch time: 0.39s, errors: 2.8%) while maintaining high accuracy (94.7%). We close by exploring and evaluating the design of earbud finger identification applications and demonstrating the feasibility of our system on low-resource devices.
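As a toy illustration of finger identification from magnetic-field readings, the sketch below classifies synthetic 3-axis magnetometer samples with a nearest-centroid rule. The per-finger signatures, noise level, and classifier are all hypothetical; the abstract does not describe BudsID's actual features or model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mean 3-axis magnetometer signatures for three fingers of a
# hand wearing a magnetic ring (illustrative values, not BudsID's data).
centroids = {"index": np.array([30.0, -5.0, 12.0]),
             "middle": np.array([18.0, 10.0, -8.0]),
             "ring": np.array([5.0, 22.0, 15.0])}

def classify(sample):
    # Nearest-centroid rule: pick the finger whose mean signature is closest.
    return min(centroids, key=lambda f: np.linalg.norm(sample - centroids[f]))

# Simulate noisy touches and measure the toy classifier's accuracy.
trials, correct = 300, 0
for _ in range(trials):
    finger = rng.choice(list(centroids))
    sample = centroids[finger] + rng.normal(0, 3.0, 3)  # additive sensor noise
    correct += classify(sample) == finger
print(f"toy accuracy: {correct / trials:.1%}")
```

With well-separated signatures the toy rule is near-perfect; the paper's 96.9% figure reflects real sensing conditions, which this sketch does not model.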

Authors:Jiwan Kim, Jiwan Son, Ian Oakley
Title: Cross, Dwell, or Pinch: Designing and Evaluating Around-Device Selection Methods for Unmodified Smartwatches
Abstract:
Smartwatches offer powerful features, but their small touchscreens limit the expressiveness of the input that can be achieved. To address this issue, we present, and open-source, the first sonar-based around-device input on an unmodified consumer smartwatch. We achieve this using a fine-grained, one-dimensional sonar-based finger-tracking system. In addition, we use this system to investigate the fundamental issue of how to trigger selections during around-device smartwatch input through two studies. The first examines the methods of double-crossing, dwell, and finger tap in a binary task, while the second considers a subset of these designs in a multi-target task and in the presence and absence of haptic feedback. Results showed double-crossing was optimal for binary tasks, while dwell excelled in multi-target scenarios, and haptic feedback enhanced comfort but not performance. These findings offer design insights for future around-device smartwatch interfaces that can be directly deployed on today's consumer hardware.

Authors:Nisha Devasia, Runhua Zhao, Jin Ha Lee
Title: Does the Story Matter? Applying Narrative Theory to an Educational Misinformation Escape Room Game
Abstract:
Rapid spread of harmful misinformation has led to a dire need for effective media literacy interventions, to which educational games have been suggested as a possible solution. Researchers and educators have created several games that increase media literacy and resilience to misinformation. However, the existing body of misinformation education games rarely focus upon the socio-emotional influences that factor into misinformation belief. Misinformation correction and serious games have both explored narrative as a method to engage with people on an emotional basis. To this end, we investigated how 123 young adults (mean age = 22.98) experienced narrative transportation and identification in two narrative-centered misinformation escape room games developed for library settings. We found that propensity for certain misinformation contexts, such as engagement with fan culture and likelihood to share on social media platforms, significantly affected how participants experienced specific measures of narrative immersion within the games. We discuss design implications for tailoring educational interventions to specific misinformation contexts.

Authors:Yueyang Wu, Sinan Yang, Yanming Wang, Jiajie He, Muhammad Mohsin Pathan, Bensheng Qiu, Xiaoxiao Wang
Title: Volume-Wise Task fMRI Decoding with Deep Learning: Enhancing Temporal Resolution and Cognitive Function Analysis
Abstract:
In recent years, the application of deep learning in task functional Magnetic Resonance Imaging (tfMRI) decoding has led to significant advancements. However, most studies remain constrained by the assumption of temporal stationarity in neural activity, resulting in predominantly block-wise analysis with limited temporal resolution on the order of tens of seconds. This limitation restricts the ability to decode cognitive functions in detail. To address these limitations, this study proposes a deep neural network designed for volume-wise identification of task states within tfMRI data, thereby overcoming the constraints of conventional methods. Evaluated on Human Connectome Project (HCP) motor and gambling tfMRI datasets, the model achieved impressive mean accuracy rates of 94.0% and 79.6%, respectively. These results demonstrate a substantial enhancement in temporal resolution, enabling more detailed exploration of cognitive processes. The study further employs visualization algorithms to investigate dynamic brain mappings during different tasks, marking a significant step forward in deep learning-based frame-level tfMRI decoding. This approach offers new methodologies and tools for examining dynamic changes in brain activities and understanding the underlying cognitive mechanisms.

Authors:Shuyu Liu, Ruoxi Wang, Ling Zhang, Xuequan Zhu, Rui Yang, Xinzhu Zhou, Fei Wu, Zhi Yang, Cheng Jin, Gang Wang
Title: PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice
Abstract:
The advent of Large Language Models (LLMs) offers potential solutions to problems such as the shortage of medical resources and low diagnostic consistency in psychiatric clinical practice. Despite this potential, a robust and comprehensive benchmarking framework for assessing the efficacy of LLMs in authentic psychiatric clinical environments is absent, which has impeded the development of specialized LLMs tailored to psychiatric applications. In response to this gap, incorporating clinical demands in psychiatry and clinical data, we propose a benchmarking system, PsychBench, to evaluate the practical performance of LLMs in psychiatric clinical settings. We conducted a comprehensive quantitative evaluation of 16 LLMs using PsychBench, and investigated the impact of prompt design, chain-of-thought reasoning, input text length, and domain-specific knowledge fine-tuning on model performance. Through detailed error analysis, we identified strengths and potential limitations of the existing models and suggested directions for improvement. Subsequently, a clinical reader study involving 60 psychiatrists of varying seniority was conducted to further explore the practical benefits of existing LLMs as supportive tools. Through the quantitative and reader evaluations, we show that while existing models demonstrate significant potential, they are not yet adequate as decision-making tools in psychiatric clinical practice. The reader study further indicates that, as an auxiliary tool, LLMs could provide particularly notable support for junior psychiatrists, effectively enhancing their work efficiency and overall clinical quality. To promote research in this area, we will make the dataset and evaluation framework publicly available, with the hope of advancing the application of LLMs in psychiatric clinical settings.

Authors:Joongi Shin, Anna Polyanskaya, Andrés Lucero, Antti Oulasvirta
Title: No Evidence for LLMs Being Useful in Problem Reframing
Abstract:
Problem reframing is a designerly activity wherein alternative perspectives are created to recast what a stated design problem is about. Generating alternative problem frames is challenging because it requires devising novel and useful perspectives that fit the given problem context. Large language models (LLMs) could assist this activity via their generative capability. However, it is not clear whether they can help designers produce high-quality frames. Therefore, we asked if there are benefits to working with LLMs. To this end, we compared three ways of using LLMs (N=280): 1) free-form, 2) direct generation, and 3) a structured approach informed by a theory of reframing. We found that using LLMs does not help improve the quality of problem frames. In fact, it increases the competence gap between experienced and inexperienced designers. Also, inexperienced ones perceived lower agency when working with LLMs. We conclude that there is no benefit to using LLMs in problem reframing and discuss possible factors for this lack of effect.

Authors:David Hartmann, Amin Oueslati, Dimitri Staufer, Lena Pohlmann, Simon Munzert, Hendrik Heuer
Title: Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations
Abstract:
Commercial content moderation APIs are marketed as scalable solutions to combat online hate speech. However, the reliance on these APIs risks both silencing legitimate speech, called over-moderation, and failing to protect online platforms from harmful speech, known as under-moderation. To assess such risks, this paper introduces a framework for auditing black-box NLP systems. Using the framework, we systematically evaluate five widely used commercial content moderation APIs. Analyzing five million queries based on four datasets, we find that APIs frequently rely on group identity terms, such as ``black'', to predict hate speech. While OpenAI's and Amazon's services perform slightly better, all providers under-moderate implicit hate speech, which uses codified messages, especially against LGBTQIA+ individuals. Simultaneously, they over-moderate counter-speech, reclaimed slurs and content related to Black, LGBTQIA+, Jewish, and Muslim people. We recommend that API providers offer better guidance on API implementation and threshold setting and more transparency on their APIs' limitations. Warning: This paper contains offensive and hateful terms and concepts. We have chosen to reproduce these terms for reasons of transparency.
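The over-/under-moderation framing can be made concrete with a small sketch. Using invented audit rows (not the paper's five-million-query data or its datasets), it computes, per content category, the share of hateful items left unflagged (under-moderation) and of benign items flagged (over-moderation):

```python
from collections import defaultdict

# Hypothetical audit rows: (category, is_hateful, api_flagged).
# Values are illustrative only, not results from the audited APIs.
rows = [
    ("implicit_hate", True, False),
    ("implicit_hate", True, False),
    ("implicit_hate", True, True),
    ("counter_speech", False, True),
    ("counter_speech", False, True),
    ("counter_speech", False, False),
    ("explicit_hate", True, True),
    ("explicit_hate", True, True),
]

def moderation_rates(rows):
    """Per category: under-moderation rate (hateful but unflagged) and
    over-moderation rate (benign but flagged)."""
    stats = defaultdict(lambda: {"n": 0, "under": 0, "over": 0})
    for cat, hateful, flagged in rows:
        s = stats[cat]
        s["n"] += 1
        s["under"] += hateful and not flagged
        s["over"] += (not hateful) and flagged
    return {c: {"under": s["under"] / s["n"], "over": s["over"] / s["n"]}
            for c, s in stats.items()}

print(moderation_rates(rows))
```

On these toy rows, implicit hate is mostly under-moderated while counter-speech is mostly over-moderated, mirroring the pattern the audit reports at scale.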

Authors:Ziyi Xia, Xincheng Huang, Sidney S Fels, Robert Xiao
Title: HaloTouch: Using IR Multi-path Interference to Support Touch Interactions With General Surfaces
Abstract:
Sensing touch on arbitrary surfaces has long been a goal of ubiquitous computing, but often requires instrumenting the surface. Depth camera-based systems have emerged as a promising solution for minimizing instrumentation, but at the cost of high touch-down detection error rates, high touch latency, and high minimum hover distance, limiting them to basic tasks. We developed HaloTouch, a vision-based system which exploits a multipath interference effect from an off-the-shelf time-of-flight depth camera to enable fast, accurate touch interactions on general surfaces. HaloTouch achieves a 99.2% touch-down detection accuracy across various materials, with a motion-to-photon latency of 150 ms. With a brief (20s) user-specific calibration, HaloTouch supports millimeter-accurate hover sensing as well as continuous pressure sensing. We conducted a user study with 12 participants, including a typing task demonstrating text input at 26.3 AWPM. HaloTouch shows promise for more robust, dynamic touch interactions without instrumenting surfaces or adding hardware to users.

Authors:Sitong Li, Stefano Padilla, Pierre Le Bras, Junyu Dong, Mike Chantler
Title: A Review of LLM-Assisted Ideation
Abstract:
We present a comprehensive, in-depth review of ideation assisted by large language models (LLMs), highlighting emerging trends and identifying unaddressed research gaps. In total, we examined 61 studies investigating the application of LLMs in both group and individual ideation processes. From these studies, we derived the Hourglass Ideation Framework for LLM-assisted ideation, comprising three phases and seven key ideation stages, which served as the basis for our systematic survey. Our analysis reveals that LLMs are most frequently used for idea generation and refinement, but their use in scope specification, foundational material structuring and multi-idea evaluation and selection remains limited. We provide our findings in extensive tabular and online formats. These catalogues detail research on LLM-assisted, purely LLM-based, and human-only activities across the seven ideation stages for each of the 61 studies. These also detail creative domains, publication outlets, interaction designs, user study designs, and assessment methods. Our analysis of system interaction design reveals a predominant focus on supporting individual ideation activities and text-based interaction, with a growing trend of incorporating multimedia elements. However, in group ideation, tools and interaction modalities targeting both synchronous and asynchronous collaboration are much scarcer. We synthesize the primary findings of our review and outline promising directions for future research in LLM-assisted ideation. We hope this review will help researchers quickly gain an overview of this rapidly expanding area, efficiently locate relevant work, and identify underexplored areas for further investigation. In addition, we believe the framework we present here will form the basis for the development of future problem and solution space taxonomies, and methodologies for LLM-assisted ideation development and use.

Authors:Daye Kim, Sebin Lee, Yoonseo Jun, Yujin Shin, Jungjin Lee
Title: VTuber's Atelier: The Design Space, Challenges, and Opportunities for VTubing
Abstract:
VTubing, the practice of live streaming using virtual avatars, has gained worldwide popularity among streamers seeking to maintain anonymity. While previous research has primarily focused on the social and cultural aspects of VTubing, there is a noticeable lack of studies examining the practical challenges VTubers face in creating and operating their avatars. To address this gap, we surveyed VTubers' equipment and expanded the live-streaming design space by introducing six new dimensions related to avatar creation and control. Additionally, we conducted interviews with 16 professional VTubers to comprehensively explore their practices, strategies, and challenges throughout the VTubing process. Our findings reveal that VTubers face significant burdens compared to real-person streamers due to fragmented tools and the multi-tasking nature of VTubing, leading to unique workarounds. Finally, we summarize these challenges and propose design opportunities to improve the effectiveness and efficiency of VTubing.

Authors:Yuhan Liu, Aadit Shah, Jordan Ackerman, Manaswi Saha
Title: Exploring the Design Space of Real-time LLM Knowledge Support Systems: A Case Study of Jargon Explanations
Abstract:
Knowledge gaps often arise during communication due to diverse backgrounds, knowledge bases, and vocabularies. With recent LLM developments, providing real-time knowledge support is increasingly viable, but is challenging due to shared and individual cognitive limitations (e.g., attention, memory, and comprehension) and the difficulty in understanding the user's context and internal knowledge. To address these challenges, we explore the key question of understanding how people want to receive real-time knowledge support. We built StopGap -- a prototype that provides real-time knowledge support for explaining jargon words in videos -- to conduct a design probe study (N=24) that explored multiple visual knowledge representation formats. Our study revealed individual differences in preferred representations and highlighted the importance of user agency, personalization, and mixed-initiative assistance. Based on our findings, we map out six key design dimensions for real-time LLM knowledge support systems and offer insights for future research in this space.

Authors:Lukas William Mayer, Sheer Karny, Jackie Ayoub, Miao Song, Danyang Tian, Ehsan Moradi-Pari, Mark Steyvers
Title: Human-AI Collaboration: Trade-offs Between Performance and Preferences
Abstract:
Despite the growing interest in collaborative AI, designing systems that seamlessly integrate human input remains a major challenge. In this study, we developed a task to systematically examine human preferences for collaborative agents. We created and evaluated five collaborative AI agents with strategies that differ in the manner and degree to which they adapt to human actions. Participants interacted with a subset of these agents, evaluated their perceived traits, and selected their preferred agent. We used a Bayesian model to understand how agents' strategies influence Human-AI team performance, the AI's perceived traits, and the factors shaping human preferences in pairwise agent comparisons. Our results show that agents that are more considerate of human actions are preferred over purely performance-maximizing agents. Moreover, we show that such human-centric design can improve the likability of AI collaborators without reducing performance. We find evidence for inequality-aversion effects as a driver of human choices, suggesting that people prefer collaborative agents that allow them to meaningfully contribute to the team. Taken together, these findings demonstrate how collaboration with AI can benefit from development efforts that include both subjective and objective metrics.

Authors:Paula Ebner, Jessica Szczuka
Title: Predicting Romantic Human-Chatbot Relationships: A Mixed-Method Study on the Key Psychological Factors
Abstract:
Romantic relationships with social chatbots are becoming increasingly prevalent, raising important questions about their societal and psychological implications. Despite this growing trend, little is known about the individuals entering these synthetic relationships. This three-part study seeks to enhance understanding of the factors encompassing human-chatbot relationships by quantitatively examining the commonly discussed characteristics of romantic and sexual fantasy, loneliness, attachment style, anthropomorphism, and sexual sensation seeking (Study 1A); comparing the impact of romantic and sexual fantasizing in human-chatbot versus human-human relationships (Study 1B); and providing qualitative insights into how individuals conceptualize romantic and sexual fantasies in their interactions with chatbots (Study 2). Individuals with romantic chatbot connections were interviewed (N=15) or surveyed (N=92), while participants in the comparison groups, long-distance (N=90) and cohabiting relationships (N=82), completed a questionnaire. Romantic fantasizing emerged as the strongest predictor of human-chatbot relationships, alongside anthropomorphism and anxious-avoidant attachment. Notably, romantic fantasy also predicted partner closeness across all relationship types, revealing shared psychological dynamics between human-chatbot and human-human bonds. Interviews further reinforced this, with all participants engaging in fantasy exploration while desiring their chatbot to feel as human as possible. This paper provides a novel and multifaceted examination of the psychological dynamics within human-chatbot relationships, highlighting the central yet understudied role of fantasy.

Authors:Zihan Wu, Yicheng Tang, Barbara Ericson
Title: Learner and Instructor Needs in AI-Supported Programming Learning Tools: Design Implications for Features and Adaptive Control
Abstract:
AI-supported tools can help learners overcome challenges in programming education by providing adaptive assistance. However, existing research often focuses on individual tools rather than deriving broader design recommendations. A key challenge in designing these systems is balancing learner control with system-driven guidance. To explore user preferences for AI-supported programming learning tools, we conducted a participatory design study with 15 undergraduate novice programmers and 10 instructors to gather insights on their desired help features and control preferences, as well as a follow-up survey with 172 introductory programming students. Our qualitative findings show that learners prefer help that is encouraging, incorporates visual aids, and includes peer-related insights, whereas instructors prioritize scaffolding that reflects learners' progress and reinforces best practices. Both groups favor shared control, though learners generally prefer more autonomy, while instructors lean toward greater system guidance to prevent cognitive overload. Additionally, our interviews revealed individual differences in control preferences. Based on our findings, we propose design guidelines for AI-supported programming tools, particularly regarding user-centered help features and adaptive control mechanisms. Our work contributes to the human-centered design of AI-supported learning environments by informing the development of systems that effectively balance autonomy and guidance, enhancing AI-supported educational tools for programming and beyond.

Authors:Nick Bryan-Kinns, Shuoyang Jasper Zheng, Francisco Castro, Makayla Lewis, Jia-Rey Chang, Gabriel Vigliensoni, Terence Broad, Michael Clemens, Elizabeth Wilson
Title: XAIxArts Manifesto: Explainable AI for the Arts
Abstract:
Explainable AI (XAI) is concerned with how to make AI models more understandable to people. To date these explanations have predominantly been technocentric - mechanistic or productivity oriented. This paper introduces the Explainable AI for the Arts (XAIxArts) manifesto to provoke new ways of thinking about explainability and AI beyond technocentric discourses. Manifestos offer a means to communicate ideas, amplify unheard voices, and foster reflection on practice. To support the co-creation and revision of the XAIxArts manifesto we combine a World Café style discussion format with a living manifesto to question four core themes: 1) Empowerment, Inclusion, and Fairness; 2) Valuing Artistic Practice; 3) Hacking and Glitches; and 4) Openness. Through our interactive living manifesto experience, we invite participants to actively engage in shaping this XAIxArts vision within the CHI community and beyond.

Authors:Edoardo Sebastiano De Duro, Giuseppe Alessandro Veltri, Hudson Golino, Massimo Stella
Title: Measuring and identifying factors of individuals' trust in Large Language Models
Abstract:
Large Language Models (LLMs) can engage in human-looking conversational exchanges. Although conversations can elicit trust between users and LLMs, scarce empirical research has examined trust formation in human-LLM contexts, beyond LLMs' trustworthiness or human trust in AI in general. Here, we introduce the Trust-In-LLMs Index (TILLMI) as a new framework to measure individuals' trust in LLMs, extending McAllister's cognitive and affective trust dimensions to LLM-human interactions. We developed TILLMI as a psychometric scale, prototyped with a novel protocol we called LLM-simulated validity. The LLM-based scale was then validated in a sample of 1,000 US respondents. Exploratory Factor Analysis identified a two-factor structure. Two items were then removed due to redundancy, yielding a final 6-item scale with a 2-factor structure. Confirmatory Factor Analysis on a separate subsample showed strong model fit ($CFI = .995$, $TLI = .991$, $RMSEA = .046$, $p_{X^2} > .05$). Convergent validity analysis revealed that trust in LLMs correlated positively with openness to experience, extraversion, and cognitive flexibility, but negatively with neuroticism. Based on these findings, we interpreted TILLMI's factors as "closeness with LLMs" (affective dimension) and "reliance on LLMs" (cognitive dimension). Younger males exhibited higher closeness with- and reliance on LLMs compared to older women. Individuals with no direct experience with LLMs exhibited lower levels of trust compared to LLMs' users. These findings offer a novel empirical foundation for measuring trust in AI-driven verbal communication, informing responsible design, and fostering balanced human-AI collaboration.

Authors:Siting Liang, Daniel Sonntag
Title: Explainable Biomedical Claim Verification with Large Language Models
Abstract:
Verification of biomedical claims is critical for healthcare decision-making, public health policy and scientific research. We present an interactive biomedical claim verification system by integrating LLMs, transparent model explanations, and user-guided justification. In the system, users first retrieve relevant scientific studies from a persistent medical literature corpus and explore how different LLMs perform natural language inference (NLI) within a task-adaptive reasoning framework to classify each study as "Support," "Contradict," or "Not Enough Information" regarding the claim. Users can examine the model's reasoning process with additional insights provided by SHAP values that highlight word-level contributions to the final result. This combination enables a more transparent and interpretable evaluation of the model's decision-making process. A summary stage allows users to consolidate the results by selecting a result with narrative justification generated by LLMs. As a result, a consensus-based final decision is summarized for each retrieved study, aiming at safe and accountable AI-assisted decision-making in biomedical contexts. We aim to integrate this explainable verification system as a component within a broader evidence synthesis framework to support human-AI collaboration.

Authors:Tejasvi Chebrolu, Ponnurangam Kumaraguru, Ashwin Rajadesingan
Title: Personal Narratives Empower Politically Disinclined Individuals to Engage in Political Discussions
Abstract:
Engaging in political discussions is crucial in democratic societies, yet many individuals remain politically disinclined due to various factors such as perceived knowledge gaps, conflict avoidance, or a sense of disconnection from the political system. In this paper, we explore the potential of personal narratives (short, first-person accounts emphasizing personal experiences) as a means to empower these individuals to participate in online political discussions. Using a text classifier that identifies personal narratives, we conducted a large-scale computational analysis to evaluate the relationship between the use of personal narratives and participation in political discussions on Reddit. We find that politically disinclined individuals (PDIs) are more likely to use personal narratives than more politically active users. Personal narratives are more likely to attract and retain politically disinclined individuals in political discussions than other comments. Importantly, personal narratives posted by politically disinclined individuals are received more positively than their other comments in political communities. These results emphasize the value of personal narratives in promoting inclusive political discourse.

Authors:Linkun Zhou, Jian Li, Yadong Mo, Xiangyan Zhang, Ying Zhang, Shimin Wei
Title: AoECR: AI-ization of Elderly Care Robot
Abstract:
Autonomous interaction is crucial for the effective use of elderly care robots. However, developing universal AI architectures is extremely challenging due to the diversity in robot configurations and a lack of datasets. We propose a universal architecture for the AI-ization of elderly care robots, called AoECR. Specifically, based on a nursing bed, we developed a patient-nurse interaction dataset tailored for elderly care scenarios and fine-tuned a large language model to enable it to perform nursing manipulations. Additionally, the inference process included a self-check chain to ensure the security of control commands. An expert optimization process further enhanced the humanization and personalization of the interactive responses. The physical experiment demonstrated that the AoECR exhibited zero-shot generalization capabilities across diverse scenarios, understood patients' instructions, implemented secure control commands, and delivered humanized and personalized interactive responses. In general, our research provides a valuable dataset reference and AI-ization solutions for elderly care robots.

Authors:Konstantina Christakopoulou, Iris Qu, John Canny, Andrew Goodridge, Cj Adams, Minmin Chen, Maja Matarić
Title: Conversational Planning for Personal Plans
Abstract:
The language generation and reasoning capabilities of large language models (LLMs) have enabled conversational systems with impressive performance in a variety of tasks, from code generation, to composing essays, to passing STEM and legal exams, to a new paradigm for knowledge search. Beyond such short-term applications, LLMs are increasingly used to help with real-life goals or tasks that take a long time to complete, involving multiple sessions across days, weeks, months, or even years. Thus, to enable conversational systems for long-term interactions and tasks, we need language-based agents that can plan for long horizons. Traditionally, such capabilities were addressed by reinforcement learning agents with hierarchical planning capabilities. In this work, we explore a novel architecture where the LLM acts as the meta-controller deciding the agent's next macro-action, and tool-use-augmented LLM-based option policies execute the selected macro-action. We instantiate this framework for a specific set of macro-actions enabling adaptive planning for users' personal plans through conversation and follow-up questions collecting user feedback. We show how this paradigm can be applicable in scenarios ranging from tutoring for academic and non-academic tasks to conversational coaching for personal health plans.

Authors:Soobin Park, Hankyung Kim, Youn-kyung Lim
Title: Reimagining Personal Data: Unlocking the Potential of AI-Generated Images in Personal Data Meaning-Making
Abstract:
Image-generative AI provides new opportunities to transform personal data into alternative visual forms. In this paper, we illustrate the potential of AI-generated images in facilitating meaningful engagement with personal data. In a formative autobiographical design study, we explored the design and use of AI-generated images derived from personal data. Informed by this study, we designed a web-based application as a probe that represents personal data through generative images utilizing OpenAI's GPT-4 model and DALL-E 3. We then conducted a 21-day diary study and interviews using the probe with 16 participants to investigate users' in-depth experiences with images generated by AI in everyday life. Our findings reveal new qualities of experiences in users' engagement with data, highlighting how participants constructed personal meaning from their data through imagination and speculation on AI-generated images. We conclude by discussing the potential and concerns of leveraging image-generative AI for personal data meaning-making.

Authors:Xingyu Bruce Liu, Haijun Xia, Xiang Anthony Chen
Title: Interacting with Thoughtful AI
Abstract:
We envision the concept of Thoughtful AI, a new human-AI interaction paradigm in which the AI behaves as a continuously thinking entity. Unlike conventional AI systems that operate on a turn-based, input-output model, Thoughtful AI autonomously generates, develops, and communicates its evolving thought process throughout an interaction. In this position paper, we argue that this thoughtfulness unlocks new possibilities for human-AI interaction by enabling proactive AI behavior, facilitating continuous cognitive alignment with users, and fostering more dynamic interaction experiences. We outline the conceptual foundations of Thoughtful AI, illustrate its potential through example projects, and envision how this paradigm can transform human-AI interaction in the future.

Authors:Zhuoran Lu, Qian Zhou, Yi Wang
Title: WhatELSE: Shaping Narrative Spaces at Configurable Level of Abstraction for AI-bridged Interactive Storytelling
Abstract:
Generative AI significantly enhances player agency in interactive narratives (IN) by enabling just-in-time content generation that adapts to player actions. While delegating generation to AI makes IN more interactive, it becomes challenging for authors to control the space of possible narratives - within which the final story experienced by the player emerges from their interaction with AI. In this paper, we present WhatELSE, an AI-bridged IN authoring system that creates narrative possibility spaces from example stories. WhatELSE provides three views (narrative pivot, outline, and variants) to help authors understand the narrative space and corresponding tools leveraging linguistic abstraction to control the boundaries of the narrative space. Taking innovative LLM-based narrative planning approaches, WhatELSE further unfolds the narrative space into executable game events. Through a user study (N=12) and technical evaluations, we found that WhatELSE enables authors to perceive and edit the narrative space and generates engaging interactive narratives at play-time.

Authors:Aline Xavier Fidêncio, Felix Grün, Christian Klaes, Ioannis Iossifidis
Title: Error-related Potential driven Reinforcement Learning for adaptive Brain-Computer Interfaces
Abstract:
Brain-computer interfaces (BCIs) provide alternative communication methods for individuals with motor disabilities by allowing control and interaction with external devices. Non-invasive BCIs, especially those using electroencephalography (EEG), are practical and safe for various applications. However, their performance is often hindered by EEG non-stationarities, caused by changing mental states or device characteristics like electrode impedance. This challenge has spurred research into adaptive BCIs that can handle such variations. In recent years, interest has grown in using error-related potentials (ErrPs) to enhance BCI performance. ErrPs, neural responses to errors, can be detected non-invasively and have been integrated into different BCI paradigms to improve performance through error correction or adaptation. This research introduces a novel adaptive ErrP-based BCI approach using reinforcement learning (RL). We demonstrate the feasibility of an RL-driven adaptive framework incorporating ErrPs and motor imagery. Utilizing two RL agents, the framework adapts dynamically to EEG non-stationarities. Validation was conducted using a publicly available motor imagery dataset and a fast-paced game designed to boost user engagement. Results show the framework's promise, with RL agents learning control policies from user interactions and achieving robust performance across datasets. However, a critical insight from the game-based protocol revealed that motor imagery in a high-speed interaction paradigm was largely ineffective for participants, highlighting task design limitations in real-time BCI applications. These findings underscore the potential of RL for adaptive BCIs while pointing out practical constraints related to task complexity and user responsiveness.
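As a toy illustration of the ErrP-as-reward idea (not the authors' two-agent framework), a bandit-style learner can treat a detected error-related potential as a negative reward. The simulated detector, its 10% noise rate, the two actions, and the learning rate below are all hypothetical stand-ins.

```python
import random

random.seed(0)

N_ACTIONS = 2          # e.g. two motor-imagery classes
q = [0.0] * N_ACTIONS  # action-value estimates
alpha, eps = 0.2, 0.1  # learning rate, exploration rate

def errp_detected(action, intended):
    """Stand-in for an EEG ErrP classifier: usually fires when the BCI's
    action mismatches the user's intent, with 10% detector noise."""
    wrong = action != intended
    return wrong if random.random() > 0.1 else not wrong

for _ in range(500):
    intended = 1  # the user keeps intending action 1
    if random.random() < eps:
        a = random.randrange(N_ACTIONS)                   # explore
    else:
        a = max(range(N_ACTIONS), key=q.__getitem__)      # exploit
    r = -1.0 if errp_detected(a, intended) else 1.0       # ErrP -> penalty
    q[a] += alpha * (r - q[a])                            # incremental update

print(q[1] > q[0])  # the learner comes to prefer the intended action
```

The point of the sketch is only the reward pathway: the error potential replaces an explicit reward signal, letting the interface adapt from the user's brain response alone.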

Authors:Shutaro Aoyama, Rintaro Chujo, Ari Hautasaari, Takeshi Naemura
Title: Intersubjective Model of AI-mediated Communication: Augmenting Human-Human Text Chat through LLM-based Adaptive Agent Pair
Abstract:
The growing prevalence of Large Language Models (LLMs) is reshaping online text-based communication; a transformation that is extensively studied as AI-mediated communication. However, much of the existing research remains bound by traditional communication models, where messages are created and transmitted directly between humans despite LLMs being able to play a more active role in transforming messages. In this work, we propose the Intersubjective Model of AI-mediated Communication, an alternative communication model that leverages LLM-based adaptive agents to augment human-human communication. Unlike traditional communication models that focus on the accurate transmission of information, the Intersubjective Model allows for communication to be designed in an adaptive and customizable way to create alternative interactions by dynamically shaping messages in real time and facilitating shared understanding between the human participants. In this paper, we have developed a prototype text chat system based on the Intersubjective Model to describe the potential of this model, as well as the design space it affords.

Authors:Yubin Choi, Jeanne Choi, Joseph Seering
Title: Leveling Up Together: Fostering Positive Growth and Safe Online Spaces for Teen Roblox Developers
Abstract:
Creating games together is both a playful and effective way to develop skills in computational thinking, collaboration, and more. However, game development can be challenging for younger developers who lack formal training. While teenage developers frequently turn to online communities for peer support, their experiences may vary. To better understand the benefits and challenges teens face within online developer communities, we conducted interviews with 18 teenagers who created games or elements in Roblox and received peer support from one or more online Roblox developer communities. Our findings show that developer communities provide teens with valuable resources for technical, social, and career growth. However, teenagers also struggle with inter-user conflicts and a lack of community structure, leading to difficulties in handling complex issues that may arise, such as financial scams. Based on these insights, we propose takeaways for creating positive and safe online spaces for teenage game creators.

Authors:Changyo Han, Yosuke Nakagawa, Takeshi Naemura
Title: corobos: A Design for Mobile Robots Enabling Cooperative Transitions between Table and Wall Surfaces
Abstract:
Swarm User Interfaces allow dynamic arrangement of user environments through the use of multiple mobile robots, but their operational range is typically confined to a single plane due to constraints imposed by their two-wheel propulsion systems. We present corobos, a proof-of-concept design that enables these robots to cooperatively transition between table (horizontal) and wall (vertical) surfaces seamlessly, without human intervention. Each robot is equipped with a uniquely designed slope structure that facilitates smooth rotation when another robot pushes it toward a target surface. Notably, this design relies solely on passive mechanical elements, eliminating the need for additional active electrical components. We investigated the design parameters of this structure and evaluated its transition success rate through experiments. Furthermore, we demonstrate various application examples to showcase the potential of corobos in enhancing user environments.

Authors:Yudong Xie, Zhifeng Han, Qinfan Xiao, Liwei Liang, Lu-Qi Tao, Tian-Ling Ren
Title: Silent Speech Sentence Recognition with Six-Axis Accelerometers using Conformer and CTC Algorithm
Abstract:
Silent speech interfaces (SSI) are being actively developed to assist individuals with communication impairments who have long suffered from daily hardships and a reduced quality of life. However, silent sentences are difficult to segment and recognize due to elision and linking. A novel silent speech sentence recognition method is proposed to convert the facial motion signals collected by six-axis accelerometers into transcribed words and sentences. A Conformer-based neural network with the Connectionist-Temporal-Classification algorithm is used to gain contextual understanding and translate the non-acoustic signals into word sequences, requiring only that the constituent words exist in the database. Test results show that the proposed method achieves a 97.17% accuracy in sentence recognition, surpassing the existing silent speech recognition methods with a typical accuracy of 85%-95%, and demonstrating the potential of accelerometers as an available SSI modality for high-accuracy silent speech sentence recognition.

Authors:Avishek Choudhury, Yeganeh Shahsavar, Hamid Shamszare
Title: User Intent to Use DeepSeek for Healthcare Purposes and their Trust in the Large Language Model: Multinational Survey Study
Abstract:
Large language models (LLMs) increasingly serve as interactive healthcare resources, yet user acceptance remains underexplored. This study examines how ease of use, perceived usefulness, trust, and risk perception interact to shape intentions to adopt DeepSeek, an emerging LLM-based platform, for healthcare purposes. A cross-sectional survey of 556 participants from India, the United Kingdom, and the United States was conducted to measure perceptions and usage patterns. Structural equation modeling assessed both direct and indirect effects, including potential quadratic relationships. Results revealed that trust plays a pivotal mediating role: ease of use exerts a significant indirect effect on usage intentions through trust, while perceived usefulness contributes to both trust development and direct adoption. By contrast, risk perception negatively affects usage intent, emphasizing the importance of robust data governance and transparency. Notably, significant non-linear paths were observed for ease of use and risk, indicating threshold or plateau effects. The measurement model demonstrated strong reliability and validity, supported by high composite reliabilities, average variance extracted, and discriminant validity measures. These findings extend technology acceptance and health informatics research by illuminating the multifaceted nature of user adoption in sensitive domains. Stakeholders should invest in trust-building strategies, user-centric design, and risk mitigation measures to encourage sustained and safe uptake of LLMs in healthcare. Future work can employ longitudinal designs or examine culture-specific variables to further clarify how user perceptions evolve over time and across different regulatory environments. Such insights are critical for harnessing AI to enhance outcomes.

Authors:Federico Scarì, Nitin Jonathan Myers, Chen Quan, Arkady Zgonnikov
Title: Hybrid Human-Machine Perception via Adaptive LiDAR for Advanced Driver Assistance Systems
Abstract:
Accurate environmental perception is critical for advanced driver assistance systems (ADAS). Light detection and ranging (LiDAR) systems play a crucial role in ADAS; they can reliably detect obstacles and help ensure traffic safety. Existing research on LiDAR sensing has demonstrated that adapting the LiDAR's resolution and range based on environmental characteristics can improve machine perception. However, current adaptive LiDAR approaches for ADAS have not explored the possibility of combining the perception abilities of the vehicle and the human driver, which can potentially further enhance the detection performance. In this paper, we propose a novel system that adapts LiDAR characteristics to the human driver's visual perception to enhance LiDAR sensing outside the driver's field of view. We develop a proof-of-concept prototype of the system in the virtual environment CARLA. Our system integrates real-time data on the driver's gaze to identify regions in the environment that the driver is monitoring. This allows the system to optimize LiDAR resources by dynamically increasing the LiDAR's range and resolution in peripheral areas that the driver may not be attending to. Our simulations show that this gaze-aware LiDAR enhances detection performance compared to a baseline standalone LiDAR, particularly in challenging environmental conditions like fog. Our hybrid human-machine sensing approach potentially offers improved safety and situational awareness in real-time driving scenarios for ADAS applications.
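The resource-allocation idea can be sketched as a toy beam-budget split: angular sectors outside the driver's estimated field of view receive a larger share of the scan budget. The sector count, field-of-view width, weighting, and budget below are hypothetical values, not parameters from the paper.

```python
# Toy gaze-aware LiDAR budget: sectors the driver is NOT watching get more beams.

def allocate_beams(gaze_deg, n_sectors=8, fov_deg=90.0, budget=400):
    """Return beams per sector; sectors outside the driver's field of view
    centered on gaze_deg get double weight (an illustrative choice)."""
    centers = [-180 + 360 * (i + 0.5) / n_sectors for i in range(n_sectors)]
    def outside_fov(c):
        diff = abs((c - gaze_deg + 180) % 360 - 180)  # wrapped angular distance
        return diff > fov_deg / 2
    weights = [2.0 if outside_fov(c) else 1.0 for c in centers]
    total = sum(weights)
    return [round(budget * w / total) for w in weights]

# Driver looking straight ahead: the two forward sectors get fewer beams,
# the six peripheral sectors get roughly double.
print(allocate_beams(0.0))  # [57, 57, 57, 29, 29, 57, 57, 57]
```

A real system would of course modulate range and resolution continuously rather than in coarse sectors, but the budget trade-off is the same.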

Authors:Arghavan Sanei, Jinghui Cheng
Title: Untold Stories: Unveiling the Scarce Contributions of UX Professionals to Usability Issue Discussions of Open Source Software Projects
Abstract:
Previous work established that open source software (OSS) projects can benefit from the involvement of UX professionals, who offer user-centric perspectives and contributions to improve software usability. However, their participation in OSS issue discussions (places where design and implementation decisions are often made) is relatively scarce since those platforms are created with a developer-centric mindset. Analyzing a dataset sampled from five OSS projects, this study identifies UX professionals' distinct approaches to raising and following up on usability issues. Compared to other contributors, UX professionals addressed a broader range of usability issues, well-supported their stances, and were more factual than emotional. They also actively engaged in discussions, providing additional insights and clarifications in comments following up on the issues they posted. Results from this study provide useful insights for increasing UX professionals' involvement in OSS communities to improve usability and end-user satisfaction.

Authors:Eunhye Kim, Kiroong Choe, Minju Yoo, Sadat Shams Chowdhury, Jinwook Seo
Title: Beyond Tools: Understanding How Heavy Users Integrate LLMs into Everyday Tasks and Decision-Making
Abstract:
Large language models (LLMs) are increasingly used for both everyday and specialized tasks. While HCI research focuses on domain-specific applications, little is known about how heavy users integrate LLMs into everyday decision-making. Through qualitative interviews with heavy LLM users (n=7) who employ these systems for both intuitive and analytical thinking tasks, our findings show that participants use LLMs for social validation, self-regulation, and interpersonal guidance, seeking to build self-confidence and optimize cognitive resources. These users viewed LLMs either as rational, consistent entities or average human decision-makers. Our findings suggest that heavy LLM users develop nuanced interaction patterns beyond simple delegation, highlighting the need to reconsider how we study LLM integration in decision-making processes.

Authors:Sojeong Yun, Youn-kyung Lim
Title: User Experience with LLM-powered Conversational Recommendation Systems: A Case of Music Recommendation
Abstract:
The advancement of large language models (LLMs) now allows users to actively interact with conversational recommendation systems (CRS) and build their own personalized recommendation services tailored to their unique needs and goals. This experience offers users a significantly higher level of controllability compared to traditional RS, enabling an entirely new dimension of recommendation experiences. Building on this context, this study explored the unique experiences that LLM-powered CRS can provide compared to traditional RS. Through a three-week diary study with 12 participants using custom GPTs for music recommendations, we found that LLM-powered CRS can (1) help users clarify implicit needs, (2) support unique exploration, and (3) facilitate a deeper understanding of musical preferences. Based on these findings, we discuss the new design space enabled by LLM-powered CRS and highlight its potential to support more personalized, user-driven recommendation experiences.

Authors:Liudas Panavas, Tarik Crnovrsanin, Racquel Fygenson, Eamon Conway, Derek Millard, Norbou Buchler, Cody Dunne
Title: Set Visualizations for Comparing and Evaluating Machine Learning Models
Abstract:
Machine learning practitioners often need to compare multiple models to select the best one for their application. However, current methods of comparing models fall short because they rely on aggregate metrics that can be difficult to interpret or do not provide enough information to understand the differences between models. To better support the comparison of models, we propose set visualizations of model outputs to enable easier model-to-model comparison. We outline the requirements for using sets to compare machine learning models and demonstrate how this approach can be applied to various machine learning tasks. We also introduce SetMLVis, an interactive system that utilizes set visualizations to compare object detection models. Our evaluation shows that SetMLVis outperforms traditional visualization techniques in terms of task completion and reduces cognitive workload for users. Supplemental materials can be found at https://osf.io/afksu/?view_only=bb7f259426ad425f81d0518a38c597be.
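The set-based comparison idea can be illustrated with plain Python sets over hypothetical sample ids: the intersections and differences below are exactly the regions a set visualization such as SetMLVis would display.

```python
# Minimal sketch of set-based model comparison: treat each model's correct
# predictions as a set of sample ids. The ids and models are hypothetical.

model_a_correct = {"img01", "img02", "img03", "img05"}
model_b_correct = {"img02", "img03", "img04"}

both   = model_a_correct & model_b_correct  # solved by both models
only_a = model_a_correct - model_b_correct  # A's unique wins
only_b = model_b_correct - model_a_correct  # B's unique wins

print(sorted(both))    # ['img02', 'img03']
print(sorted(only_a))  # ['img01', 'img05']
print(sorted(only_b))  # ['img04']
```

Unlike a single aggregate accuracy number, these regions show *which* samples drive the difference between models, which is the gap the paper's visualizations target.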

Authors:Premankur Banerjee, Jiaxuan Wang, Lauren Tomita, Mia P Montiel, Heather Culbertson
Title: Virtual Encounters of the Haptic Kind: Towards a Multi-User VR System for Real-Time Social Touch
Abstract:
Physical touch, a fundamental aspect of human social interaction, remains largely absent in real-time virtual communication. We present a haptic-enabled multi-user Virtual Reality (VR) system that facilitates real-time, bi-directional social touch communication among physically distant users. We developed wearable gloves and forearm sleeves, embedded with 26 vibrotactile actuators for each hand and arm, actuated via a WiFi-based communication system. The system enables VR-transmitted data to be universally interpreted by haptic devices, allowing feedback rendering based on their capabilities. Users can perform and receive social touch gestures such as stroke, pat, poke, and squeeze, with other users within a shared virtual space or interact with other virtual objects, and they receive vibrotactile feedback. Through a two-part user study involving six pairs of participants, we investigate the impact of gesture speed, haptic feedback modality, and user roles, during real-time haptic communication in VR, on affective and sensory experiences, as well as evaluate the overall system usability. Our findings highlight key design considerations that significantly improve affective experiences, presence, embodiment, pleasantness, and naturalness, to foster more immersive and expressive mediated social touch experiences in VR.

Authors:Daniel Björkegren, Jun Ho Choi, Divya Budihal, Dominic Sobhani, Oliver Garrod, Paul Atherton
Title: Could AI Leapfrog the Web? Evidence from Teachers in Sierra Leone
Abstract:
Although 85% of sub-Saharan Africa's population is covered by mobile broadband signal, only 37% use the internet, and those who do seldom use the web. The most frequently cited reason for low internet usage is the cost of data. We investigate whether AI can bridge this gap by analyzing 40,350 queries submitted to an AI chatbot by 469 teachers in Sierra Leone over 17 months. Teachers use AI for teaching assistance more frequently than web search. We compare the AI responses to the corresponding top search results for the same queries from the most popular local web search engine, google.com.sl. Only 2% of results for corresponding web searches contain content from within the country. Additionally, the average web search result consumes 3,107 times more data than an AI response. Bandwidth alone costs $2.41 per thousand web search results loaded, while the total cost of AI is $0.30 per thousand responses. As a result, AI is 87% less expensive than web search. In blinded evaluations, an independent sample of teachers rate AI responses as more relevant, helpful, and correct than web search results. These findings suggest that AI-driven solutions can cost-effectively bridge information gaps in low-connectivity regions.
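The headline savings figure follows directly from the per-thousand costs quoted in the abstract; a quick arithmetic check (figures from the abstract, rounding mine):

```python
# Per-thousand costs reported in the abstract (USD).
web_cost_per_1000 = 2.41  # bandwidth alone for 1,000 web search results
ai_cost_per_1000 = 0.30   # total cost for 1,000 AI responses

savings = 1 - ai_cost_per_1000 / web_cost_per_1000
print(f"{savings:.1%}")  # 87.6%, consistent with the reported ~87%
```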

Authors:Shreya Shukla, Jose Torres, Abhijit Mishra, Jacek Gwizdka, Shounak Roychowdhury
Title: A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond
Abstract:
Integration of Brain-Computer Interfaces (BCIs) and Generative Artificial Intelligence (GenAI) has opened new frontiers in brain signal decoding, enabling assistive communication, neural representation learning, and multimodal integration. BCIs, particularly those leveraging Electroencephalography (EEG), provide a non-invasive means of translating neural activity into meaningful outputs. Recent advances in deep learning, including Generative Adversarial Networks (GANs) and Transformer-based Large Language Models (LLMs), have significantly improved EEG-based generation of images, text, and speech. This paper provides a literature review of the state-of-the-art in EEG-based multimodal generation, focusing on (i) EEG-to-image generation through GANs, Variational Autoencoders (VAEs), and Diffusion Models, and (ii) EEG-to-text generation leveraging Transformer-based language models and contrastive learning methods. Additionally, we discuss the emerging domain of EEG-to-speech synthesis, an evolving multimodal frontier. We highlight key datasets, use cases, challenges, and EEG feature encoding methods that underpin generative approaches. By providing a structured overview of EEG-based generative AI, this survey aims to equip researchers and practitioners with insights to advance neural decoding, enhance assistive technologies, and expand the frontiers of brain-computer interaction.

Authors:Fan Zhang, Yun Chen, Xiaoke Zeng, Tianqi Wang, Long Ling, RAY LC
Title: "An Image of Ourselves in Our Minds": How College-educated Online Dating Users Construct Profiles for Effective Self Presentation
Abstract:
Online dating is frequently used by individuals looking for potential relationships and intimate connections. Central to dating apps is the creation and refinement of a dating profile, which represents the way individuals desire to present themselves to potential mates, while hiding information they do not care to share. To investigate the way frequent users of dating apps construct their online profiles and perceive the effectiveness of strategies taken in making profiles, we conducted semi-structured interviews with 20 experienced users who are Chinese college-educated young adults and uncovered the processes and rationales by which they make profiles for online dating, particularly in selecting images for inclusion. We found that participants used idealized photos that exaggerated their positive personality traits, sometimes traits that they do not possess but perceive others to desire, and sometimes even traits they wish they had possessed. Users also strategically used photos that show personality and habits without showing themselves, and often hid certain identifying information to reduce privacy risks. This analysis signals potential factors that are key in building online dating profiles, providing design implications for systems that limit the use of inaccurate information while still promoting self-expression in relationship platforms.

Authors:Marina Estévez-Almenzar, Ricardo Baeza-Yates, Carlos Castillo
Title: A Comparison of Human and Machine Learning Errors in Face Recognition
Abstract:
Machine learning applications in high-stakes scenarios should always operate under human oversight. Developing an optimal combination of human and machine intelligence requires an understanding of their complementarities, particularly regarding the similarities and differences in the way they make mistakes. We perform extensive experiments in the area of face recognition and compare two automated face recognition systems against human annotators through a demographically balanced user study. Our research uncovers important ways in which machine learning errors and human errors differ from each other, and suggests potential strategies in which human-machine collaboration can improve accuracy in face recognition.

Authors:Yoshee Jain, Mehmet Arif Demirtaş, Kathryn Cunningham
Title: PLAID: Supporting Computing Instructors to Identify Domain-Specific Programming Plans at Scale
Abstract:
Pedagogical approaches focusing on stereotypical code solutions, known as programming plans, can increase problem-solving ability and motivate diverse learners. However, plan-focused pedagogies are rarely used beyond introductory programming. Our formative study (N=10 educators) showed that identifying plans is a tedious process. To advance plan-focused pedagogies in application-focused domains, we created an LLM-powered pipeline that automates the effortful parts of educators' plan identification process by providing use-case-driven program examples and candidate plans. In design workshops (N=7 educators), we identified design goals to maximize instructors' efficiency in plan identification by optimizing interaction with this LLM-generated content. Our resulting tool, PLAID, enables instructors to access a corpus of relevant programs to inspire plan identification, compare code snippets to assist plan refinement, and structure code snippets into plans. We evaluated PLAID in a within-subjects user study (N=12 educators) and found that PLAID led to lower cognitive demand and increased productivity compared to the state-of-the-art. Educators found PLAID beneficial for generating instructional material. Thus, our findings suggest that human-in-the-loop approaches hold promise for supporting plan-focused pedagogies at scale.

Authors:Ryoya Komatsu, Ayumu Ogura, Shigeo Yoshida, Kazutoshi Tanaka, Yuichi Itoh
Title: Transtiff: A Stylus-shaped Interface for Rendering Perceived Stiffness of Virtual Objects via Stylus Stiffness Control
Abstract:
The replication of object stiffness is essential for enhancing haptic feedback in virtual environments. However, existing research has overlooked how stylus stiffness influences the perception of virtual object stiffness during tool-mediated interactions. To address this, we conducted a psychophysical experiment demonstrating that changing stylus stiffness combined with visual stimuli altered users' perception of virtual object stiffness. Based on these insights, we developed Transtiff, a stylus-shaped interface capable of on-demand stiffness control using a McKibben artificial muscle mechanism. Unlike previous approaches, our method manipulates the perceived stiffness of virtual objects via the stylus by controlling the stiffness of the stylus without altering the properties of the real object being touched, creating the illusion of a hard object feeling soft. Our user study confirmed that Transtiff effectively simulates a range of material properties, such as sponge, plastic, and tennis balls, providing haptic rendering that is closely aligned with the perceived material characteristics. By addressing the challenge of delivering realistic haptic feedback through tool-based interactions, Transtiff represents a significant advancement in haptic interface design for VR applications.

Authors:Wenqi Li, Jui-Ching Kuo, Manyu Sheng, Pengyi Zhang, Qunfang Wu
Title: Beyond Explicit and Implicit: How Users Provide Feedback to Shape Personalized Recommendation Content
Abstract:
As personalized recommendation algorithms become integral to social media platforms, users are increasingly aware of their ability to influence recommendation content. However, limited research has explored how users provide feedback through their behaviors and platform mechanisms to shape the recommendation content. We conducted semi-structured interviews with 34 active users of algorithm-driven social media platforms (e.g., Xiaohongshu, Douyin). In addition to explicit and implicit feedback, this study introduced intentional implicit feedback, highlighting the actions users intentionally took to refine recommendation content through perceived feedback mechanisms. Additionally, choices of feedback behaviors were found to align with specific purposes. Explicit feedback was primarily used for feed customization, while unintentional implicit feedback was more linked to content consumption. Intentional implicit feedback was employed for multiple purposes, particularly in increasing content diversity and improving recommendation relevance. This work underscores the user intention dimension in the explicit-implicit feedback dichotomy and offers insights for designing personalized recommendation feedback that better responds to users' needs.

Authors:Shreyan Biswas, Alexander Erlei, Ujwal Gadiraju
Title: Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages
Abstract:
Recent advances in generative AI have precipitated a proliferation of novel writing assistants. These systems typically rely on multilingual large language models (LLMs), providing globalized workers the ability to revise or create diverse forms of content in different languages. However, there is substantial evidence indicating that the performance of multilingual LLMs varies between languages. Users who employ writing assistance for multiple languages are therefore susceptible to disparate output quality. Importantly, recent research has shown that people tend to generalize algorithmic errors across independent tasks, violating the behavioral axiom of choice independence. In this paper, we analyze whether user utilization of novel writing assistants in a charity advertisement writing task is affected by the AI's performance in a second language. Furthermore, we quantify the extent to which these patterns translate into the persuasiveness of generated charity advertisements, as well as the role of peoples' beliefs about LLM utilization in their donation choices. Our results provide evidence that writers who engage with an LLM-based writing assistant violate choice independence, as prior exposure to a Spanish LLM reduces subsequent utilization of an English LLM. While these patterns do not affect the aggregate persuasiveness of the generated advertisements, people's beliefs about the source of an advertisement (human versus AI) do. In particular, Spanish-speaking female participants who believed that they read an AI-generated advertisement strongly adjusted their donation behavior downwards. Furthermore, people are generally not able to adequately differentiate between human-generated and LLM-generated ads. Our work has important implications for the design, development, integration, and adoption of multilingual LLMs as assistive agents -- particularly in writing tasks.

Authors:Sam Cohen, Ravi Chugh
Title: Code Style Sheets: CSS for Code
Abstract:
Program text is rendered using impoverished typographic styles. Beyond choice of fonts and syntax-highlighting colors, code editors and related tools utilize very few text decorations. These limited styles are, furthermore, applied in monolithic fashion, regardless of the programs and tasks at hand. We present the notion of _code style sheets_ for styling program text. Motivated by analogy to cascading style sheets (CSS) for styling HTML documents, code style sheets provide mechanisms for defining rules to select elements from an abstract syntax tree (AST) in order to style their corresponding visual representation. Technically, our selector language generalizes essential constructs from CSS to a programming-language setting with algebraic data types (such as ASTs). Practically, code style sheets allow ASTs to be styled granularly, based on semantic information -- such as the structure of abstract syntax, static type information, and corresponding run-time values -- as well as design choices on the part of authors and readers of a program. Because programs are heavily nested in structure, a key aspect of our design is a layout algorithm that renders nested, multiline text blocks more compactly than in existing box-based layout systems such as HTML. In this paper, we design and implement a code style sheets system for a subset of Haskell, using it to illustrate several code presentation and visualization tasks. These examples demonstrate that code style sheets provide a uniform framework for rendering programs in multivarious ways, which could be employed in future designs for text-based as well as structure editors.

Authors:Karahan Sarıtaş, Kıvanç Tezören, Yavuz Durmazkeser
Title: A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks
Abstract:
In recent years, evaluating the Theory of Mind (ToM) capabilities of large language models (LLMs) has received significant attention within the research community. As the field rapidly evolves, navigating the diverse approaches and methodologies has become increasingly complex. This systematic review synthesizes current efforts to assess LLMs' ability to perform ToM tasks, an essential aspect of human cognition involving the attribution of mental states to oneself and others. Despite notable advancements, the proficiency of LLMs in ToM remains a contentious issue. By categorizing benchmarks and tasks through a taxonomy rooted in cognitive science, this review critically examines evaluation techniques, prompting strategies, and the inherent limitations of LLMs in replicating human-like mental state reasoning. A recurring theme in the literature reveals that while LLMs demonstrate emerging competence in ToM tasks, significant gaps persist in their emulation of human cognitive abilities.

Authors:Wei Xuan, Meghna Roy Chowdhury, Yi Ding, Yixue Zhao
Title: Unlocking Mental Health: Exploring College Students' Well-being through Smartphone Behaviors
Abstract:
The global mental health crisis is a pressing concern, with college students particularly vulnerable to rising mental health disorders. The widespread use of smartphones among young adults, while offering numerous benefits, has also been linked to negative outcomes such as addiction and regret, significantly impacting well-being. Leveraging the longest longitudinal dataset collected over four college years through passive mobile sensing, this study is the first to examine the relationship between students' smartphone unlocking behaviors and their mental health at scale in real-world settings. We provide the first evidence demonstrating the predictability of phone unlocking behaviors for mental health outcomes based on a large dataset, highlighting the potential of these novel features for future predictive models. Our findings reveal important variations in smartphone usage across genders and locations, offering a deeper understanding of the interplay between digital behaviors and mental health. We highlight future research directions aimed at mitigating adverse effects and promoting digital well-being in this population.

Authors:Alexander Htet Kyaw, Arvin Xu, Sasa Zivkovic, Gwyllim Jahn, Cameron Newnham, Nick Van Den Berg
Title: AR Glulam: Accurate Augmented Reality Using Multiple Fiducial Markers for Glulam Fabrication
Abstract:
Recent advancements in Augmented Reality (AR) have demonstrated applications in architecture, design, and fabrication. Compared to conventional 2D construction drawings, AR can be used to superimpose contextual instructions, display 3D spatial information and enable on-site engagement. Despite the potential of AR, the widespread adoption of the technology in the industry is limited by its precision. Precision is important for projects requiring strict construction tolerances, design fidelity, and fabrication feedback. For example, the manufacturing of glulam beams requires tolerances of less than 2mm. The goal of this project is to explore the industrial application of using multiple fiducial markers for high-precision AR fabrication. While the method has been validated in lab settings with a precision of 0.97, this paper focuses on fabricating glulam beams in a factory setting with an industry manufacturer, Unalam Factory.

Authors:Zhimin Wang, Maohang Rao, Shanghua Ye, Weitao Song, Feng Lu
Title: Towards spatial computing: recent advances in multimodal natural interaction for XR headsets
Abstract:
With the widespread adoption of Extended Reality (XR) headsets, spatial computing technologies are gaining increasing attention. Spatial computing enables interaction with virtual elements through natural input methods such as eye tracking, hand gestures, and voice commands, thus placing natural human-computer interaction at its core. While previous surveys have reviewed conventional XR interaction techniques, recent advancements in natural interaction, particularly driven by artificial intelligence (AI) and large language models (LLMs), have introduced new paradigms and technologies. In this paper, we review research on multimodal natural interaction for wearable XR, focusing on papers published between 2022 and 2024 in six top venues: ACM CHI, UIST, IMWUT (Ubicomp), IEEE VR, ISMAR, and TVCG. We classify and analyze these studies based on application scenarios, operation types, and interaction modalities. This analysis provides a structured framework for understanding how researchers are designing advanced natural interaction techniques in XR. Based on these findings, we discuss the challenges in natural interaction techniques and suggest potential directions for future research. This review provides valuable insights for researchers aiming to design natural and efficient interaction systems for XR, ultimately contributing to the advancement of spatial computing.

Authors:Thea Christoffersen, Annika Tidemand Jensen, Chris Hall, Christofer Meinecke, Stefan Jänicke
Title: Quantitative Analysis of Objects in Prisoner Artworks
Abstract:
Prisoners of Nazi concentration camps created paintings as a means to express their daily life experiences and feelings. Several thousand such paintings exist, but a quantitative analysis of them has not been carried out. We created an extensive dataset of 1,939 Holocaust prisoner artworks, and we employed an object detection framework that found 19,377 objects within these artworks. To support the quantitative and qualitative analysis of the art collection and its objects, we have developed an intuitive and interactive dashboard to promote a deeper engagement with these visual testimonies. The dashboard features various visual interfaces, e.g., a word cloud showing the detected objects and a map of artwork origins, and options for filtering. We presented the interface to domain experts, whose feedback highlights the dashboard's intuitiveness and potential for both quantitative and qualitative analysis while also providing relevant suggestions for improvement. Our project demonstrates the benefit of digital methods such as machine learning and visual analytics for Holocaust remembrance and educational purposes.

Authors:Chen Ji, Katherine Isbister
Title: MetaphorChat: A Metaphorical Chatting Space for Expressing and Understanding Inner Feelings
Abstract:
Metaphors have been used during therapy sessions to facilitate the communication of inner feelings between clients and therapists. Can we create a digital metaphorical chatting space for daily use within close relationships? As the first step towards this vision, this work follows the autobiographical design approach to prototype MetaphorChat, which comprises two metaphorical chatting scenes tailored to meet researchers' genuine needs for discussing specific life topics in close relationships. Along with typing-based chatting, each scene offers a metaphorical narrative experience, composed of graphics and sound, with interactive mechanisms that deliver metaphorical meanings. This pictorial details the process of mapping abstract feelings into metaphor concepts, then how these concepts are translated into various interaction design elements, and the reflections from self-usage. We discuss the vision for such a metaphorical chatting space, uniquely positioned between messaging apps and video games, for the future design of empathetic communication applications.

Authors:Roel Vertegaal, Timothy Merritt, Saul Greenberg, Aneesh P. Tarun, Zhen Li, Zafeirios Fountas
Title: Interactive Inference: A Neuromorphic Theory of Human-Computer Interaction
Abstract:
Neuromorphic Human-Computer Interaction (HCI) is a theoretical approach to designing better user experiences (UX) motivated by advances in the understanding of the neurophysiology of the brain. Inspired by the neuroscientific theory of Active Inference, Interactive Inference is a first example of such an approach. It offers a simplified interpretation of Active Inference that allows designers to more readily apply this theory to design and evaluation. In Interactive Inference, user behaviour is modeled as Bayesian inference on progress and goal distributions that predicts the next action. We show how the error between goal and progress distributions, or Bayesian surprise, can be modeled as a simple mean square error of the signal-to-noise ratio (SNR) of a task. The problem is that the user's capacity to process Bayesian surprise follows the logarithm of this SNR. This means errors rise quickly once average capacity is exceeded. Our model allows the quantitative analysis of performance and error using one framework that can provide real-time estimates of the mental load in users that needs to be minimized by design. We show how three basic laws of HCI, Hick's Law, Fitts' Law and the Power Law can be expressed using our model. We then test the validity of the model by empirically measuring how well it predicts human performance and error in a car following task. Results suggest that driver processing capacity indeed is a logarithmic function of the SNR of the distance to a lead car. This result provides initial evidence that Interactive Inference can be useful as a new theoretical design tool.

Authors:David Pearl, James Intriligator, Xuanjiang Liu
Title: Seamless Integration: The Evolution, Design, and Future Impact of Wearable Technology
Abstract:
The rapid evolution of wearable technology marks a transformative phase in human-computer interaction, seamlessly integrating digital functionality into daily life. This paper explores the historical trajectory, current advancements, and future potential of wearables, emphasizing their impact on healthcare, productivity, and personal well-being. Key developments include the integration of artificial intelligence (AI), Internet of Things (IoT), and augmented reality (AR), driving personalization, real-time adaptability, and enhanced user experiences. The study highlights user-centered design principles, ethical considerations, and interdisciplinary collaboration as critical factors in creating wearables that are intuitive, inclusive, and secure. Furthermore, the paper examines sustainability trends, such as modular designs and eco-friendly materials, aligning innovation with environmental responsibility. By addressing challenges like data privacy, algorithmic bias, and usability, wearable technology is poised to redefine the interaction between humans and technology, offering unprecedented opportunities for enrichment and empowerment in diverse contexts. This comprehensive analysis provides a roadmap for advancing wearables to meet emerging societal needs while fostering ethical and sustainable growth.

Authors:Lisa Orii, Elizabeth K Harrington, Serah Gitome, Nelson Kiprotich Cheruiyot, Elizabeth Anne Bukusi, Sandy Cheng, Ariel Fu, Khushi Khandelwal, Shrimayee Narasimhan, Richard Anderson
Title: Supporting Contraceptive Decision-Making in the Intermediated Pharmacy Setting in Kenya
Abstract:
Adolescent girls and young women (AGYW) in sub-Saharan Africa face unique barriers to contraceptive access and lack AGYW-centered contraceptive decision-support resources. To empower AGYW to make informed choices and improve reproductive health outcomes, we developed a tablet-based application to provide contraceptive education and decision-making support in the pharmacy setting - a key source of contraceptive services for AGYW - in Kenya. We conducted workshops with AGYW and pharmacy providers in Kenya to gather app feedback and understand how to integrate the intervention into the pharmacy setting. Our analysis highlights how intermediated interactions - a multiuser, cooperative effort to enable technology use and information access - could inform a successful contraceptive intervention in Kenya. The potential strengths of intermediation in our setting inform implications for technological health interventions in intermediated scenarios in low- and middle-income countries, including challenges and opportunities for extending impact to different populations and integrating technology into resource-constrained healthcare settings.

Authors:Kazuya Matsuo, Yoko Ishii, Atsushi Otsuka, Ryo Ishii, Hiroaki Sugiyama, Masahiro Mizukami, Tsunehiro Arimoto, Narichika Nomoto, Yoshihide Sato, Tetsuya Yamaguchi
Title: Enhancing Impression Change Prediction in Speed Dating Simulations Based on Speakers' Personalities
Abstract:
This paper focuses on simulating text dialogues in which impressions between speakers improve during speed dating. This simulation involves selecting an utterance from multiple candidates generated by a text generation model that replicates a specific speaker's utterances, aiming to improve the impression of the speaker. Accurately selecting an utterance that improves the impression is crucial for the simulation. We believe that whether an utterance improves a dialogue partner's impression of the speaker may depend on the personalities of both parties. However, recent methods for utterance selection do not consider the impression per utterance or the personalities. To address this, we propose a method that predicts whether an utterance improves a partner's impression of the speaker, considering the personalities. The evaluation results showed that personalities are useful in predicting impression changes per utterance. Furthermore, we conducted a human evaluation of simulated dialogues using our method. The results showed that it could simulate dialogues more favorably received than those selected without considering personalities.

Authors:Tao Lu, Qian Zhu, Tiffany Ma, Wong Kam-Kwai, Anlan Xie, Alex Endert, Yalong Yang
Title: Ego vs. Exo and Active vs. Passive: Investigating the Effects of Viewpoint and Navigation on Spatial Immersion and Understanding in Immersive Storytelling
Abstract:
Visual storytelling combines visuals and narratives to communicate important insights. While web-based visual storytelling is well-established, leveraging the next generation of digital technologies for visual storytelling, specifically immersive technologies, remains underexplored. We investigated the impact of the story viewpoint (from the audience's perspective) and navigation (when progressing through the story) on spatial immersion and understanding. First, we collected web-based 3D stories and elicited design considerations from three VR developers. We then adapted four selected web-based stories to an immersive format. Finally, we conducted a user study (N=24) to examine egocentric and exocentric viewpoints, active and passive navigation, and the combinations they form. Our results indicated significantly higher preferences for egocentric+active (higher agency and engagement) and exocentric+passive (higher focus on content). We also found a marginal significance of viewpoints on story understanding and a strong significance of navigation on spatial immersion.

Authors:Ruishi Zou, Yinqi Tang, Jingzhu Chen, Siyu Lu, Yan Lu, Yingfan Yang, Chen Ye
Title: GistVis: Automatic Generation of Word-scale Visualizations from Data-rich Documents
Abstract:
Data-rich documents are ubiquitous in various applications, yet they often rely solely on textual descriptions to convey data insights. Prior research primarily focused on providing visualization-centric augmentation to data-rich documents. However, few have explored using automatically generated word-scale visualizations to enhance the document-centric reading process. As an exploratory step, we propose GistVis, an automatic pipeline that extracts and visualizes data insight from text descriptions. GistVis decomposes the generation process into four modules: Discoverer, Annotator, Extractor, and Visualizer, with the first three modules utilizing the capabilities of large language models and the fourth using visualization design knowledge. Technical evaluation including a comparative study on Discoverer and an ablation study on Annotator reveals decent performance of GistVis. Meanwhile, the user study (N=12) showed that GistVis could generate satisfactory word-scale visualizations, indicating its effectiveness in facilitating users' understanding of data-rich documents (+5.6% accuracy) while significantly reducing their mental demand (p=0.016) and perceived effort (p=0.033).

Authors:Muhammad Raees, Vassilis-Javed Khan, Konstantinos Papangelis
Title: UX Challenges in Implementing an Interactive B2B Customer Segmentation Tool
Abstract:
In our effort to implement an interactive customer segmentation tool for a global manufacturing company, we identified user experience (UX) challenges with technical implications. The main challenge relates to domain users' effort, in our case sales experts, to interpret the clusters produced by an unsupervised Machine Learning (ML) algorithm, for creating a customer segmentation. An additional challenge is what sorts of interactions such a tool should support to enable meaningful interpretations of the output of clustering models. In this case study, we describe what we learned from implementing an Interactive Machine Learning (IML) prototype to address such UX challenges. We leverage a multi-year real-world dataset and domain experts' feedback from a global manufacturing company to evaluate our tool. We report what we found to be effective and wish to inform designers of IML systems in the context of customer segmentation and other related unsupervised ML tools.

Authors:Ade Satria Saloka Santosa, Yu-Wei Chang, Andreas B. Dahlin, Lars Osterlund, Giovanni Volpe, Kunli Xiong
Title: Retina electronic paper with video-rate-tunable 45000 pixels per inch
Abstract:
As demand for immersive experiences grows, displays are moving closer to the eye with smaller sizes and higher resolutions. However, shrinking pixel emitters reduce intensity, making them harder to perceive. Electronic Papers utilize ambient light for visibility, maintaining optical contrast regardless of pixel size, but cannot achieve high resolution. We show electrically tunable meta-pixels down to ~560 nm in size (>45,000 PPI) consisting of WO3 nanodiscs, allowing one-to-one pixel-photodetector mapping on the retina when the display size matches the pupil diameter, which we call Retina Electronic Paper. Our technology also supports video display (25 Hz), high reflectance (~80%), and optical contrast (~50%), which will help create the ultimate virtual reality display.
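The pixel-density claim in this abstract follows directly from the ~560 nm pixel size; a quick sketch of the unit conversion:

```python
# Converting the reported ~560 nm meta-pixel size into pixels per inch (PPI).
INCH_IN_NM = 25.4e6        # 1 inch = 25.4 mm = 25,400,000 nm
pixel_pitch_nm = 560       # smallest meta-pixel size reported in the abstract

ppi = INCH_IN_NM / pixel_pitch_nm
print(round(ppi))          # roughly 45,000+ pixels per inch
```

At a 560 nm pitch this gives about 45,357 PPI, matching the ">45,000 PPI" stated above.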

Authors:Xiangzhi Eric Wang, Zackary P. T. Sin, Ye Jia, Daniel Archer, Wynonna H. Y. Fong, Qing Li, Chen Li
Title: Can You Move These Over There? An LLM-based VR Mover for Supporting Object Manipulation
Abstract:
In our daily lives, we can naturally convey instructions for the spatial manipulation of objects using words and gestures. Transposing this form of interaction into virtual reality (VR) object manipulation can be beneficial. We propose VR Mover, an LLM-empowered solution that can understand and interpret the user's vocal instruction to support object manipulation. By simply pointing and speaking, the LLM can manipulate objects without structured input. Our user study demonstrates that VR Mover enhances user usability, overall experience and performance on multi-object manipulation, while also reducing workload and arm fatigue. Users prefer the proposed natural interface for broad movements and may complementarily switch to gizmos or virtual hands for finer adjustments. These findings are believed to contribute to design implications for future LLM-based object manipulation interfaces, highlighting the potential for more intuitive and efficient user interactions in VR environments.

Authors:Aimee Allen, Tom Drummond, Dana Kulić
Title: Sound Judgment: Properties of Consequential Sounds Affecting Human-Perception of Robots
Abstract:
Positive human-perception of robots is critical to achieving sustained use of robots in shared environments. One key factor affecting human-perception of robots is their sounds, especially the consequential sounds which robots (as machines) must produce as they operate. This paper explores qualitative responses from 182 participants to gain insight into human-perception of robot consequential sounds. Participants viewed videos of different robots performing their typical movements, and responded to an online survey regarding their perceptions of robots and the sounds they produce. Topic analysis was used to identify common properties of robot consequential sounds that participants expressed liking, disliking, wanting or wanting to avoid being produced by robots. Alongside expected reports of disliking high pitched and loud sounds, many participants preferred informative and audible sounds (over no sound) to provide predictability of purpose and trajectory of the robot. Rhythmic sounds were preferred over acute or continuous sounds, and many participants wanted more natural sounds (such as wind or cat purrs) in-place of machine-like noise. The results presented in this paper support future research on methods to improve consequential sounds produced by robots by highlighting features of sounds that cause negative perceptions, and providing insights into sound profile changes for improvement of human-perception of robots, thus enhancing human robot interaction.

Authors:Daniel A. Adler, Yuewen Yang, Thalia Viranda, Anna R. Van Meter, Emma Elizabeth McGinty, Tanzeem Choudhury
Title: Designing Technologies for Value-based Mental Healthcare: Centering Clinicians' Perspectives on Outcomes Data Specification, Collection, and Use
Abstract:
Health information technologies are transforming how mental healthcare is paid for through value-based care programs, which tie payment to data quantifying care outcomes. But, it is unclear what outcomes data these technologies should store, how to engage users in data collection, and how outcomes data can improve care. Given these challenges, we conducted interviews with 30 U.S.-based mental health clinicians to explore the design space of health information technologies that support outcomes data specification, collection, and use in value-based mental healthcare. Our findings center clinicians' perspectives on aligning outcomes data for payment programs and care; opportunities for health technologies and personal devices to improve data collection; and considerations for using outcomes data to hold stakeholders including clinicians, health insurers, and social services financially accountable in value-based mental healthcare. We conclude with implications for future research designing and developing technologies supporting value-based care across stakeholders involved with mental health service delivery.

Authors:Mengyi Wei, Chenjing Jiao, Chenyu Zuo, Lorenz Hurni, Liqiu Meng
Title: Constructing AI ethics narratives based on real-world data: Human-AI collaboration in data-driven visual storytelling
Abstract:
AI ethics narratives have the potential to shape the public's accurate understanding of AI technologies and promote communication among different stakeholders. However, AI ethics narratives are largely lacking. The limited existing narratives tend to center on works of science fiction or the corporate marketing campaigns of large technology companies. Misuse of the "socio-technical imaginary" can blur the line between speculation and reality for the public, undermining the responsibility and regulation of technology development. Therefore, constructing authentic AI ethics narratives is an urgent task. The emergence of generative AI offers new possibilities for building narrative systems. This study is dedicated to data-driven visual storytelling about AI ethics that relies on human-AI collaboration. Based on the five key elements of story models, we proposed a conceptual framework for human-AI collaboration and explored the roles of generative AI and humans in the creation of visual stories. We implemented the conceptual framework in a real AI news case. This research leveraged advanced generative AI technologies to provide a reference for constructing genuine AI ethics narratives. Our goal is to promote active public engagement and discussion through authentic AI ethics narratives, thereby contributing to the development of better AI policies.

Authors:Hyeok Kim, Mingyoung J. Jeng, Kaitlin N. Smith
Title: Toward Human-Quantum Computer Interaction: Interface Techniques for Usable Quantum Computing
Abstract:
By leveraging quantum-mechanical properties like superposition, entanglement, and interference, quantum computing (QC) offers promising solutions for problems that classical computing has not been able to solve efficiently, such as drug discovery, cryptography, and physical simulation. Unfortunately, adopting QC remains difficult for potential users like QC beginners and application-specific domain experts, due to limited theoretical and practical knowledge, the lack of integrated interface-wise support, and poor documentation. For example, to use quantum computers, one has to convert conceptual logic into low-level codes, analyze quantum program results, and share programs and results. To support the wider adoption of QC, we, as designers and QC experts, propose interaction techniques for QC through design iterations. These techniques include writing quantum codes conceptually, comparing initial quantum programs with optimized programs, sharing quantum program results, and exploring quantum machines. We demonstrate the feasibility and utility of these techniques via use cases with high-fidelity prototypes.

Authors:Brianna M White, Rameshwari Prasad, Nariman Ammar, Jason A Yaun, Arash Shaban-Nejad
Title: Digital Health Innovations for Screening and Mitigating Mental Health Impacts of Adverse Childhood Experiences: Narrative Review
Abstract:
This study presents a narrative review of the use of digital health technologies (DHTs) and artificial intelligence to screen for and mitigate the risks and mental health consequences associated with adverse childhood experiences (ACEs) among children and youth. Several databases were searched for studies published from August 2017 to August 2022. Selected studies (1) explored the relationship between digital health interventions and mitigation of negative health outcomes associated with mental health in childhood and adolescence and (2) examined prevention of ACE occurrence associated with mental illness in childhood and adolescence. A total of 18 papers were selected, according to our inclusion and exclusion criteria, to evaluate and identify the means by which existing digital solutions may be useful in mitigating the mental health consequences of ACEs in childhood and adolescence and in preventing ACE occurrence. We also highlighted several knowledge gaps and barriers to DHT implementation and usability. Findings from the search suggest that the incorporation of DHTs, if implemented successfully, has the potential to improve the quality of related care provisions for the management of mental health consequences of adverse or traumatic events in childhood, including posttraumatic stress disorder, suicidal behavior or ideation, anxiety or depression, and attention-deficit/hyperactivity disorder. The use of DHTs, machine learning tools, natural language processing, and artificial intelligence can help mitigate ACEs and their associated risk factors. Under proper legal regulations and security, privacy, and confidentiality assurances, digital technologies could also assist in promoting positive childhood experiences in children and young adults, bolstering resilience, and providing reliable public health resources to serve populations in need.

Authors:Daniel Pahr, Henry Ehlers, Velitchko Filipov
Title: HoloGraphs: An Interactive Physicalization for Dynamic Graphs
Abstract:
We present HoloGraphs, a novel approach for physically representing, explaining, exploring, and interacting with dynamic networks. HoloGraphs addresses the challenges of visualizing and understanding evolving network structures by providing an engaging method of interacting with and exploring dynamic network structures using physicalization techniques. In contrast to traditional digital interfaces, our approach leverages tangible artifacts made from transparent materials to provide an intuitive way for people with low visualization literacy to explore network data. The process involves printing network embeddings on transparent media and assembling them to create a 3D representation of dynamic networks, maintaining spatial perception and allowing the examination of each timeslice individually. Interactivity is envisioned using optional Focus+Context layers and overlays for node trajectories and labels. Focus layers highlight nodes of interest, context layers provide an overview of the network structure, and global overlays show node trajectories over time. In this paper, we outline the design principles and implementation of HoloGraphs and present how elementary digital interactions can be mapped to physical interactions to manipulate the elements of a network and its temporal dimension in an engaging manner. We demonstrate the capabilities of our concept in a case study. Using a dynamic network of character interactions from a popular book series, we showcase how HoloGraphs represents and supports understanding of complex concepts such as dynamic networks.

Authors:Keon Ju M. Lee, Philippe Pasquier
Title: Musical Agent Systems: MACAT and MACataRT
Abstract:
Our research explores the development and application of musical agents, human-in-the-loop generative AI systems designed to support music performance and improvisation within co-creative spaces. We introduce MACAT and MACataRT, two distinct musical agent systems crafted to enhance interactive music-making between human musicians and AI. MACAT is optimized for agent-led performance, employing real-time synthesis and self-listening to shape its output autonomously, while MACataRT provides a flexible environment for collaborative improvisation through audio mosaicing and sequence-based learning. Both systems emphasize training on personalized, small datasets, fostering ethical and transparent AI engagement that respects artistic integrity. This research highlights how interactive, artist-centred generative AI can expand creative possibilities, empowering musicians to explore new forms of artistic expression in real-time, performance-driven and music improvisation contexts.

Authors:Abidullah Khan, Atefeh Shokrizadeh, Jinghui Cheng
Title: Beyond Automation: How UI/UX Designers Perceive AI as a Creative Partner in the Divergent Thinking Stages
Abstract:
Divergent thinking activities, like research and ideation, are key drivers of innovation in UI/UX design. Existing research has explored AI's role in automating design tasks, but leaves a critical gap in understanding how AI specifically influences divergent thinking. To address this, we conducted interviews with 19 professional UI/UX designers, examining their use and perception of AI in these creative activities. We found that in this context, participants valued AI tools that offer greater control over ideation, facilitate collaboration, enhance efficiency to liberate creativity, and align with their visual habits. Our results indicated four key roles AI plays in supporting divergent thinking: aiding research, kick-starting creativity, generating design alternatives, and facilitating prototype exploration. Through this study, we provide insights into the evolving role of AI in the less-investigated area of divergent thinking in UI/UX design, offering recommendations for future AI tools that better support design innovation.

Authors:Frank Wencheng Liu, Mason Manetta, Prasad Borkar, Byron Lahey, Assegid Kidane, Robert LiKamWa
Title: Pneutouch: Exploring the affordances and interactions of haptic inflatables through a wrist-worn interface
Abstract:
Haptic sensations that align with virtual reality (VR) experiences have a profound impact on presence and enjoyment. There is potential to explore the dynamic capabilities of pneumatic inflatables to offer immersive sensations in virtual environments, including variations in shape, size, and stiffness. We introduce Pneutouch, an ungrounded and untethered wrist-worn device designed as a pneumatic haptic interface for VR interactions. Pneutouch's dynamic inflation ability enables programmable stiffness and shape change of haptic proxies. Additionally, multiple haptic proxies can be delivered into and out of the user's hand grasp. We describe the implementation of the Pneutouch device. We conducted user studies to demonstrate the affordances of pneumatic inflatables and assessed the device's efficacy in providing haptic feedback. With Pneutouch, our goal is to expand what can be touched in the virtual space and bring more immersion into virtual reality.

Authors:Frank Wencheng Liu, Ryan Wirjadi, Yanjun Lyu, Shiling Dai, Byron Lahey, Assegid Kidane, Robert LiKamWa
Title: Vibr-eau: Emulating Fluid Behavior in Vessel Handling through Vibrotactile Actuators
Abstract:
Existing methods of haptic feedback for virtual fluids are challenging to scale, lack durability for long-term rough use, and fail to fully capture the expressive haptic qualities of fluids. To overcome these limitations, we present Vibr-eau, a physical system designed to emulate the sensation of virtual fluids in vessels using vibrotactile actuators. Vibr-eau uses spatial and temporal vibrotactile feedback to create realistic haptic sensations within a 3D-printed vessel. When the users are in the virtual environment and interact with the physical vessel, vibration impulses are triggered and the user will feel like there is fluid in the vessel. We explore the impact of motor density, direct touch, and vibration strength on users' perception of virtual fluid sensations. User studies reveal that Vibr-eau effectively simulates dynamic weight shifts and fluid-like sensations, with participants reporting experiences closely resembling real-world interactions with fluids. Our findings contribute to the development of adaptable and scalable haptic applications for virtual fluids, providing insights into optimizing parameters for realistic and perceptually faithful simulated fluid experiences in VR environments.

Authors:Atefeh Shokrizadeh, Boniface Bahati Tadjuidje, Shivam Kumar, Sohan Kamble, Jinghui Cheng
Title: Dancing With Chains: Ideating Under Constraints With UIDEC in UI/UX Design
Abstract:
UI/UX designers often work under constraints like brand identity, design norms, and industry guidelines. How these constraints impact designers' ideation and exploration processes should be addressed in creativity-support tools for design. Through an exploratory interview study, we identified three designer personas with varying views on having constraints in the ideation process, which guided the creation of UIDEC, a GenAI-powered tool for supporting creativity under constraints. UIDEC allows designers to specify project details, such as purpose, target audience, industry, and design styles, based on which it generates diverse design examples that adhere to these constraints, with minimal need to write prompts. In a user evaluation involving designers representing the identified personas, participants found UIDEC compatible with their existing ideation process and useful for creative inspiration, especially when starting new projects. Our work provides design implications to AI-powered tools that integrate constraints during UI/UX design ideation to support creativity.

Authors:Sarah Bonna, Yu-Cheng Huang, Ekaterina Novozhilova, Sejin Paik, Zhengyang Shan, Michelle Yilin Feng, Ge Gao, Yonish Tayal, Rushil Kulkarni, Jialin Yu, Nupur Divekar, Deepti Ghadiyaram, Derry Wijaya, Margrit Betke
Title: DebiasPI: Inference-time Debiasing by Prompt Iteration of a Text-to-Image Generative Model
Abstract:
Ethical intervention prompting has emerged as a tool to counter the demographic biases of text-to-image generative AI models. Existing solutions either require retraining the model or struggle to generate images that reflect desired distributions of gender and race. We propose an inference-time process called DebiasPI, for Debiasing-by-Prompt-Iteration, that provides prompt intervention by enabling the user to control the distributions of individuals' demographic attributes in image generation. DebiasPI keeps track of which attributes have been generated, either by probing the internal state of the model or by using external attribute classifiers. Its control loop guides the text-to-image model to select attributes that are not yet sufficiently represented. With DebiasPI, we were able to create images with equal representations of race and gender that visualize challenging concepts from news headlines. We also experimented with the attributes age, body type, profession, and skin tone, and measured how attributes change when our intervention prompt targets the distribution of an unrelated attribute type. We found, for example, that if the text-to-image model is asked to balance racial representation, gender representation improves but skin tone becomes less diverse. Attempts to cover a wide range of skin colors with various intervention prompts showed that the model struggles to generate the palest skin tones. We conducted various ablation studies, in which we removed DebiasPI's attribute control, that reveal the model's propensity to generate young, male characters. It sometimes visualized career success by generating two-panel images in which a pre-success dark-skinned person becomes light-skinned with success, or switches gender from pre-success female to post-success male, further motivating ethical intervention prompting with DebiasPI.
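The abstract's control loop (track which attribute values have been generated, then steer the next prompt toward the most underrepresented one) can be sketched as follows. This is a minimal illustration of the balancing idea only, not DebiasPI's actual implementation; the function names, the target-share dictionary, and the stand-in for attribute classification are all hypothetical.

```python
from collections import Counter

def pick_underrepresented(counts: Counter, attributes, target_share):
    """Return the attribute value furthest below its target share.

    Hypothetical sketch: an observed-count tally plus per-attribute
    target shares stand in for DebiasPI's probing/classifier step.
    """
    total = sum(counts.values())
    if total == 0:
        return attributes[0]
    # Deficit = desired share minus observed share; largest deficit wins.
    deficits = {a: target_share[a] - counts[a] / total for a in attributes}
    return max(deficits, key=deficits.get)

# Simulated loop: in a real system, an external attribute classifier
# would report what each generated image actually depicts; here we
# simply assume the steered prompt succeeds and record the choice.
counts = Counter()
attributes = ["female", "male"]
target = {"female": 0.5, "male": 0.5}
for _ in range(10):
    choice = pick_underrepresented(counts, attributes, target)
    counts[choice] += 1  # stand-in for classifying the generated image

print(counts)  # Counter({'female': 5, 'male': 5})
```

Under these assumptions the loop alternates between the two values and ends balanced; with a real generator, classifier feedback would make the trajectory noisier but still converge toward the target shares.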

Authors:Shivani Kapania, Stephanie Ballard, Alex Kessler, Jennifer Wortman Vaughan
Title: Examining the Expanding Role of Synthetic Data Throughout the AI Development Pipeline
Abstract:
Alongside the growth of generative AI, we are witnessing a surge in the use of synthetic data across all stages of the AI development pipeline. It is now common practice for researchers and practitioners to use one large generative model (which we refer to as an auxiliary model) to generate synthetic data that is used to train or evaluate another, reconfiguring AI workflows and reshaping the very nature of data. While scholars have raised concerns over the risks of synthetic data, policy guidance and best practices for its responsible use have not kept up with these rapidly evolving industry trends, in part because we lack a clear picture of current practices and challenges. Our work aims to address this gap. Through 29 interviews with AI practitioners and responsible AI experts, we examine the expanding role of synthetic data in AI development. Our findings reveal how auxiliary models are now widely used across the AI development pipeline. Practitioners describe synthetic data as crucial for addressing data scarcity and providing a competitive edge, noting that evaluation of generative AI systems at scale would be infeasible without auxiliary models. However, they face challenges controlling the outputs of auxiliary models, generating data that accurately depict underrepresented groups, and scaling data validation practices that are based primarily on manual inspection. We detail general limitations of and ethical considerations for synthetic data and conclude with a proposal of concrete steps towards the development of best practices for its responsible use.

Authors:Ruyuan Wan, Lingbo Tong, Tiffany Knearem, Toby Jia-Jun Li, Ting-Hao 'Kenneth' Huang, Qunfang Wu
Title: Hashtag Re-Appropriation for Audience Control on Recommendation-Driven Social Media Xiaohongshu (rednote)
Abstract:
Algorithms have played a central role in personalized recommendations on social media. However, they also present significant obstacles for content creators trying to predict and manage their audience reach. This issue is particularly challenging for marginalized groups seeking to maintain safe spaces. Our study explores how women on Xiaohongshu (rednote), a recommendation-driven social platform, proactively re-appropriate hashtags (e.g., #Baby Supplemental Food) by using them in posts unrelated to their literal meaning. The hashtags were strategically chosen from topics that would be uninteresting to the male audience they wanted to block. Through a mixed-methods approach, we analyzed the practice of hashtag re-appropriation based on 5,800 collected posts and interviewed 24 active users from diverse backgrounds to uncover users' motivations and reactions towards the re-appropriation. This practice highlights how users can reclaim agency over content distribution on recommendation-driven platforms, offering insights into self-governance within algorithmic-centered power structures.

Authors:JiWoo Kim, Minsuk Chang, JinYeong Bak
Title: Beyond Turn-taking: Introducing Text-based Overlap into Human-LLM Interactions
Abstract:
Traditional text-based human-AI interactions often adhere to a strict turn-taking approach. In this research, we propose a novel approach that incorporates overlapping messages, mirroring natural human conversations. Through a formative study, we observed that even in text-based contexts, users instinctively engage in overlapping behaviors like "A: Today I went to-" "B: yeah." To capitalize on these insights, we developed OverlapBot, a prototype chatbot where both AI and users can initiate overlapping. Our user study revealed that OverlapBot was perceived as more communicative and immersive than a traditional turn-taking chatbot, fostering faster and more natural interactions. Our findings contribute to the understanding of the design space for overlapping interactions. We also provide recommendations for implementing overlap-capable AI interactions to enhance the fluidity and engagement of text-based conversations.

Authors:Sharon Temtsin, Diane Proudfoot, David Kaber, Christoph Bartneck
Title: The Imitation Game According To Turing
Abstract:
The current cycle of hype and anxiety concerning the benefits and risks to human society of Artificial Intelligence is fuelled not only by the increasing use of generative AI and other AI tools by the general public, but also by claims made on behalf of such technology by popularizers and scientists. In particular, recent studies have claimed that Large Language Models (LLMs) can pass the Turing Test, a goal for AI since the 1950s, and therefore can "think". Large-scale impacts on society have been predicted as a result. Upon detailed examination, however, none of these studies has faithfully applied Turing's original instructions. Consequently, we conducted a rigorous Turing Test with GPT-4-Turbo that adhered closely to Turing's instructions for a three-player imitation game. We followed established scientific standards where Turing's instructions were ambiguous or missing. For example, we performed a Computer-Imitates-Human Game (CIHG) without constraining the time duration and conducted a Man-Imitates-Woman Game (MIWG) as a benchmark. All but one participant correctly identified the LLM, showing that one of today's most advanced LLMs is unable to pass a rigorous Turing Test. We conclude that recent extravagant claims for such models are unsupported, and do not warrant either optimism or concern about the social impact of thinking machines.

Authors:Winona Graham, Russell Drinkwater, Joshua Kelson, Muhammad Ashad Kabir
Title: Self-Guided Virtual Reality Therapy for Anxiety: A Systematic Review
Abstract:
Virtual reality (VR) technology can be used to treat anxiety symptoms and disorders. However, most VR interventions for anxiety have been therapist-guided rather than self-guided. This systematic review aimed to examine the effectiveness and user experience (i.e., usability, acceptability, safety, and attrition rates) of self-guided VR therapy interventions in people with any anxiety condition, as well as to provide future research directions. Peer-reviewed journal articles reporting on self-guided VR interventions for anxiety were sought from the Cochrane Library, IEEE Xplore Digital Library, PsycINFO, PubMed, Scopus, and Web of Science databases. Study data from the eligible articles were extracted, tabulated, and addressed with a narrative synthesis. A total of 21 articles met the inclusion criteria. The findings revealed that self-guided VR interventions for anxiety can provide effective treatment of social anxiety disorder, public speaking anxiety, and specific phobias. User experience outcomes of safety, usability, and acceptability were generally positive, and the average attrition rate was low. However, there was a lack of standardised assessments to measure user experiences. Self-guided VR for anxiety can provide an engaging approach for effectively and safely treating common anxiety conditions. Nevertheless, more experimental studies are required to examine their use in underrepresented anxiety populations, their long-term treatment effects beyond 12 months, and their effectiveness compared with other self-help interventions for anxiety (e.g., internet interventions and bibliotherapy).

Authors:Emily Tseng, Meg Young, Marianne Aubin Le Quéré, Aimee Rinehart, Harini Suresh
Title: "Ownership, Not Just Happy Talk": Co-Designing a Participatory Large Language Model for Journalism
Abstract:
Journalism has emerged as an essential domain for understanding the uses, limitations, and impacts of large language models (LLMs) in the workplace. News organizations face divergent financial incentives: LLMs already permeate newswork processes within financially constrained organizations, even as ongoing legal challenges assert that AI companies violate their copyright. At stake are key questions about what LLMs are created to do, and by whom: How might a journalist-led LLM work, and what can participatory design illuminate about the present-day challenges of adapting "one-size-fits-all" foundation models to a given context of use? In this paper, we undertake a co-design exploration to understand how a participatory approach to LLMs might address opportunities and challenges around AI in journalism. Our 20 interviews with reporters, data journalists, editors, labor organizers, product leads, and executives highlight macro, meso, and micro tensions that designing for this opportunity space must address. From these desiderata, we describe the result of our co-design work: organizational structures and functionality for a journalist-controlled LLM. In closing, we discuss the limitations of commercial foundation models for workplace use, and the methodological implications of applying participatory methods to LLM co-design.

Authors:Shalutha Rajapakshe, Jean-Marc Odobez, Emmanuel Senft
Title: Giving Sense to Inputs: Toward an Accessible Control Framework for Shared Autonomy
Abstract:
While shared autonomy offers significant potential for assistive robotics, key questions remain about how to effectively map 2D control inputs to 6D robot motions. An intuitive framework should allow users to input commands effortlessly, with the robot responding as expected, without users needing to anticipate the impact of their inputs. In this article, we propose a dynamic input mapping framework that links joystick movements to motions on control frames defined along a trajectory encoded with canal surfaces. We evaluate our method in a user study with 20 participants, demonstrating that our input mapping framework reduces workload and improves usability compared to a baseline mapping with similar motion encoding. To prepare for deployment in assistive scenarios, we built on developments from the accessible gaming community to select an accessible control interface. We then tested the system in an exploratory study in which three wheelchair users controlled the robot for both daily living activities and a creative painting task, demonstrating its feasibility for users closer to our target population.

Authors:Olya Rezaeian, Alparslan Emrah Bayrak, Onur Asan
Title: Explainability and AI Confidence in Clinical Decision Support Systems: Effects on Trust, Diagnostic Performance, and Cognitive Load in Breast Cancer Care
Abstract:
Artificial Intelligence (AI) has demonstrated potential in healthcare, particularly in enhancing diagnostic accuracy and decision-making through Clinical Decision Support Systems (CDSSs). However, the successful implementation of these systems relies on user trust and reliance, which can be influenced by explainable AI. This study explores the impact of varying explainability levels on clinicians' trust, cognitive load, and diagnostic performance in breast cancer detection. Utilizing an interrupted time series design, we conducted a web-based experiment involving 28 healthcare professionals. The results revealed that high confidence scores substantially increased trust but also led to overreliance, reducing diagnostic accuracy. In contrast, low confidence scores decreased trust and agreement while increasing diagnosis duration, reflecting more cautious behavior. Some explainability features influenced cognitive load by increasing stress levels. Additionally, demographic factors such as age, gender, and professional role shaped participants' perceptions of and interactions with the system. This study provides valuable insights into how explainability impacts clinicians' behavior and decision-making. The findings highlight the importance of designing AI-driven CDSSs that balance transparency, usability, and cognitive demands to foster trust and improve integration into clinical workflows.

Authors:Huichen Will Wang, Larry Birnbaum, Vidya Setlur
Title: Jupybara: Operationalizing a Design Space for Actionable Data Analysis and Storytelling with LLMs
Abstract:
Mining and conveying actionable insights from complex data is a key challenge of exploratory data analysis (EDA) and storytelling. To address this challenge, we present a design space for actionable EDA and storytelling. Synthesizing theory and expert interviews, we highlight how semantic precision, rhetorical persuasion, and pragmatic relevance underpin effective EDA and storytelling. We also show how this design space subsumes common challenges in actionable EDA and storytelling, such as identifying appropriate analytical strategies and leveraging relevant domain knowledge. Building on the potential of LLMs to generate coherent narratives with commonsense reasoning, we contribute Jupybara, an AI-enabled assistant for actionable EDA and storytelling implemented as a Jupyter Notebook extension. Jupybara employs two strategies -- design-space-aware prompting and multi-agent architectures -- to operationalize our design space. An expert evaluation confirms Jupybara's usability, steerability, explainability, and reparability, as well as the effectiveness of our strategies in operationalizing the design space framework with LLMs.

Authors:Joy Ming, Hawi H Tolera, Jiamin Tu, Ella Yitzhaki, Chit Sum Eunice Ngai, Madeline Sterling, Ariel C Avgar, Aditya Vashistha, Nicola Dell
Title: Exploring Data-Driven Advocacy in Home Health Care Work
Abstract:
This paper explores opportunities and challenges for data-driven advocacy to support home care workers, an often overlooked group of low-wage, frontline health workers. First, we investigate what data to collect and how to collect it in ways that preserve privacy and avoid burdening workers. Second, we examine how workers and advocates could use collected data to strengthen individual and collective advocacy efforts. Our qualitative study with 11 workers and 15 advocates highlights tensions between workers' desires for individual and immediate benefits and advocates' preferences to prioritize more collective and long-term benefits. We also uncover discrepancies between participants' expectations for how data might transform advocacy and their on-the-ground experiences collecting and using real data. Finally, we discuss future directions for data-driven worker advocacy, including combining different kinds of data to ameliorate challenges, leveraging advocates as data stewards, and accounting for workers' and organizations' heterogeneous goals.

Authors:Runhua Zhang, Jiaqi Gan, Shangyuan Gao, Siyi Chen, Xinyu Wu, Dong Chen, Yulin Tian, Qi Wang, Pengcheng An
Title: Walk in Their Shoes to Navigate Your Own Path: Learning About Procrastination Through A Serious Game
Abstract:
Procrastination, the voluntary delay of tasks despite potential negative consequences, has prompted numerous time and task management interventions in the HCI community. While these interventions have shown promise in addressing specific behaviors, psychological theories suggest that learning about procrastination itself may help individuals develop their own coping strategies and build mental resilience. However, little research has explored how to support this learning process through HCI approaches. We present ProcrastiMate, a text adventure game where players learn about procrastination's causes and experiment with coping strategies by guiding in-game characters in managing relatable scenarios. Our field study with 27 participants revealed that ProcrastiMate facilitated learning and self-reflection while maintaining psychological distance, motivating players to integrate newly acquired knowledge in daily life. This paper contributes empirical insights on leveraging serious games to facilitate learning about procrastination and offers design implications for addressing psychological challenges through HCI approaches.

Authors:Hemant Purohit, Cody Buntain, Amanda Lee Hughes, Steve Peterson, Valerio Lorini, Carlos Castillo
Title: Engage and Mobilize! Understanding Evolving Patterns of Social Media Usage in Emergency Management
Abstract:
The work of Emergency Management (EM) agencies requires timely collection of relevant data to inform decision-making for operations and public communication before, during, and after a disaster. However, the limited human resources available to deploy for field data collection are a persistent problem for EM agencies. Thus, many of these agencies have started leveraging social media as a supplemental data source and a new venue to engage with the public. While prior research has analyzed the potential benefits and attitudes of practitioners and the public when leveraging social media during disasters, a gap exists in the critical analysis of the actual practices and uses of social media among EM agencies, across both geographical regions and phases of the EM lifecycle - typically mitigation, preparedness, response, and recovery. In this paper, we conduct a mixed-method analysis to update and fill this gap on how EM practitioners in the U.S. and Europe use social media, building on a survey study of about 150 professionals and a follow-up interview study with 11 participants. The results indicate that using social media is no longer a non-traditional practice in operational and informational processes for the decision-making of EM agencies working at both the local level (e.g., county or town) and non-local level (e.g., state/province, federal/national). In particular, practitioners affiliated with agencies working at the local level have a very high perceived value of social media for situational awareness (e.g., analyzing disaster extent and impact) and public communication (e.g., disseminating timely information and correcting errors in crisis coverage). We conclude with the policy, technological, and socio-technical needs to design future social media analytics systems to support the work of EM agencies in such communication, including applications of AI.

Authors:Yinuo Qin, Richard T. Lee, Paul Sajda
Title: Perception of an AI Teammate in an Embodied Control Task Affects Team Performance, Reflected in Human Teammates' Behaviors and Physiological Responses
Abstract:
The integration of artificial intelligence (AI) into human teams is widely expected to enhance performance and collaboration. However, our study reveals a striking and counterintuitive result: human-AI teams performed worse than human-only teams, especially when task difficulty increased. Using a virtual reality-based sensorimotor task, we observed that the inclusion of an active human-like AI teammate disrupted team dynamics, leading to elevated arousal, reduced engagement, and diminished communication intensity among human participants. These effects persisted even as the human teammates' perception of the AI teammate improved over time. These findings challenge prevailing assumptions about the benefits of AI in team settings and highlight the critical need for human-centered AI design to mitigate adverse physiological and behavioral impacts, ensuring more effective human-AI collaboration.

Authors:Yue Fu, Michele Newman, Lewis Going, Qiuzi Feng, Jin Ha Lee
Title: Exploring the Collaborative Co-Creation Process with AI: A Case Study in Novice Music Production
Abstract:
Artificial intelligence is reshaping creative domains, yet its co-creative processes, especially in group settings with novice users, remain underexplored. To bridge this gap, we conducted a case study in a college-level course where nine undergraduate students were tasked with creating three original music tracks using AI tools over 10 weeks. The study spanned the entire creative journey from ideation to releasing these songs on Spotify. Participants leveraged AI for music and lyric production, cover art, and distribution. Our findings highlight how AI transforms creative workflows: accelerating ideation but compressing the traditional preparation stage, and requiring novices to navigate a challenging idea selection and validation phase. We also identified a new "collaging and refinement" stage, where participants creatively combined diverse AI-generated outputs into cohesive works. Furthermore, AI influenced group social dynamics and role division among human creators. Based on these insights, we propose the Human-AI Co-Creation Stage Model and the Human-AI Agency Model, offering new perspectives on collaborative co-creation with AI.

Authors:Leah Hope Ajmani, Talia Bhatt, Michael Ann Devito
Title: Moving Towards Epistemic Autonomy: A Paradigm Shift for Centering Participant Knowledge
Abstract:
Justice, epistemology, and marginalization are rich areas of study in HCI. And yet, we repeatedly find platforms and algorithms that push communities further into the margins. In this paper, we propose epistemic autonomy -- one's ability to govern knowledge about themselves -- as a necessary HCI paradigm for working with marginalized communities. We establish epistemic autonomy by applying the transfeminine principle of autonomy to the problem of epistemic injustice. To articulate the harm of violating one's epistemic autonomy, we present six stories from two trans women: (1) a transfem online administrator and (2) a transfem researcher. We then synthesize our definition of epistemic autonomy in research into a research paradigm. Finally, we present two variants of common HCI methods, autoethnography and asynchronous remote communities, that stem from these beliefs. We discuss how CHI is uniquely situated to champion this paradigm and, thereby, the epistemic autonomy of our research participants.

Authors:Ethan Wilson, Naveen Sendhilnathan, Charlie S. Burlingham, Yusuf Mansour, Robert Cavin, Sai Deep Tetali, Ajoy Savio Fernandes, Michael J. Proulx
Title: Eye Gaze as a Signal for Conveying User Attention in Contextual AI Systems
Abstract:
Advanced multimodal AI agents can now collaborate with users to solve challenges in the world. Yet, these emerging contextual AI systems rely on explicit communication channels between the user and system. We hypothesize that implicit communication of the user's interests and intent would reduce friction and improve user experience when collaborating with AI agents. In this work, we explore the potential of wearable eye tracking to convey signals about user attention. We measure the eye tracking signal quality requirements to effectively map gaze traces to physical objects, then conduct experiments that provide visual scanpath history as additional context when querying vision language models. Our results show that eye tracking provides high value as a user attention signal and can convey important context about the user's current task and interests, improving understanding of contextual AI agents.

Authors:Yoonsang Kim, Zainab Aamir, Mithilesh Singh, Saeed Boorboor, Klaus Mueller, Arie E. Kaufman
Title: Explainable XR: Understanding User Behaviors of XR Environments using LLM-assisted Analytics Framework
Abstract:
We present Explainable XR, an end-to-end framework for analyzing user behavior in diverse eXtended Reality (XR) environments by leveraging Large Language Models (LLMs) for data interpretation assistance. Existing XR user analytics frameworks face challenges in handling cross-virtuality - AR, VR, MR - transitions, multi-user collaborative application scenarios, and the complexity of multimodal data. Explainable XR addresses these challenges by providing a virtuality-agnostic solution for the collection, analysis, and visualization of immersive sessions. We propose three main components in our framework: (1) a novel user data recording schema, called User Action Descriptor (UAD), that can capture the users' multimodal actions, along with their intents and the contexts; (2) a platform-agnostic XR session recorder; and (3) a visual analytics interface that offers LLM-assisted insights tailored to the analysts' perspectives, facilitating the exploration and analysis of the recorded XR session data. We demonstrate the versatility of Explainable XR through five use-case scenarios, in both individual and collaborative XR applications across virtualities. Our technical evaluation and user studies show that Explainable XR provides a highly usable analytics solution for understanding user actions and delivering multifaceted, actionable insights into user behaviors in immersive environments.
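The abstract describes the User Action Descriptor only at a high level. As a rough sketch of what a virtuality-agnostic action record of this kind might look like, here is a minimal Python rendition; every field name below is an illustrative assumption, not the paper's actual schema:

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class UserActionDescriptor:
    """Hypothetical record for one user action in an XR session.

    All field names are illustrative guesses, not the UAD schema
    defined in the paper.
    """
    timestamp: float   # seconds since session start
    user_id: str       # supports multi-user collaborative sessions
    virtuality: str    # "AR", "VR", or "MR"
    action: str        # e.g. "grab", "teleport", "gaze"
    target: str        # object or location acted upon
    intent: str        # analyst- or model-supplied intent label
    context: dict = field(default_factory=dict)  # free-form modality data

# One recorded action, serialized for a downstream LLM-assisted analysis step
record = UserActionDescriptor(
    timestamp=12.4, user_id="u1", virtuality="VR",
    action="grab", target="molecule_03", intent="inspect",
    context={"gaze_hit": "molecule_03", "hand": "right"},
)
serialized = json.dumps(asdict(record))
```

A flat, serializable record like this is what would let the same recorder work across AR, VR, and MR sessions.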

Authors:Kaitlynn Taylor Pineda, Ethan Brown, Chien-Ming Huang
Title: "See You Later, Alligator": Impacts of Robot Small Talk on Task, Rapport, and Interaction Dynamics in Human-Robot Collaboration
Abstract:
Small talk can foster rapport building in human-human teamwork; yet how non-anthropomorphic robots, such as collaborative manipulators commonly used in industry, may capitalize on these social communications remains unclear. This work investigates how robot-initiated small talk influences task performance, rapport, and interaction dynamics in human-robot collaboration. We developed an autonomous robot system that assists a human in an assembly task while initiating and engaging in small talk. A user study ($N = 58$) was conducted in which participants worked with either a functional robot, which engaged in only task-oriented speech, or a social robot, which also initiated small talk. Our study found that participants in the social condition reported significantly higher levels of rapport with the robot. Moreover, all participants in the social condition responded to the robot's small talk attempts; 59% initiated questions to the robot, and 73% engaged in lingering conversations after requesting the final task item. Although active working times were similar across conditions, participants in the social condition recorded longer task durations than those in the functional condition. We discuss the design and implications of robot small talk in shaping human-robot collaboration.

Authors:Maryam Arab, Jenny T. Liang, Valentina Hong, Thomas D. LaToza
Title: How Developers Choose Debugging Strategies for Challenging Web Application Defects
Abstract:
Effective debugging is a crucial aspect of software development, demanding problem-solving skills, expertise, and appropriate tools. Although previous research has studied expert developers' debugging strategies, the specific factors influencing strategy choice in complex scenarios remain underexplored. To investigate these contextual factors, we conducted two studies. First, we surveyed 35 developers to identify experiences with challenging debugging problems and contextual complexities. Second, we held semi-structured interviews with 16 experienced developers to gain deeper insight into strategic reasoning for complex debugging tasks. Insights from both groups enriched our understanding of debugging strategies at different expertise levels. We found that contextual factors interact in complex ways, and combinations of factors influence strategy choice, evolving throughout the debugging process. Hypothesis making is the baseline for debugging, with experience and code familiarity crucial for strategy selection. Our results show a gap between learning and effectively practicing strategies in challenging contexts, highlighting the need for carefully designed debugging tools and educational frameworks that align with problem contexts.

Authors:Shuangjiang Xue, Pierre Le Bras, David A. Robb, Mike J. Chantler, Stefano Padilla
Title: Visual Exploration of Stopword Probabilities in Topic Models
Abstract:
Stopword removal is a critical stage in many Machine Learning methods, yet it often receives little consideration; poorly handled stopwords interfere with model visualizations and undermine user confidence. Inappropriately chosen or hastily omitted stopwords not only lead to suboptimal performance but also significantly affect the quality of models, thus reducing the willingness of practitioners and stakeholders to rely on the output visualizations. This paper proposes a novel extraction method that provides a corpus-specific probabilistic estimation of stopword likelihood and an interactive visualization system to support their analysis. We evaluated our approach and interface using real-world data, a commonly used Machine Learning method (Topic Modelling), and a comprehensive qualitative experiment probing user confidence. The results of our work show that our system increases user confidence in the credibility of topic models by (1) returning reasonable probabilities, (2) generating an appropriate and representative extension of common stopword lists, and (3) providing an adjustable threshold for estimating and analyzing stopwords visually. Finally, we discuss insights, recommendations, and best practices to support practitioners while improving the output of Machine Learning methods and topic model visualizations with robust stopword analysis and removal.
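The abstract does not detail how the corpus-specific stopword probability is estimated. As a purely illustrative heuristic (not the paper's estimator), one could score a word as stopword-like when it appears in many documents and is spread evenly across them:

```python
import math
from collections import Counter

def stopword_probability(docs):
    """Score each word in [0, 1] by how stopword-like it looks in this
    corpus: high document coverage combined with an even spread across
    documents. An illustrative heuristic, not the paper's method."""
    n_docs = len(docs)
    doc_freq = Counter()  # number of documents containing each word
    total = Counter()     # corpus-wide count of each word
    for doc in docs:
        counts = Counter(doc)
        total.update(counts)
        doc_freq.update(counts.keys())
    scores = {}
    for w in total:
        coverage = doc_freq[w] / n_docs
        if doc_freq[w] > 1:
            # normalized entropy of the word's counts across documents:
            # 1.0 when spread perfectly evenly, lower when concentrated
            probs = [doc.count(w) / total[w] for doc in docs if w in doc]
            evenness = -sum(p * math.log(p) for p in probs) / math.log(doc_freq[w])
        else:
            evenness = 0.0  # a word seen in one document is topical, not a stopword
        scores[w] = coverage * evenness
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "slept"]]
scores = stopword_probability(docs)
```

A probability-like score such as this is exactly the kind of quantity that an adjustable visual threshold, as described in the abstract, could filter on.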

Authors:P. D. Magnus, Alessandra Buccella, Jason D'Cruz
Title: Chatbot apologies: Beyond bullshit
Abstract:
Apologies serve essential functions for moral agents such as expressing remorse, taking responsibility, and repairing trust. LLM-based chatbots routinely produce output that has the linguistic form of an apology. However, they do this simply because they are echoing the kinds of things that humans say. Moreover, there are reasons to think that chatbots are not the kind of linguistic or moral agents capable of apology. To put the point bluntly: Chatbot apologies are bullshit. This paper explores this concern and develops it beyond the epithet, drawing on the nature of morally serious apologies, the linguistic agency required to perform them, and the moral agency required for them to matter. We conclude by considering some consequences for how chatbots should be designed and how we ought to think about them.

Authors:Omar Mena, Alexandre Kouyoumdjian, Lonni Besançon, Michael Gleicher, Ivan Viola, Anders Ynnerman
Title: Augmenting a Large Language Model with a Combination of Text and Visual Data for Conversational Visualization of Global Geospatial Data
Abstract:
We present a method for augmenting a Large Language Model (LLM) with a combination of text and visual data to enable accurate question answering in visualization of scientific data, making conversational visualization possible. LLMs struggle with tasks like visual data interaction, as they lack contextual visual information. We address this problem by merging a text description of a visualization and dataset with snapshots of the visualization. We extract their essential features into a structured text file that is highly compact yet descriptive enough to augment the LLM with contextual information, without any fine-tuning. This approach can be applied to any fully rendered visualization, as long as it is associated with a textual description.

Authors:Ji Eun Kim, Seura Ha, Sangmi Kim, Libby Hemphill
Title: The Spread of Virtual Gifting in Live Streaming: The Case of Twitch
Abstract:
This paper examines how gifting spreads among viewers on Twitch, one of the largest live streaming platforms worldwide. Twitch users can give gift subscriptions to other viewers in the chat room, with the majority of gifters opting for community gifting, which is gifting to randomly selected viewers. We identify the random nature of gift-receiving in our data as a natural experiment setting. We investigate whether gift recipients pay it forward, considering various gift types that may either promote or deter the spread of gifting. Our findings reveal that Twitch viewers who receive gift subscriptions are generally more likely to pay it forward than non-recipients, and the positive impact of gift-receiving becomes stronger when the recipient is the sole beneficiary of the giver's gifting behavior. However, we found that gifts from frequent gifters discourage recipients from paying it forward, and gifts from anonymous gifters do not influence the likelihood of viewers becoming future gifters. This research contributes to the existing literature on the spread of online prosocial behavior by providing robust evidence and suggests practical strategies for promoting online gifting.

Authors:Jihun Han, Dominik Karbowski, Ayman Moawad, Namdoo Kim, Aymeric Rousseau, Shihong Fan, Jason Hoon Lee, Jinho Ha
Title: Processing and Analyzing Real-World Driving Data: Insights on Trips, Scenarios, and Human Driving Behaviors
Abstract:
Analyzing large volumes of real-world driving data is essential for providing meaningful and reliable insights into real-world trips, scenarios, and human driving behaviors. To this end, we developed a multi-level data processing approach that adds new information, segments data, and extracts desired parameters. Leveraging a confidential but extensive dataset (over 1 million km), this approach leads to three levels of in-depth analysis: trip, scenario, and driving. The trip-level analysis explains representative properties observed in real-world trips, while the scenario-level analysis focuses on scenario conditions resulting from road events that reduce vehicle speed. The driving-level analysis identifies the cause of driving regimes for specific situations and characterizes typical human driving behaviors. Such analyses can support the design of both trip- and scenario-based tests, the modeling of human drivers, and the establishment of guidelines for connected and automated vehicles.

Authors:Li Zhang, Jiyao Liu
Title: Subject Disentanglement Neural Network for Speech Envelope Reconstruction from EEG
Abstract:
Reconstructing speech envelopes from EEG signals is essential for exploring neural mechanisms underlying speech perception. Yet, EEG variability across subjects and physiological artifacts complicate accurate reconstruction. To address this problem, we introduce Subject Disentangling Neural Network (SDN-Net), which disentangles subject identity information from reconstructed speech envelopes to enhance cross-subject reconstruction accuracy. SDN-Net integrates three key components: MLA-Codec, MPN-MI, and CTA-MTDNN. The MLA-Codec, a fully convolutional neural network, decodes EEG signals into speech envelopes. The CTA-MTDNN module, a multi-scale time-delay neural network with channel and temporal attention, extracts subject identity features from EEG signals. Lastly, the MPN-MI module, a mutual information estimator with a multi-layer perceptron, supervises the removal of subject identity information from the reconstructed speech envelope. Experiments on the Auditory EEG Decoding Dataset demonstrate that SDN-Net achieves superior performance in inner- and cross-subject speech envelope reconstruction compared to recent state-of-the-art methods.

Authors:Nessrine Farhat, Amine Bohi, Leila Ben Letaifa, Rim Slama
Title: CG-MER: A Card Game-based Multimodal dataset for Emotion Recognition
Abstract:
The field of affective computing has seen significant advancements in exploring the relationship between emotions and emerging technologies. This paper presents a novel and valuable contribution to this field with the introduction of a comprehensive French multimodal dataset designed specifically for emotion recognition. The dataset encompasses three primary modalities: facial expressions, speech, and gestures, providing a holistic perspective on emotions. Moreover, the dataset has the potential to incorporate additional modalities, such as Natural Language Processing (NLP) to expand the scope of emotion recognition research. The dataset was curated through engaging participants in card game sessions, where they were prompted to express a range of emotions while responding to diverse questions. The study included 10 sessions with 20 participants (9 females and 11 males). The dataset serves as a valuable resource for furthering research in emotion recognition and provides an avenue for exploring the intricate connections between human emotions and digital technologies.

Authors:Femi Olugbon, Nozhan Ghoreishi, Ming-Chun Huang, Wenyao Xu, Diliang Chen
Title: Reliable Vertical Ground Reaction Force Estimation with Smart Insole During Walking
Abstract:
The vertical ground reaction force (vGRF) and its characteristic weight acceptance and push-off peaks measured during walking are important for gait and biomechanical analysis. Current wearable vGRF estimation methods suffer from drifting errors or low generalization performances, limiting their practical application. This paper proposes a novel method for reliably estimating vGRF and its characteristic peaks using data collected from the smart insole, including inertial measurement unit data and the newly introduced center of the pressed sensor data. These data were fused with machine learning algorithms including artificial neural networks, random forest regression, and bi-directional long-short-term memory. The proposed method outperformed the state-of-the-art methods with a root mean squared error, normalized root mean squared error, and correlation coefficient of 0.024 body weight (BW), 1.79% BW, and 0.997 in intra-participant testing, and 0.044 BW, 3.22% BW, and 0.991 in inter-participant testing, respectively. The differences between the reference and estimated weight acceptance and push-off peak values are 0.022 BW and 0.017 BW, with delays of 1.4% and 1.8% of the gait cycle, for intra-participant testing, and 0.044 BW and 0.025 BW, with delays of 1.5% and 2.3% of the gait cycle, for inter-participant testing. The results indicate that the proposed vGRF estimation method has the potential to achieve accurate vGRF measurement during walking in free-living environments.
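The reported error metrics (RMSE, normalized RMSE, and correlation coefficient between reference and estimated vGRF) can be computed from paired signals as in the generic sketch below; normalizing by the reference signal's range is an assumption here, since the paper's exact normalization convention is not stated in the abstract:

```python
import math

def vgrf_metrics(reference, estimated):
    """RMSE, normalized RMSE (percent), and Pearson correlation between
    a reference and an estimated vGRF signal, both in body weight (BW).
    A generic sketch of the reported metrics, not the authors' code."""
    n = len(reference)
    errors = [e - r for r, e in zip(reference, estimated)]
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    # normalization by the reference signal's range is an assumed convention
    nrmse = 100.0 * rmse / (max(reference) - min(reference))
    mean_r = sum(reference) / n
    mean_e = sum(estimated) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimated))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    corr = cov / math.sqrt(var_r * var_e)  # Pearson correlation coefficient
    return rmse, nrmse, corr

# Toy stance-phase samples in units of body weight (BW)
rmse, nrmse, corr = vgrf_metrics(
    [0.0, 0.5, 1.1, 0.9, 0.3],
    [0.02, 0.52, 1.08, 0.92, 0.28],
)
```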

Authors:Diego Vaquero-Melchor, Ana M. Bernardos
Title: Enhancing Interaction with Augmented Reality through Mid-Air Haptic Feedback: Architecture Design and User Feedback
Abstract:
The integration of haptics within Augmented Reality may help to deliver an enriched experience, while facilitating the performance of specific actions (e.g., repositioning or resizing) that are still dependent on the user's skills. This paper describes a flexible architecture designed to deploy haptically-enabled AR applications. The haptic feedback may be generated through a variety of devices (e.g., wearable, graspable, or mid-air ones), and the architecture facilitates handling the specificity of each. For this reason, we discuss how to generate a haptic representation of a 3D digital object depending on the application and the target device. Additionally, we include an analysis of practical, relevant issues that arise when setting up a system to work with specific devices like Head-Mounted Displays (e.g., HoloLens) and mid-air haptic devices (e.g., Ultrahaptics UHK), such as the alignment between the real world and the virtual one. The architecture's applicability is demonstrated through the implementation of two applications: Form Inspector and Simon Game, built for HoloLens and iOS mobile phones for visualization and for UHK for mid-air haptics delivery. These applications have been used by nine users to explore the efficiency, meaningfulness, and usefulness of mid-air haptics for form perception, object resizing, and push interaction tasks. Results show that, although mobile interaction is preferred when this option is available, haptics turn out to be more meaningful than users initially expect in identifying shapes and in contributing to the execution of resizing tasks. Moreover, this preliminary user study reveals that users may be expecting a tailored interface metaphor, not necessarily inspired by natural interaction.

Authors:Mohamed Ala Yahyaoui, Mouaad Oujabour, Leila Ben Letaifa, Amine Bohi
Title: Multi-face emotion detection for effective Human-Robot Interaction
Abstract:
The integration of dialogue interfaces in mobile devices has become ubiquitous, providing a wide array of services. As technology progresses, humanoid robots designed with human-like features to interact effectively with people are gaining prominence, and the use of advanced human-robot dialogue interfaces is continually expanding. In this context, emotion recognition plays a crucial role in enhancing human-robot interaction by enabling robots to understand human intentions. This research proposes a facial emotion detection interface integrated into a mobile humanoid robot, capable of displaying real-time emotions from multiple individuals on a user interface. To this end, various deep neural network models for facial expression recognition were developed and evaluated under consistent computer-based conditions, yielding promising results. Afterwards, a trade-off between accuracy and memory footprint was carefully considered to effectively implement this application on a mobile humanoid robot.

Authors:Gianna Williams, Maya De Los Santos, Alexandra To, Saiph Savage
Title: Data Enrichment Work and AI Labor in Latin America and the Caribbean
Abstract:
The global AI surge demands crowdworkers from diverse languages and cultures. They are pivotal in labeling data for enabling global AI systems. Despite this global significance, research has primarily focused on understanding the perspectives and experiences of crowdworkers in the US and India, leaving a notable gap. To bridge this, we conducted a survey with 100 crowdworkers across 16 Latin American and Caribbean countries. We discovered that these workers exhibited pride and respect for their digital labor, with strong support and admiration from their families. Notably, crowd work was also seen as a stepping stone to financial and professional independence. Surprisingly, despite wanting more connection, these workers also felt isolated from peers and doubtful of others' labor quality. They resisted collaboration and gender-based tools, valuing gender-neutrality. Our work advances HCI understanding of Latin American and Caribbean crowdwork, offering insights for digital resistance tools for the region.

Authors:Pedro Rodrigues, Claudia Quaresma, Maria Costa, Filipe Luz, Maria Micaela Fonseca
Title: Virtual Reality-Based Telerehabilitation for Upper Limb Recovery Post-Stroke: A Systematic Review of Design Principles, Monitoring, Safety, and Engagement Strategies
Abstract:
Stroke rehabilitation continues to face challenges in accessibility and patient engagement, where traditional approaches often fall short. Virtual reality (VR)-based telerehabilitation offers a promising avenue, by enabling home-based recovery through immersive environments and gamification. This systematic review evaluates current VR solutions for upper-limb post-stroke recovery, focusing on design principles, safety measures, patient-therapist communication, and strategies to promote motivation and adherence. Following PRISMA 2020 guidelines, a comprehensive search was conducted across PubMed, IEEE Xplore, and ScienceDirect. The review reveals a scarcity of studies meeting the inclusion criteria, possibly reflecting the challenges inherent in the current paradigm of VR telerehabilitation systems. Although these systems have potential to enhance accessibility and patient autonomy, they often lack standardized safety protocols and reliable real-time monitoring. Human-centered design principles are evident in some solutions, but inconsistent patient involvement during the development process limits their usability and clinical relevance. Furthermore, communication between patients and therapists remains constrained by technological barriers, although advancements in real-time feedback and adaptive systems offer promising solutions. This review underscores the potential of VR telerehabilitation to address critical needs in upper-limb stroke recovery while highlighting the importance of addressing existing limitations to ensure broader clinical implementation and improved patient outcomes.

Authors:Alice Nardelli, Lorenzo Landolfi, Dario Pasquali, Antonio Sgorbissa, Francesco Rea, Carmine Recchiuto
Title: Toward a Universal Concept of Artificial Personality: Implementing Robotic Personality in a Kinova Arm
Abstract:
The fundamental role of personality in shaping interactions is increasingly being exploited in robotics. A carefully designed robotic personality has been shown to improve several key aspects of Human-Robot Interaction (HRI). However, the fragmentation and rigidity of existing approaches reveal even greater challenges when applied to non-humanoid robots. On one hand, the state of the art is very dispersed; on the other hand, Industry 4.0 is moving towards a future where humans and industrial robots are going to coexist. In this context, the proper design of a robotic personality can lead to more successful interactions. This research takes a first step in that direction by integrating a comprehensive cognitive architecture built upon the definition of robotic personality - validated on humanoid robots - into a robotic Kinova Jaco2 arm. The robot personality is defined through the cognitive architecture as a vector in the three-dimensional space encompassing Conscientiousness, Extroversion, and Agreeableness, affecting how actions are executed, the action selection process, and the internal reaction to environmental stimuli. Our main objective is to determine whether users perceive distinct personalities in the robot, regardless of its shape, and to understand the role language plays in shaping these perceptions. To achieve this, we conducted a user study comprising 144 sessions of a collaborative game between a Kinova Jaco2 arm and participants, where the robot's behavior was influenced by its assigned personality. Furthermore, we compared two conditions: in the first, the robot communicated solely through gestures and action choices, while in the second, it also utilized verbal interaction.
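The abstract defines personality as a vector over Conscientiousness, Extroversion, and Agreeableness that shapes how actions are executed and selected. A minimal sketch of that idea follows; the specific trait-to-parameter mappings are invented for illustration and are not drawn from the paper's cognitive architecture:

```python
from dataclasses import dataclass

@dataclass
class Personality:
    """A point in [0, 1]^3 over the three traits named in the paper."""
    conscientiousness: float
    extroversion: float
    agreeableness: float

def motion_parameters(p: Personality) -> dict:
    """Map traits to hypothetical execution parameters for a robot arm:
    a more extroverted robot moves faster, a more conscientious one
    tracks its path more tightly, and a more agreeable one yields to the
    human partner more readily. All three mappings are assumptions."""
    return {
        "speed_scale": 0.5 + 0.5 * p.extroversion,
        "path_tolerance_mm": 10.0 - 8.0 * p.conscientiousness,
        "yield_probability": 0.2 + 0.6 * p.agreeableness,
    }

# A conscientious, introverted, fairly agreeable configuration
params = motion_parameters(Personality(0.9, 0.2, 0.7))
```

Keeping personality as a continuous vector, rather than a set of discrete profiles, is what allows the same architecture to be transplanted from humanoid robots to a non-humanoid arm.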

Authors:Divya Mani Adhikari, Alexander Hartland, Ingmar Weber, Vikram Kamath Cannanure
Title: Exploring LLMs for Automated Generation and Adaptation of Questionnaires
Abstract:
Effective questionnaire design improves the validity of the results, but creating and adapting questionnaires across contexts is challenging due to resource constraints and limited expert access. Recently, the emergence of LLMs has led researchers to explore their potential in survey research. In this work, we focus on the suitability of LLMs in assisting the generation and adaptation of questionnaires. We introduce a novel pipeline that leverages LLMs to create new questionnaires, pretest them with a target audience to identify potential issues, and adapt existing standardized questionnaires for different contexts. We evaluated our pipeline for creation and adaptation through two studies on Prolific, involving 238 participants from the US and 118 participants from South Africa. Our findings show that participants found LLM-generated text clearer, LLM-pretested text more specific, and LLM-adapted questions slightly clearer and less biased than traditional ones. Our work opens new opportunities for LLM-driven questionnaire support in survey research.

Authors:Yang Hong, Ru-Yun Tseng, Ying-Yu Chen
Title: Balancing Sleep and Study: Cultural Contexts in Family Informatics for Taiwanese Parents and Children
Abstract:
This study examines the intersection of academic pressure and sleep within Taiwanese families, revealing how cultural norms and expectations shape sleep practices. Through interviews and two-week diaries from eleven families, we found that academic demands significantly influence children's sleep patterns, leading to reduced sleep duration and varied sleep schedules. Our research highlights the importance of integrating care and attuning into the design of sleep-tracking technologies, advocating for a family informatics approach that considers both health needs and social expectations. By exploring these dynamics, we contribute to a broader understanding of family contexts in diverse cultural settings and offer insights for more inclusive technology design.

Authors:Rosalie Lin, Aditi Maheshwari, Jung Wook Park, Andreea Danielescu
Title: ExoFabric: A Re-moldable Textile System for Creating Customizable Soft Goods and Wearable Applications
Abstract:
Fabric has been a fundamental part of human life for thousands of years, providing comfort, protection, and aesthetic expression. While modern advancements have enhanced fabric's functionality, it remains static and unchangeable, failing to adapt to our evolving body shapes and preferences. This lack of adaptability can lead to unsustainable practices, as consumers often buy more items to meet their changing needs. In this paper, we propose ExoFabric, a re-moldable fabric system for customized soft goods applications. We created ExoFabric by embedding thermoplastic threads into fabric through computerized embroidery to allow for tunability between rigid plastic and conformable fabric. We defined a library of design primitives to enable geometric formability, stiffness, and stretchability by identifying suitable fabrics, threads, embroidery parameters, and machine limitations. To facilitate adoption, we demonstrated practical methods for linking parameters to application requirements, showcasing form-fitting wearables, structural support, and shape-changeable furniture for repeatable or one-time customization.

Authors:Michael F. Xu, Bilge Mutlu
Title: Exploring the Use of Robots for Diary Studies
Abstract:
As interest in studying in-the-wild human-robot interaction grows, there is a need for methods to collect data over time and in naturalistic or potentially private environments. HRI researchers have increasingly used the diary method for these studies, asking study participants to self-administer a structured data collection instrument, i.e., a diary, over a period of time. Although the diary method offers a unique window into settings that researchers may not have access to, it also lacks the interactivity and probing that interview-based methods offer. In this paper, we explore a novel data collection method in which a robot plays the role of an interactive diary. We developed the Diary Robot system and performed in-home deployments for a week to evaluate the feasibility and effectiveness of this approach. Using traditional text-based and audio-based diaries as benchmarks, we found that robots are able to effectively elicit the intended information. We reflect on our findings, and describe scenarios where the utilization of robots in diary studies as a data collection instrument may be especially applicable.

Authors:Phillip Richter, Heiko Wersing, Anna-Lisa Vollmer
Title: Improving Human-Robot Teaching by Quantifying and Reducing Mental Model Mismatch
Abstract:
The rapid development of artificial intelligence and robotics has had a significant impact on our lives, with intelligent systems increasingly taking on tasks traditionally performed by humans. Efficient knowledge transfer requires matching the mental model of the human teacher with the capabilities of the robot learner. This paper introduces the Mental Model Mismatch (MMM) Score, a feedback mechanism designed to quantify and reduce mismatches by aligning human teaching behavior with robot learning behavior. Using Large Language Models (LLMs), we analyze teacher intentions in natural language to generate adaptive feedback. A study with 150 participants teaching a virtual robot to solve a puzzle game shows that intention-based feedback significantly outperforms traditional performance-based feedback or no feedback. The results suggest that intention-based feedback improves instructional outcomes, deepens understanding of the robot's learning process, and reduces misconceptions. This research addresses a critical gap in human-robot interaction (HRI) by providing a method to quantify and mitigate discrepancies between human mental models and robot capabilities, with the goal of improving robot learning and human teaching effectiveness.

Authors:Noyon Kumar Sarkar, Moumita Roy, Md. Maniruzzaman
Title: Brain Controlled Wheelchair with Smart Feature
Abstract:
In Asia, many individuals with disabilities rely on wheelchairs for mobility. However, some people, such as those who are fully disabled or paralyzed, cannot use traditional wheelchairs despite having fully functioning cognitive abilities. To address this issue, we propose the development of an electric wheelchair that can be controlled using EEG signals and eye blinks. The project utilizes a MindWave Mobile device and Arduino to enable seamless control. Additionally, various sensors are incorporated to enhance the system's reliability. An ultrasonic sensor helps avoid unexpected collisions, while a smoke sensor detects hazardous smoke levels, triggering an automatic alert via a short message to a designated person. Similarly, if the passenger falls from the wheelchair, a notification will also be sent. The wheelchair's movement is controlled via an Android application, with eye-blink detection serving as the primary input method for navigation. This innovative design offers a cost-effective solution, making it accessible for widespread use. By integrating these advanced features, the system can be implemented on motorized wheelchairs to better support individuals with disabilities and enhance their independence.

Authors:Yasaman Hakiminejad, Elizabeth Pantesco, Arash Tavakoli
Title: Shaping Passenger Experience: An Eye-Tracking Study of Public Transportation Built Environment
Abstract:
Designing public transportation cabins that effectively engage passengers and encourage more sustainable mobility options requires a deep understanding of how users from different backgrounds visually interact with these environments. The following study employs eye-tracking technology to investigate visual attention patterns across six distinct cabin designs, ranging from the current and poorly maintained versions to enhanced, biophilic-focused, cyclist-friendly, and productivity-focused configurations. A total of 304 participants engaged with each cabin design while their eye-movement metrics, such as Fixation Counts, Time to First Fixation (TFF), First Fixation Duration (FFD), Stationary Gaze Entropy (SGE), and Gaze Transition Entropy (GTE), were recorded. Results revealed that alternative cabin configurations consistently exhibited shorter TFFs and lower entropy measures compared to the baseline current version. Specifically, designs incorporating natural elements and biophilic aspects, streamlined layouts, or functional amenities facilitated quicker orientation and more structured gaze patterns, indicating enhanced visual engagement and possibly reduced cognitive load. In contrast, the poorly maintained cabin design was associated with higher entropy values, suggesting more scattered and less predictable visual exploration. Demographic factors, particularly ethnicity, significantly influenced FFD in certain designs, with Non-white participants showing reduced fixation durations in the enhanced and poorly maintained environments, highlighting the importance of inclusive design considerations. Moreover, transportation-related demographic factors such as frequency of public transport use, trip purpose, and duration of use significantly influenced visual attention metrics in various cabin designs.

Authors:Jonas Oppenlaender, Simo Hosio
Title: Keeping Score: A Quantitative Analysis of How the CHI Community Appreciates Its Milestones
Abstract:
The ACM CHI Conference has a tradition of citing its intellectual heritage. At the same time, we know CHI is highly diverse and evolving. In this highly dynamic context, it is not clear how the CHI community continues to appreciate its milestones (within and outside of CHI). We present an investigation into how the community's citations to milestones have evolved over 43 years of CHI Proceedings (1981-2024). Forgetting curves plotted for each year suggest that milestones are slowly fading from the CHI community's collective memory. However, the picture is more nuanced when we trace citations to the top-cited milestones over time. We identify three distinct types of milestones cited at CHI, a typology of milestone contributions, and define the Milestone Coefficient as a metric to assess the impact of milestone papers on a continuous scale. Further, we provide empirical evidence of a Matthew effect at CHI. We discuss the broader ramifications for the CHI community and the field of HCI.

Authors:Sebastian Kruegel, Andreas Ostermaier, Matthias Uhl
Title: ChatGPT's advice drives moral judgments with or without justification
Abstract:
Why do users follow moral advice from chatbots? A chatbot is not an authoritative moral advisor, but it can generate seemingly plausible arguments. In an experiment, however, we find that users do not follow reasoned advice more readily than unreasoned advice. This also holds when we attribute the advice to a human moral advisor rather than a chatbot. Hence, it seems that advice offers users a cheap way to escape from a moral dilemma. This is a concern that chatbots do not raise, but they exacerbate it as they make advice easily accessible. We conclude that it takes ethical in addition to digital literacy to guard users against moral advice from chatbots.

Authors:Jumana Almahmoud, Marc Facciotti, Michele Igo, Kamali Sripathi, David Karger
Title: Enhancing User Engagement in Large-Scale Social Annotation Platforms: Community-Based Design Interventions and Implications for Large Language Models (LLMs)
Abstract:
Social annotation platforms enable student engagement by integrating discussions directly into course materials. However, in large online courses, the sheer volume of comments can overwhelm students and impede learning. This paper investigates community-based design interventions on a social annotation platform (NB) to address this challenge and foster more meaningful online educational discussions. By examining student preferences and reactions to different curation strategies, this research aims to optimize the utility of social annotations in educational contexts. A key emphasis is placed on how the visibility of comments shapes group interactions, guides conversational flows, and enriches learning experiences. The study combined iterative design and development with two large-scale experiments to create and refine comment curation strategies, involving thousands of students. The study introduced specific features of the platform, such as targeted comment visibility controls, which demonstrably improved peer interactions and reduced discussion overload. These findings inform the design of next-generation social annotation systems and highlight opportunities to integrate Large Language Models (LLMs) for key activities like summarizing annotations, improving clarity in student writing, and assisting instructors with efficient comment curation.

Authors:Kenneth Ge, JooYoung Seo
Title: StereoMath: An Accessible and Musical Equation Editor
Abstract:
For blind and low-vision (BLV) individuals, digital math communication is uniquely difficult due to the lack of accessible tools. Currently, the state of the art is either code-based, like LaTeX, or WYSIWYG, like visual editors. However, both paradigms view math communication as primarily a visual typesetting problem, and may be accessible but difficult to use. In this paper, we present an equation editor that is built from the ground up with BLV accessibility in mind. Specifically, we notice that two of the biggest barriers with current technology are the high cognitive load and the lack of spatial relationships. Thus, we build an editor that uses spatial audio cues, muscle memory, tones, and more intuitive navigation to properly contextualize math equations. We discuss how this new paradigm can enable new levels of math communication, engagement, and literacy. Finally, we discuss natural next steps.

Authors:Chun-Hsiung Tseng, Hao-Chiang Koong Lin, Andrew Chih-Wei Huang, Jia-Rou Lin
Title: Personalized Programming Education: Using Machine Learning to Boost Learning Performance Based on Students' Personality Traits
Abstract:
Studies have indicated that personality is related to achievement, and several personality assessment models have been developed. However, most are either questionnaires or based on marker systems, which entails limitations. We proposed a physiological signal based model, thereby ensuring the objectivity of the data and preventing unreliable responses. Thirty participants were recruited from the Department of Electrical Engineering of Yuan Ze University in Taiwan. Wearable sensors were used to collect physiological signals as the participants watched and summarized a video. They then completed a personality questionnaire based on the Big Five factor markers system. The results were used to construct a personality prediction model, which revealed that galvanic skin response and heart rate variance were key factors predicting extroversion; heart rate variance also predicted agreeableness and conscientiousness. The results of this experiment can elucidate students' personality traits, which can help educators select the appropriate pedagogical methods.

Authors:Jie Gao, Zhiyao Shu, Shun Yi Yeo
Title: MindCoder: Automated and Controllable Reasoning Chain in Qualitative Analysis
Abstract:
Extracting insights from qualitative analysis involves a series of reasoning steps, such as open coding, grouping, and identifying themes. We introduce the MindCoder reasoning chain, built on Chain-of-Thought (CoT) prompting, to support the insight extraction process step by step, including topic clustering, code labeling, conceptualization, and reporting. We designed the MindCoder web application to help users 1) automatically run this reasoning chain (i.e., obtain analysis report results in approximately 3-5 minutes) and 2) interactively control the reasoning process on demand. Our technical evaluations assess its reliability across various data types and demonstrate that simulated human iteration can potentially enhance coding quality. A user study further confirmed positive feedback regarding MindCoder's automation and its on-demand reasoning functionality.
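A staged reasoning chain of this kind can be sketched as a sequence of prompts fed through a pluggable LLM callable. The prompt wording and the `llm` interface below are illustrative assumptions, not MindCoder's actual prompts or API:

```python
def mindcoder_chain(quotes, llm):
    """Run the four stages listed in the abstract -- topic clustering,
    code labeling, conceptualization, reporting -- as sequential LLM
    calls, each consuming the previous stage's output (a sketch;
    real prompt design would be richer)."""
    clusters = llm(f"Cluster these quotes by topic:\n{quotes}")
    codes = llm(f"Assign a short code label to each cluster:\n{clusters}")
    concepts = llm(f"Group the codes into higher-level concepts:\n{codes}")
    report = llm(f"Write a short analysis report from:\n{concepts}")
    return {"clusters": clusters, "codes": codes,
            "concepts": concepts, "report": report}
```

Because each stage takes the prior stage's text as context, a user can inspect or edit any intermediate result before the next call, which is the kind of on-demand control the abstract describes.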

Authors:Chun-Hsiung Tseng, Hao-Chiang Koong Lin, Yung-Hui Chen, Jia-Rou Lin, Andrew Chih-Wei Huang
Title: Do Students with Different Personality Traits Demonstrate Different Physiological Signals in Video-based Learning?
Abstract:
Past research shows that personality traits are strong predictors of one's academic performance. Today, mature and verified marker systems for assessing personality traits already exist. However, marker-system-based assessment methods have their own limitations; for example, dishonest responses cannot be avoided. In this research, the goal is to develop a method that can overcome these limitations. The proposed method relies on physiological signals for the assessment. Thirty participants took part in this experiment. Based on the statistical results, we found correlations between students' personality traits and their physiological signal changes when learning via videos. Specifically, we found that participants' degrees of extraversion, agreeableness, conscientiousness, and openness to experience are correlated with the variance of heart rates, the variance of GSR values, and the skewness of voice frequencies, among other measures.

Authors:Xingyu Bruce Liu, Shitao Fang, Weiyan Shi, Chien-Sheng Wu, Takeo Igarashi, Xiang Anthony Chen
Title: Proactive Conversational Agents with Inner Thoughts
Abstract:
A long-standing aspiration in conversational AI is to allow agents to autonomously take the initiative in conversations, i.e., to be proactive. This is especially challenging for multi-party conversations. Prior NLP research focused mainly on predicting the next speaker from contexts like preceding conversations. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation, and seeks the right moment to contribute. Through a formative study with 24 participants and inspiration from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thoughts in parallel to the overt communication process, which enables it to proactively engage by modeling its intrinsic motivation to express these thoughts. We instantiated this framework into two real-time systems: an AI playground web app and a chatbot. Through a technical evaluation and user studies with human participants, our framework significantly surpasses existing baselines on aspects like anthropomorphism, coherence, intelligence, and turn-taking appropriateness.

Authors:Olivia Richards, Keith L. Obstein, Nabil Simaan
Title: Exploring Accelerated Skill Acquisition via Tandem Training for Colonoscopy
Abstract:
New endoscopists require a large volume of expert-proctored colonoscopies to attain minimal competency. Developing multi-fingered, synchronized control of a colonoscope requires significant time and exposure to the device. Current training methods inhibit this development by relying on tool hand-off for expert demonstrations. There is a need for colonoscopy training tools that enable in-hand expert guidance in real-time. We present a new concept of a tandem training system that uses a telemanipulated preceptor colonoscope to guide novice users as they perform a colonoscopy. This system is capable of dual-control and can automatically toggle between expert and novice control of a standard colonoscope's angulation control wheels. Preliminary results from a user study with novice and expert users show the effectiveness of this device as a skill acquisition tool. We believe that this device has the potential to accelerate skill acquisition for colonoscopy and, in the future, enable individualized instruction and responsive teaching through bidirectional actuation.

Authors:Stefan Buijsman, Sarah E. Carter, Juan Pablo Bermúdez
Title: Autonomy by Design: Preserving Human Autonomy in AI Decision-Support
Abstract:
AI systems increasingly support human decision-making across domains of professional, skill-based, and personal activity. While previous work has examined how AI might affect human autonomy globally, the effects of AI on domain-specific autonomy -- the capacity for self-governed action within defined realms of skill or expertise -- remain understudied. We analyze how AI decision-support systems affect two key components of domain-specific autonomy: skilled competence (the ability to make informed judgments within one's domain) and authentic value-formation (the capacity to form genuine domain-relevant values and preferences). By engaging with prior investigations and analyzing empirical cases across medical, financial, and educational domains, we demonstrate how the absence of reliable failure indicators and the potential for unconscious value shifts can erode domain-specific autonomy both immediately and over time. We then develop a constructive framework for autonomy-preserving AI support systems. We propose specific socio-technical design patterns -- including careful role specification, implementation of defeater mechanisms, and support for reflective practice -- that can help maintain domain-specific autonomy while leveraging AI capabilities. This framework provides concrete guidance for developing AI systems that enhance rather than diminish human agency within specialized domains of action.

Authors:Israel Fianyi, Soonja Yeom, Ju-Hyun Shin
Title: Comparative Studies: Cloud-Enabled Adaptive Learning System for Scalable Education in Sub-Saharan Africa
Abstract:
The integration of cloud computing in education can revolutionise learning in advanced (Australia & South Korea) and middle-income (Ghana & Nigeria) countries, while offering scalable, cost-effective and equitable access to adaptive learning systems. This paper explores how cloud computing and adaptive learning technologies are deployed across different socio-economic and infrastructure contexts. The study identifies enabling factors and systematic challenges, providing insights into how cloud-based education can be tailored to bridge the digital and educational divide globally.

Authors:Xiaoxiao Yang, Chao Feng, Jiancheng Chen
Title: Neuro-Informed Joint Learning Enhances Cognitive Workload Decoding in Portable BCIs
Abstract:
Portable and wearable consumer-grade electroencephalography (EEG) devices, like Muse headbands, offer unprecedented mobility for daily brain-computer interface (BCI) applications, including cognitive load detection. However, the exacerbated non-stationarity in portable EEG signals constrains data fidelity and decoding accuracy, creating a fundamental trade-off between portability and performance. To mitigate this limitation, we propose MuseCogNet (Muse-based Cognitive Network), a unified joint learning framework integrating self-supervised and supervised training paradigms. In particular, we introduce an EEG-grounded self-supervised reconstruction loss based on average pooling to capture robust neurophysiological patterns, while cross-entropy loss refines task-specific cognitive discriminants. This joint learning framework resembles the bottom-up and top-down attention in humans, enabling MuseCogNet to significantly outperform state-of-the-art methods on a publicly available Muse dataset and establish an implementable pathway for neurocognitive monitoring in ecological settings.
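A joint objective of this shape, pairing an average-pooling reconstruction term with cross-entropy, might look roughly like the following NumPy sketch; the pooling width, MSE formulation, and weighting `alpha` are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def avg_pool_1d(x, k):
    # non-overlapping average pooling along the time axis
    t = (x.shape[-1] // k) * k
    return x[..., :t].reshape(*x.shape[:-1], -1, k).mean(axis=-1)

def joint_loss(signal, recon, logits, label, alpha=0.5):
    """Self-supervised reconstruction loss on average-pooled EEG,
    combined with supervised cross-entropy on class logits."""
    target = avg_pool_1d(signal, 4)          # pooled self-supervision target
    rec = np.mean((recon - target) ** 2)     # MSE reconstruction term
    z = logits - logits.max()                # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    ce = -np.log(p[label])                   # cross-entropy term
    return alpha * rec + (1 - alpha) * ce
```

The pooled target discards high-frequency noise, so the reconstruction branch is pushed toward slower, more robust signal structure while the cross-entropy branch shapes task-specific features.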

Authors:Noverah Khan, Hira Eiraj Daud, Suleman Shahid
Title: Mind the Dark: A Gamified Exploration of Deceptive Design Awareness for Children in the Digital Age
Abstract:
This paper addresses the critical issue of deceptive design elements prevalent in technology, and their potential impact on children. Recent research highlights the impact of dark patterns on adults and adolescents, while studies involving children are scarce. In an era where children wield greater independence with digital devices, their vulnerability to dark patterns amplifies without early education. To this end, we developed a gamified application aimed at instructing children on identifying and responding to various dark patterns. Our findings show a significant positive impact of dark-pattern education on children's awareness, revealing that heightened awareness considerably alters children's navigation of social media, video games, and streaming platforms. Our evaluation results emphasize the critical role of early education in empowering children to recognize and counter deceptive design, thereby cultivating a digitally literate generation capable of making informed choices in the complex landscape of digital technology.

Authors:Tomás Silva Santos Rocha, Anastasiia Mikhailova, Moreno I. Coco, José Santos-Victor
Title: Deep Learning in Mild Cognitive Impairment Diagnosis using Eye Movements and Image Content in Visual Memory Tasks
Abstract:
The global prevalence of dementia is projected to double by 2050, highlighting the urgent need for scalable diagnostic tools. This study utilizes digital cognitive tasks with eye-tracking data correlated with memory processes to distinguish between Healthy Controls (HC) and Mild Cognitive Impairment (MCI), a precursor to dementia. A deep learning model based on VTNet was trained using eye-tracking data from 44 participants (24 MCI, 20 HCs) who performed a visual memory task. The model utilizes both time series and spatial data derived from eye-tracking. It was modified to incorporate scan paths, heat maps, and image content. These modifications also enabled testing parameters such as image resolution and task performance, analyzing their impact on model performance. The best model, utilizing $700\times700$ px resolution heatmaps, achieved 68% sensitivity and 76% specificity. Despite operating under more challenging conditions (e.g., smaller dataset size, shorter task duration, or a less standardized task), the model's performance is comparable to an Alzheimer's study using similar methods (70% sensitivity and 73% specificity). These findings contribute to the development of automated diagnostic tools for MCI. Future work should focus on refining the model and using a standardized long-term visual memory task.
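A fixation heatmap of the kind used as spatial model input can be rasterized from (x, y, duration) fixations. The Gaussian smoothing and `sigma` value below are common-practice assumptions rather than the study's exact preprocessing:

```python
import numpy as np

def fixation_heatmap(fixations, size=700, sigma=25):
    """Rasterize (x, y, duration) fixations into a size x size map,
    weighting each fixation by its duration and smoothing with an
    isotropic Gaussian (a sketch of standard heatmap construction)."""
    heat = np.zeros((size, size))
    ys, xs = np.mgrid[0:size, 0:size]
    for x, y, dur in fixations:
        heat += dur * np.exp(-((xs - x) ** 2 + (ys - y) ** 2)
                             / (2 * sigma ** 2))
    if heat.max() > 0:
        heat /= heat.max()                 # normalize to [0, 1]
    return heat
```

Duration weighting makes dwelled-on regions brighter, which is what lets a downstream CNN relate gaze concentration to image content.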

Authors:Varun Sangwan, Heidi Makitalo
Title: Context, Credibility, and Control: User Reflections on AI Assisted Misinformation Tools
Abstract:
This paper investigates how collaborative AI systems can enhance user agency in identifying and evaluating misinformation on social media platforms. Traditional methods, such as personal judgment or basic fact-checking, often fall short when faced with emotionally charged or context-deficient content. To address this, we designed and evaluated an interactive interface that integrates collaborative AI features, including real-time explanations, source aggregation, and debate-style interaction. These elements aim to support critical thinking by providing contextual cues and argumentative reasoning in a transparent, user-centered format. In a user study with 14 participants, 79% found the debate mode more effective than standard chatbot interfaces, and the multiple-source view received an average usefulness rating of 4.6 out of 5. Our findings highlight the potential of context-rich, dialogic AI systems to improve media literacy and foster trust in digital information environments. We argue that future tools for misinformation mitigation should prioritize ethical design, explainability, and interactive engagement to empower users in a post-truth era.

Authors:Zoe Anastasiadou, Andreas Lanitis
Title: Immersive Technologies and Elderly Users: Current use, Limitations and Future Perspectives
Abstract:
The growing percentage of elderly people in modern societies dictates the use of emerging technologies as a means of supporting older members of society. Within this scope, Extended Reality (XR) technologies emerge as a promising means of improving the daily lives of the elderly population. This paper presents a literature review that describes the most common characteristics of the physical and mental state of the elderly, allowing readers, and specifically XR developers, to understand the main difficulties faced by elderly users of extended reality applications so they can develop accessible, user-friendly and engaging applications for the target audience. Furthermore, a review of existing extended reality applications that target the older population is presented, allowing readers to get acquainted with existing design paradigms that can inspire future developments.

Authors:George Bell, Alma Cantu
Title: Dichoptic Opacity: Managing Occlusion in Stereoscopic Displays via Dichoptic Presentation
Abstract:
Adjusting transparency is a common method of mitigating occlusion, but it often impairs understanding of the relative depth relationships between objects and removes potentially important information from the occluding object. We propose using dichoptic opacity, a novel method for occlusion management that contrasts the transparency of occluders presented to each eye. This allows for better simultaneous understanding of both occluder and occluded. A user study highlights the technique's potential, showing strong user engagement and a clear preference for dichoptic opacity over traditional presentations. While it does not determine optimal transparency values, it reveals promising trends in both percentage and range that merit further investigation.

Authors:Mustafa Demir, Jacob Miratsky, Jonathan Nguyen, Chun Kit Chan, Punya Mishra, Abhishek Singharoy
Title: Exploring Artificial Intelligence Tutor Teammate Adaptability to Harness Discovery Curiosity and Promote Learning in the Context of Interactive Molecular Dynamics
Abstract:
This study examines the impact of an Artificial Intelligence tutor teammate (AI) on student curiosity-driven engagement and learning effectiveness during Interactive Molecular Dynamics (IMD) tasks on the Visual Molecular Dynamics platform. It explores the role of the AI's curiosity-triggering and response behaviors in stimulating and sustaining student curiosity, affecting the frequency and complexity of student-initiated questions. The study further assesses how AI interventions shape student engagement, foster discovery curiosity, and enhance team performance within the IMD learning environment. Using a Wizard-of-Oz paradigm, a human experimenter dynamically adjusts the AI tutor teammate's behavior through a large language model. By employing a mixed-methods exploratory design, a total of 11 high school students participated in four IMD tasks that involved molecular visualization and calculations, which increased in complexity over a 60-minute period. Team performance was evaluated through real-time observation and recordings, whereas team communication was measured by question complexity and the AI's curiosity-triggering and response behaviors. Cross Recurrence Quantification Analysis (CRQA) metrics reflected structural alignment in coordination and were linked to communication behaviors. High-performing teams exhibited superior task completion, deeper understanding, and increased engagement. Advanced questions were associated with AI curiosity-triggering, indicating heightened engagement and cognitive complexity. CRQA metrics highlighted dynamic synchronization in student-AI interactions, emphasizing structured yet adaptive engagement to promote curiosity. These proof-of-concept findings suggest that the AI's dual role as teammate and educator enables it to provide adaptive feedback that sustains engagement and epistemic curiosity.

Authors:A. Subedi, S. De, L. Cavuoto, S. Schwaitzberg, M. Hackett, J. Norfleet
Title: An Interpretable Transformer-Based Foundation Model for Cross-Procedural Skill Assessment Using Raw fNIRS Signals
Abstract:
Objective skill assessment in high-stakes procedural environments requires models that not only decode underlying cognitive and motor processes but also generalize across tasks, individuals, and experimental contexts. While prior work has demonstrated the potential of functional near-infrared spectroscopy (fNIRS) for evaluating cognitive-motor performance, existing approaches are often task-specific, rely on extensive preprocessing, and lack robustness to new procedures or conditions. Here, we introduce an interpretable transformer-based foundation model trained on minimally processed fNIRS signals for cross-procedural skill assessment. Pretrained using self-supervised learning on data from laparoscopic surgical tasks and endotracheal intubation (ETI), the model achieves greater than 88% classification accuracy on all tasks, with Matthews Correlation Coefficient exceeding 0.91 on ETI. It generalizes to a novel emergency airway procedure--cricothyrotomy--using fewer than 30 labeled samples and a lightweight (less than 2k parameter) adapter module, attaining an AUC greater than 87%. Interpretability is achieved via a novel channel attention mechanism--developed specifically for fNIRS--that identifies functionally coherent prefrontal sub-networks validated through ablation studies. Temporal attention patterns align with task-critical phases and capture stress-induced changes in neural variability, offering insight into dynamic cognitive states.

Authors:Marvin Kopka, Markus A. Feufel
Title: How to Evaluate the Accuracy of Online and AI-Based Symptom Checkers: A Standardized Methodological Framework
Abstract:
Online and AI-based symptom checkers are applications that assist medical laypeople in diagnosing their symptoms and determining which course of action to take. When evaluating these tools, previous studies primarily used an approach introduced a decade ago that lacked any type of quality control. Numerous studies have criticized this approach, and several empirical studies have sought to improve specific aspects of evaluations. However, even after a decade, a high-quality methodological framework for standardizing the evaluation of symptom checkers remains missing. This article synthesizes empirical studies to outline a framework for standardized evaluations based on representative case selection, an externally and internally valid evaluation design, and metrics that increase cross-study comparability. This approach is backed up by several open-access resources to facilitate implementation. Ultimately, this approach should enhance the quality and comparability of future evaluations of online and AI-based symptom checkers to enable meta-analyses and help stakeholders make more informed decisions.

Authors:Galvin Brice S. Lim, Brian Godwin S. Lim, Argel A. Bandala, John Anthony C. Jose, Timothy Scott C. Chu, Edwin Sybingco
Title: AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification
Abstract:
Brain-computer interface (BCI) technology utilizing electroencephalography (EEG) marks a transformative innovation, empowering motor-impaired individuals to engage with their environment on equal footing. Despite its promising potential, developing subject-invariant and session-invariant BCI systems remains a significant challenge due to the inherent complexity and variability of neural activity across individuals and over time, compounded by EEG hardware constraints. While prior studies have sought to develop robust BCI systems, existing approaches remain ineffective in capturing the intricate spatiotemporal dependencies within multichannel EEG signals. This study addresses this gap by introducing the attentive graph-temporal convolutional network (AGTCNet), a novel graph-temporal model for motor imagery EEG (MI-EEG) classification. Specifically, AGTCNet leverages the topographic configuration of EEG electrodes as an inductive bias and integrates a graph convolutional attention network (GCAT) to jointly learn expressive spatiotemporal EEG representations. The proposed model significantly outperformed existing MI-EEG classifiers, achieving state-of-the-art performance while utilizing a compact architecture, underscoring its effectiveness and practicality for BCI deployment. With a 49.87% reduction in model size, 64.65% faster inference time, and a shorter input EEG signal, AGTCNet achieved a moving average accuracy of 66.82% for subject-independent classification on the BCI Competition IV Dataset 2a, which further improved to 82.88% when fine-tuned for subject-specific classification. On the EEG Motor Movement/Imagery Dataset, AGTCNet achieved moving average accuracies of 64.14% and 85.22% for 4-class and 2-class subject-independent classifications, respectively, with further improvements to 72.13% and 90.54% for subject-specific classifications.
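The abstract describes using the electrode topography as an inductive bias via graph convolution. The GCAT details are not given here; a minimal sketch of the underlying idea, using a standard symmetrically-normalized graph convolution over a hypothetical electrode adjacency, looks like this:

```python
import numpy as np

def graph_conv(X, A, W):
    """One graph-convolution step over EEG electrodes (illustrative sketch).
    X: (channels, features) signal features per electrode
    A: (channels, channels) adjacency built from electrode topography
    W: (features, out_features) learned weights
    """
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

# toy example: 4 electrodes in a line, 3 input features, 2 output features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.ones((4, 3))
W = np.ones((3, 2))
H = graph_conv(X, A, W)
```

Each electrode's output mixes features from its physical neighbors, which is how spatial structure enters the model before temporal convolution and attention are applied.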

Authors:Zihao You, Michael Crabb
Title: Subtitled Media Adaptations for People with Aphasia: Ongoing Accessibility Barriers and Emerging Design Practices
Abstract:
The consumption of subtitles via TVs, laptops and smartphones has the potential to marginalize people based on their complex accessibility needs. The current one-size-fits-all approach to this accessibility aid is no longer fit for purpose and work is required to look at how it can be adapted to be personalised for individual users based on individual context, content, and consumption habits. People with Aphasia, for example, encounter significant challenges in understanding subtitle texts. We see our work as a call to action for more inclusive practices, focusing on how the thoughts and opinions of people with aphasia can be included in media research. Our work investigates how to develop future media solutions for people with aphasia to create a more inclusive media viewing environment. We believe the key to this is appropriate prototyping tools and methods to allow equitable inclusion in the system design process.

Authors:Meira Gilbert, Miranda Wei, Lindah Kotut
Title: "TikTok, Do Your Thing": User Reactions to Social Surveillance in the Public Sphere
Abstract:
''TikTok, Do Your Thing'' is a viral trend where users attempt to identify strangers they see in public via information crowd-sourcing. The trend started as early as 2021 and users typically engage with it for romantic purposes (similar to a ''Missed Connections'' personal advertisement). This practice includes acts of surveillance and identification in the public sphere, although by peers rather than governments or corporations. To understand users' reactions to this trend we conducted a qualitative analysis of 60 TikTok videos and 1,901 user comments. Of the 60 videos reviewed, we find 19 individuals were successfully identified. We also find that while there were comments expressing disapproval (n=310), more than double the number expressed support (n=883). Supportive comments demonstrated genuine interest and empathy, reflecting evolving conceptions of community and algorithmic engagement. On the other hand, disapproving comments highlighted concerns about inappropriate relationships, stalking, consent, and gendered double standards. We discuss these insights in relation to the normalization of interpersonal surveillance, online stalking, and as an evolution of social surveillance to offer a new perspective on user perceptions surrounding interpersonal surveillance and identification in the public sphere.

Authors:Bei Yi Ng, Jiarui Li, Xinyuan Tong, Kevin Ye, Gauthami Yenne, Varun Chandrasekaran, Jingjie Li
Title: Analyzing Security and Privacy Challenges in Generative AI Usage Guidelines for Higher Education
Abstract:
Educators and learners worldwide are embracing the rise of Generative Artificial Intelligence (GenAI) as it reshapes higher education. However, GenAI also raises significant privacy and security concerns, as models and privacy-sensitive user data, such as student records, may be misused by service providers. Unfortunately, end-users often have little awareness of or control over how these models operate. To address these concerns, universities are developing institutional policies to guide GenAI use while safeguarding security and privacy. This work examines these emerging policies and guidelines, with a particular focus on the often-overlooked privacy and security dimensions of GenAI integration in higher education, alongside other academic values. Through a qualitative analysis of GenAI usage guidelines from universities across 12 countries, we identify key challenges and opportunities institutions face in providing effective privacy and security protections, including the need for GenAI safeguards tailored specifically to the academic context.

Authors:Xuefei Hou, Xizhao Tan
Title: Irec: A Metacognitive Scaffolding for Self-Regulated Learning through Just-in-Time Insight Recall: A Conceptual Framework and System Prototype
Abstract:
The core challenge in learning has shifted from knowledge acquisition to effective Self-Regulated Learning (SRL): planning, monitoring, and reflecting on one's learning. Existing digital tools, however, inadequately support metacognitive reflection. Spaced Repetition Systems (SRS) use de-contextualized review, overlooking the role of context, while Personal Knowledge Management (PKM) tools require high manual maintenance. To address these challenges, this paper introduces "Insight Recall," a novel paradigm that conceptualizes the context-triggered retrieval of personal past insights as a metacognitive scaffold to promote SRL. We formalize this paradigm using the Just-in-Time Adaptive Intervention (JITAI) framework and implement a prototype system, Irec, to demonstrate its feasibility. At its core, Irec uses a dynamic knowledge graph of the user's learning history. When a user faces a new problem, a hybrid retrieval engine recalls relevant personal "insights." Subsequently, a large language model (LLM) performs a deep similarity assessment to filter and present the most relevant scaffold in a just-in-time manner. To reduce cognitive load, Irec features a human-in-the-loop pipeline for LLM-based knowledge graph construction. We also propose an optional "Guided Inquiry" module, where users can engage in a Socratic dialogue with an expert LLM, using the current problem and recalled insights as context. The contribution of this paper is a solid theoretical framework and a usable system platform for designing next-generation intelligent learning systems that enhance metacognition and self-regulation.
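Irec's hybrid retrieval engine is not specified in the abstract; a common pattern such an engine could follow is blending a lexical match score with an embedding similarity. The sketch below (hypothetical weights and scoring, not the paper's implementation) shows one way to combine the two:

```python
import numpy as np

def hybrid_score(query_tokens, doc_tokens, q_vec, d_vec, alpha=0.5):
    """Blend lexical overlap (Jaccard) with embedding cosine similarity.
    alpha trades off exact keyword match against semantic closeness."""
    q, d = set(query_tokens), set(doc_tokens)
    lexical = len(q & d) / len(q | d) if q | d else 0.0
    cosine = float(q_vec @ d_vec /
                   (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))
    return alpha * lexical + (1 - alpha) * cosine

# identical query and insight should score a perfect 1.0
score = hybrid_score(["recursion", "base", "case"],
                     ["recursion", "base", "case"],
                     np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```

Candidates ranked by such a score could then be passed to the LLM for the deeper similarity assessment the abstract describes.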

Authors:M. Michelessa, J. Ng, C. Hurter, B. Y. Lim
Title: Varif.ai to Vary and Verify User-Driven Diversity in Scalable Image Generation
Abstract:
Diversity in image generation is essential to ensure fair representations and support creativity in ideation. Hence, many text-to-image models have implemented diversification mechanisms. Yet, after a few iterations of generation, a lack of diversity becomes apparent, because each user has their own diversity goals (e.g., different colors, brands of cars), and there are many possible attributes to specify. To support user-driven diversity control, we propose Varif.ai, which employs text-to-image and Large Language Models to iteratively i) (re)generate a set of images, ii) verify whether user-specified attributes have sufficient coverage, and iii) vary existing or new attributes. Through an elicitation study, we uncovered user needs for diversity in image generation. A pilot validation showed that Varif.ai made achieving diverse image sets easier. In a controlled evaluation with 20 participants, Varif.ai proved more effective than baseline methods across various scenarios. This supports user control of diversity in image generation for creative ideation and scalable image generation.

Authors:Russell Beale, Eugenia Sergueeva
Title: 5 Days, 5 Stories: Using Technology to Promote Empathy in the Workplace
Abstract:
Empathy is widely recognized as a vital attribute for effective collaboration and communication in the workplace, yet developing empathic skills and fostering it among colleagues remains a challenge. This study explores the potential of a collaborative digital storytelling platform - In Your Shoes - designed to promote empathic listening and interpersonal understanding through the structured exchange of personal narratives. A one-week intervention was conducted with employees from multiple organizations using the platform. Employing a mixed methods approach, we assessed quantitative changes in empathy using the Empathy Quotient (EQ) and qualitatively analyzed participant experiences through grounded theory. While quantitative analysis revealed no statistically significant shift in dispositional empathy, qualitative findings suggested the tool facilitated situational empathy, prompted self-reflection, improved emotional resonance, and enhanced workplace relationships. Participants reported feelings of psychological safety, connection, and, in some cases, therapeutic benefits from sharing and responding to stories. These results highlight the promise of asynchronous, structured narrative-based digital tools for supporting empathic engagement in professional settings, offering insights for the design of emotionally intelligent workplace technologies.

Authors:Jonathan Haberl, Philipp Fleck, Clemens Arth
Title: Virtual Memory for 3D Gaussian Splatting
Abstract:
3D Gaussian Splatting represents a breakthrough in the field of novel view synthesis. It establishes Gaussians as core rendering primitives for highly accurate real-world environment reconstruction. Recent advances have drastically increased the size of scenes that can be created. In this work, we present a method for rendering large and complex 3D Gaussian Splatting scenes using virtual memory. By leveraging well-established virtual memory and virtual texturing techniques, our approach efficiently identifies visible Gaussians and dynamically streams them to the GPU just in time for real-time rendering. Selecting only the necessary Gaussians for both storage and rendering results in reduced memory usage and effectively accelerates rendering, especially for highly complex scenes. Furthermore, we demonstrate how level of detail can be integrated into our proposed method to further enhance rendering speed for large-scale scenes. With an optimized implementation, we highlight key practical considerations and thoroughly evaluate the proposed technique and its impact on desktop and mobile devices.
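The abstract's core mechanism is classic virtual-memory paging applied to Gaussians: group them into pages, keep only pages needed for the current view resident on the GPU, and evict the rest. The exact paging scheme is not described here; a toy sketch of the idea, with a hypothetical per-page visibility test standing in for a real frustum/LOD check, might look like:

```python
import numpy as np

class GaussianPageCache:
    """Toy page cache sketch: Gaussians are grouped into fixed-size pages;
    only pages whose mean center passes a visibility test are kept
    'resident' (standing in for GPU memory)."""

    def __init__(self, centers, page_size=256):
        self.pages = [centers[i:i + page_size]
                      for i in range(0, len(centers), page_size)]
        self.resident = {}                        # page index -> page data

    def update(self, is_visible):
        for i, page in enumerate(self.pages):
            if is_visible(page.mean(axis=0)):     # cheap per-page test
                self.resident.setdefault(i, page)  # stream in
            else:
                self.resident.pop(i, None)         # evict
        return sorted(self.resident)

# two pages: one cluster at the origin (visible), one far away (culled)
centers = np.concatenate([np.zeros((256, 3)), np.full((256, 3), 10.0)])
cache = GaussianPageCache(centers)
visible = cache.update(lambda c: np.linalg.norm(c) < 5.0)
```

A real implementation would test page bounds against the view frustum and stream data asynchronously, but the residency bookkeeping follows this pattern.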

Authors:Angxuan Chen, Jingjing Lian, Xinran Kuang, Jiyou Jia
Title: Can theory-driven learning analytics dashboard enhance human-AI collaboration in writing learning? Insights from an empirical experiment
Abstract:
The integration of Generative AI (GenAI) into education has raised concerns about over-reliance and superficial learning, particularly in writing tasks in higher education. This study explores whether a theory-driven learning analytics dashboard (LAD) can enhance human-AI collaboration in academic writing by improving writing knowledge gains, fostering self-regulated learning (SRL) skills, and shaping different human-AI dialogue characteristics. Grounded in Zimmerman's SRL framework, the LAD provided real-time feedback on learners' goal-setting, writing processes, and reflection, while monitoring the quality of learner-AI interactions. A quasi-experiment was conducted with 52 postgraduate students, comparing an experimental group (EG) using the LAD with a control group (CG) without it in a human-AI collaborative writing task. Pre- and post-knowledge tests, questionnaires measuring SRL and cognitive load, and students' dialogue data with GenAI were collected and analyzed. Results showed that the EG achieved significantly higher writing knowledge gains and improved SRL skills, particularly in self-efficacy and cognitive strategies. However, the EG also reported increased test anxiety and cognitive load, possibly due to heightened metacognitive awareness. Epistemic Network Analysis revealed that the EG engaged in more reflective, evaluative interactions with GenAI, while the CG focused on more transactional and information-seeking exchanges. These findings contribute to the growing body of literature on the educational use of GenAI and highlight the importance of designing interventions that complement GenAI tools, ensuring that technology enhances rather than undermines the learning process.

Authors:Feiting Yang, Antoine Moevus, Steve Lévesque
Title: Emotion Detection on User Front-Facing App Interfaces for Enhanced Schedule Optimization: A Machine Learning Approach
Abstract:
Human-Computer Interaction (HCI) has evolved significantly to incorporate emotion recognition capabilities, creating unprecedented opportunities for adaptive and personalized user experiences. This paper explores the integration of emotion detection into calendar applications, enabling user interfaces to dynamically respond to users' emotional states and stress levels, thereby enhancing both productivity and engagement. We present and evaluate two complementary approaches to emotion detection: a biometric-based method utilizing heart rate (HR) data extracted from electrocardiogram (ECG) signals processed through Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural networks to predict the emotional dimensions of Valence, Arousal, and Dominance; and a behavioral method analyzing computer activity through multiple machine learning models to classify emotions based on fine-grained user interactions such as mouse movements, clicks, and keystroke patterns. Our comparative analysis on real-world datasets reveals that while both approaches demonstrate effectiveness, the computer activity-based method delivers superior consistency and accuracy, particularly for mouse-related interactions, which achieved approximately 90% accuracy. Furthermore, GRU networks outperformed LSTM models in the biometric approach, with Valence prediction reaching 84.38% accuracy.
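The GRU that outperformed the LSTM here is a standard recurrent cell; the paper's exact configuration is not given in the abstract, but the cell's mechanics can be sketched in a few lines (weight shapes and sequence length below are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Wr, Wh):
    """One GRU step; each W maps the concatenated [h, x] to the hidden size.
    Biases omitted for brevity."""
    hx = np.concatenate([h, x])
    z = sigmoid(Wz @ hx)                 # update gate: how much to rewrite h
    r = sigmoid(Wr @ hx)                 # reset gate: how much past to use
    h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))
    return (1 - z) * h + z * h_tilde     # convex blend of old and candidate

hidden, inputs = 8, 4
rng = np.random.default_rng(0)
Wz, Wr, Wh = (rng.normal(size=(hidden, hidden + inputs)) for _ in range(3))
h = np.zeros(hidden)
for x in rng.normal(size=(16, inputs)):  # e.g. 16 consecutive HR features
    h = gru_step(x, h, Wz, Wr, Wh)
```

The final hidden state `h` would then feed a small regression head predicting Valence, Arousal, and Dominance. The GRU's two gates (vs. the LSTM's three) give it fewer parameters, one plausible reason it fared better on this data.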

Authors:Claire Yang, Heer Patel, Max Kleiman-Weiner, Maya Cakmak
Title: Preserving Sense of Agency: User Preferences for Robot Autonomy and User Control across Household Tasks
Abstract:
Roboticists often design with the assumption that assistive robots should be fully autonomous. However, it remains unclear whether users prefer highly autonomous robots, as prior work in assistive robotics suggests otherwise. High robot autonomy can reduce the user's sense of agency, which represents feeling in control of one's environment. How much control do users, in fact, want over the actions of robots used for in-home assistance? We investigate how robot autonomy levels affect users' sense of agency and the autonomy level they prefer in contexts with varying risks. Our study asked participants to rate their sense of agency as robot users across four distinct autonomy levels and ranked their robot preferences with respect to various household tasks. Our findings revealed that participants' sense of agency was primarily influenced by two factors: (1) whether the robot acts autonomously, and (2) whether a third party is involved in the robot's programming or operation. Notably, an end-user programmed robot highly preserved users' sense of agency, even though it acts autonomously. However, in high-risk settings, e.g., preparing a snack for a child with allergies, they preferred robots that prioritized their control significantly more. Additional contextual factors, such as trust in a third party operator, also shaped their preferences.

Authors:Lorenzo Porcelli, Francesco Palmieri
Title: Raise Awareness of the Environmental Impacts of Retail Food Products: A User-Centered Scenario-Based Approach
Abstract:
The climate is warming rapidly, and atmospheric concentrations of greenhouse gases (GHGs) are at their highest levels ever recorded. As a result of these climate changes, caused mainly by human activities, disasters have increased fivefold over the past 50 years, causing death and economic loss. Civic engagement and awareness are essential to mitigate climate change and its impacts. In this work, we proposed a user interface that makes users aware of the environmental impact of the food products they buy when shopping. A user-centered scenario-based design was followed in the development of the interface. Gamification elements were added to increase civic participation in climate action.

Authors:Neha Rani, Sharan Majumder, Ishan Bhardwaj, Pedro Guillermo Feijoo Garcia
Title: Can AI support student engagement in classroom activities in higher education?
Abstract:
Lucrative career prospects and creative opportunities often attract students to enroll in computer science majors and pursue advanced studies in the field. Consequently, there has been a significant surge in enrollment in computer science courses, resulting in large class sizes that can range from hundreds to even thousands of students. A common challenge in such large classrooms is the lack of engagement between students and both the instructor and the learning material. However, with advancements in technology and improvements in large language models (LLMs), there is a considerable opportunity to utilize LLM-based AI models, such as conversational artificial intelligence (CAI), to enhance student engagement with learning content in large classes. To explore the potential of CAI to support engagement, especially with learning content, we designed an activity in a Software Engineering course (with a large class size) where students used CAI for an in-class activity. We conducted a within-subject investigation in a large classroom at a US university where we compared student engagement during an in-class activity that used a CAI tool vs. one without it. The CAI tool we used was ChatGPT due to its widespread popularity and familiarity. Our results indicate that CAI (ChatGPT) has the potential to support engagement with learning content during in-class activities, especially in large class sizes. We further discuss the implications of our findings.

Authors:Leonie Kallabis, Timo Bertram, Florian Rupp
Title: Deceptive Game Design? Investigating the Impact of Visual Card Style on Player Perception
Abstract:
The visual style of game elements considerably contributes to the overall experience. Aesthetics influence player appeal, while the abilities of game pieces define their in-game functionality. In this paper, we investigate how the visual style of collectible cards influences the players' perception of the card's actual strength in the game. Using the popular trading card game Magic: The Gathering, we conduct a single-blind survey study that examines how players perceive the strength of AI-generated cards that are shown in two contrasting visual styles: cute and harmless, or heroic and mighty. Our analysis reveals that some participants are influenced by a card's visual appearance when judging its in-game strength. Overall, differences in style perception are normally distributed around a neutral center, but individual participants vary in both directions: some generally perceive the cute style to be stronger, whereas others believe that the heroic style is better.

Authors:Ronald Cumbal, Didem Gurdur Broo, Ginevra Castellano
Title: Crowdsourcing eHMI Designs: A Participatory Approach to Autonomous Vehicle-Pedestrian Communication
Abstract:
As autonomous vehicles become more integrated into shared human environments, effective communication with road users is essential for ensuring safety. While previous research has focused on developing external Human-Machine Interfaces (eHMIs) to facilitate these interactions, we argue that involving users in the early creative stages can help address key challenges in the development of this technology. To explore this, our study adopts a participatory, crowd-sourced approach to gather user-generated ideas for eHMI designs. Participants were first introduced to fundamental eHMI concepts, equipping them to sketch their own design ideas in response to scenarios with varying levels of perceived risk. An initial pre-study with 29 participants showed that while they actively engaged in the process, there was a need to refine task objectives and encourage deeper reflection. To address these challenges, a follow-up study with 50 participants was conducted. The results revealed a strong preference for autonomous vehicles to communicate their awareness and intentions using lights (LEDs and projections), symbols, and text. Participants' sketches prioritized multi-modal communication, directionality, and adaptability to enhance clarity, consistently integrating familiar vehicle elements to improve intuitiveness.

Authors:Jaime Banks, Zhixin Li
Title: Conceptualization, Operationalization, and Measurement of Machine Companionship: A Scoping Review
Abstract:
The notion of machine companions has long been embedded in social-technological imaginaries. Recent advances in AI have moved those media musings into believable sociality manifested in interfaces, robotic bodies, and devices. Those machines are often referred to colloquially as "companions," yet there is little careful engagement of machine companionship (MC) as a formal concept or measured variable. To that end, this PRISMA-guided scoping review systematically samples, surveys, and synthesizes current scholarly works on MC (N = 71; 2017-2025). Works varied widely in considerations of MC according to guiding theories, dimensions of a-priori specified properties (subjectively positive, sustained over time, co-active, autotelic), and in measured concepts (with more than 50 distinct measured variables). We ultimately offer a literature-guided definition of MC as an autotelic, coordinated connection between human and machine that unfolds over time and is subjectively positive.

Authors:Ankolika De, Kelley Cotter, Shaheen Kanthawala, Haley McAtee, Amy Ritchart, Gahana Kadur
Title: "Whoever needs to see it, will see it": Motivations and Labor of Creating Algorithmic Conspirituality Content on TikTok
Abstract:
Recent studies show that users often interpret social media algorithms as mystical or spiritual because of their unpredictability. This invites new questions about how such perceptions affect the content that creators create and the communities they form online. In this study, 14 creators of algorithmic conspirituality content on TikTok were interviewed to explore their interpretations and creation processes influenced by the platform's For You Page algorithm. We illustrate how creators' beliefs interact with TikTok's algorithmic mediation to reinforce and shape their spiritual or relational themes. Furthermore, we show how algorithmic conspirituality content impacts viewers, highlighting its role in generating significant emotional and affective labor for creators, stemming from complex relational dynamics inherent in this content creation. We discuss design implications aimed at recognizing the unexpected spiritual and religious experiences algorithms prompt, and at supporting creators in effectively managing these challenges.

Authors:Nathalia Gomez, S. Sue Batham, Matias Volonte, Tiffany D. Do
Title: Virtual Interviewers, Real Results: Exploring AI-Driven Mock Technical Interviews on Student Readiness and Confidence
Abstract:
Technical interviews are a critical yet stressful step in the hiring process for computer science graduates, often hindered by limited access to practice opportunities. This formative qualitative study (n=20) explores whether a multimodal AI system can realistically simulate technical interviews and support confidence-building among candidates. Participants engaged with an AI-driven mock interview tool featuring whiteboarding tasks and real-time feedback. Many described the experience as realistic and helpful, noting increased confidence and improved articulation of problem-solving decisions. However, challenges with conversational flow and timing were noted. These findings demonstrate the potential of AI-driven technical interviews as scalable and realistic preparation tools, suggesting that future research could explore variations in interviewer behavior and their potential effects on candidate preparation.

Authors:Pranav Pawar, Akshansh Dwivedi, Jenish Boricha, Himanshu Gohil, Aditya Dubey
Title: Optimizing Multilingual Text-To-Speech with Accents & Emotions
Abstract:
While state-of-the-art text-to-speech (TTS) systems achieve high naturalness in monolingual environments, synthesizing speech with correct multilingual accents (especially for Indic languages) and context-relevant emotions still poses difficulty owing to cultural nuance discrepancies in current frameworks. This paper introduces a new TTS architecture that integrates accent modelling, preserves transliteration, and adds multi-scale emotion modelling, tuned in particular for Hindi and Indian English accents. Our approach extends the Parler-TTS model by integrating a language-specific phoneme-alignment hybrid encoder-decoder architecture and culture-sensitive emotion embedding layers trained on native speaker corpora, as well as incorporating dynamic accent code-switching with residual vector quantization. Quantitative tests demonstrate a 23.7% improvement in accent accuracy (Word Error Rate reduction from 15.4% to 11.8%) and 85.3% emotion recognition accuracy from native listeners, surpassing METTS and VECL-TTS baselines. The novelty of the system is that it can mix code in real time - generating statements such as "Namaste, let's talk about " with uninterrupted accent shifts while preserving emotional consistency. Subjective evaluation with 200 users reported a mean opinion score (MOS) of 4.2/5 for cultural correctness, much better than existing multilingual systems (p<0.01). This research makes cross-lingual synthesis more feasible by showcasing scalable accent-emotion disentanglement, with direct application in South Asian EdTech and accessibility software.
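The residual vector quantization (RVQ) mentioned for accent code-switching is a standard technique: successive codebooks each quantize the residual left by the previous stage. The paper's codebook sizes are not given; the sketch below uses small illustrative dimensions:

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual VQ sketch: each stage quantizes what previous stages missed.
    x: (d,) feature vector; codebooks: list of (K, d) arrays."""
    residual, codes = x.copy(), []
    for cb in codebooks:
        idx = np.argmin(((residual - cb) ** 2).sum(axis=1))  # nearest codeword
        codes.append(int(idx))
        residual = residual - cb[idx]   # pass the leftover to the next stage
    return codes, residual

rng = np.random.default_rng(0)
x = rng.normal(size=4)                               # toy 4-dim feature
codebooks = [rng.normal(size=(16, 4)) for _ in range(3)]  # 3 stages of 16
codes, residual = rvq_encode(x, codebooks)
```

By construction, summing the chosen codewords plus the final residual reconstructs the input exactly; with trained (rather than random) codebooks, the residual shrinks stage by stage, giving a compact discrete code.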

Authors:Chuyao Wang, Patrick Sturgis, Daniel de Kadt
Title: AI labeling reduces the perceived accuracy of online content but has limited broader effects
Abstract:
Explicit labeling of online content produced by artificial intelligence (AI) is a widely mooted policy for ensuring transparency and promoting public confidence. Yet little is known about the scope of AI labeling effects on public assessments of labeled content. We contribute new evidence on this question from a survey experiment using a high-quality nationally representative probability sample (n = 3,861). First, we demonstrate that explicit AI labeling of a news article about a proposed public policy reduces its perceived accuracy. Second, we test whether there are spillover effects in terms of policy interest, policy support, and general concerns about online misinformation. We find that AI labeling reduces interest in the policy, but neither influences support for the policy nor triggers general concerns about online misinformation. We further find that increasing the salience of AI use reduces the negative impact of AI labeling on perceived accuracy, while one-sided versus two-sided framing of the policy has no moderating effect. Overall, our findings suggest that the effects of algorithm aversion induced by AI labeling of online content are limited in scope.

Authors:Mohammad Naiseh, Huseyin Dogan, Stephen Giff, Nan Jiang
Title: Development of a persuasive User Experience Research (UXR) Point of View for Explainable Artificial Intelligence (XAI)
Abstract:
Explainable Artificial Intelligence (XAI) plays a critical role in fostering user trust and understanding in AI-driven systems. However, the design of effective XAI interfaces presents significant challenges, particularly for UX professionals who may lack technical expertise in AI or machine learning. Existing explanation methods, such as SHAP, LIME, and counterfactual explanations, often rely on complex technical language and assumptions that are difficult for non-expert users to interpret. To address these gaps, we propose a UX Research (UXR) Playbook for XAI - a practical framework aimed at supporting UX professionals in designing accessible, transparent, and trustworthy AI experiences. Our playbook offers actionable guidance to help bridge the gap between technical explainability methods and user centred design, empowering designers to create AI interactions that foster better understanding, trust, and responsible AI adoption.

Authors:Mariann Kornelia Smith, Jacqueline Meijer-Irons, Andrew Millar
Title: From 600 Tools to 1 Console: A UX-Driven Transformation
Abstract:
In 2021, the Technical Infrastructure (TI) User Experience (UX) team sent a survey to 10,000 Google developers (Googlers) and uncovered that Google's internal infrastructure tools were fragmented and inefficient, hindering developers' productivity. Using user-centered research and design methodologies, the team first created a story map and service blueprint to visualize the relationship between internal applications, then formulated a strategic vision to consolidate tools, streamline workflows, and measure the impact of their work. We secured executive buy-in and delivered incremental improvements.

Authors:Wenqi Guan, Yang Fang
Title: Optimizing Web-Based AI Query Retrieval with GPT Integration in LangChain A CoT-Enhanced Prompt Engineering Approach
Abstract:
Large Language Models have radically changed how remote learning students study, among other aspects of educational activity. Current retrieval of remote learning resources lacks the contextual depth needed to provide comprehensive information for complex student queries. This work proposes a novel approach to enhancing remote learning retrieval by integrating GPT-based models within the LangChain framework. We make the system more intuitive and productive using chain-of-thought (CoT) reasoning and prompt engineering. The framework we propose emphasizes increasing the precision and relevance of the retrieval results, returning comprehensive, contextually enriched explanations and resources that best suit each student's needs. We also assess the effectiveness of our approach against paradigmatic LLMs and report improvements in user satisfaction and learning outcomes.

Authors:Festus Adedoyin, Huseyin Dogan
Title: Human-Centred AI in FinTech: Developing a User Experience (UX) Research Point of View (PoV) Playbook
Abstract:
Advancements in Artificial Intelligence (AI) have significantly transformed the financial industry, enabling the development of more personalised and adaptable financial products and services. This research paper explores various instances where Human-Centred AI (HCAI) has facilitated these advancements, drawing from contemporary studies and industry progress. The paper examines how the application of HCAI-powered data analytics, machine learning, and natural language processing enables financial institutions to gain a deeper understanding of their customers' unique needs, preferences, and behavioural patterns. This, in turn, allows for the creation of tailored financial solutions that address individual consumer requirements, ultimately enhancing overall user experience and satisfaction. Additionally, the study highlights the integration of AI-powered robo-advisory services, which offer customised investment recommendations and portfolio management tailored to diverse risk profiles and investment goals. Moreover, the paper underscores the role of AI in strengthening fraud detection, risk assessment, and regulatory compliance, leading to a more secure and adaptable financial landscape. The findings of this research demonstrate the substantial impact of Human-Centred AI on the financial industry, offering a strategic framework for financial institutions to leverage these technologies. By incorporating a User Experience Research (UXR) Point of View (PoV), financial institutions can ensure that AI-driven solutions align with user needs and business objectives.

Authors:Jason Dong, Anna Wu
Title: Case Study for Developing a UXR Point of View for FinOps Product Innovation
Abstract:
In the dynamic landscape of Cloud financial management, we share a case study exploring the development of a User Experience Research (UXR) Point of View (PoV) to drive FinOps product innovation. We demonstrate how qualitative and quantitative research methods work together to navigate the challenges of understanding customer needs, aligning cross-functional teams, and prioritizing limited resources. Through a multi-phased research approach, the research team identifies opportunities, quantifies pain points, and segments diverse customer cohorts. This culminated in a UXR PoV that informed the creation of a differentiated product strategy: a 'one-stop shop' dashboard empowering FinOps practitioners with actionable insights and tools. This case study highlights the power of mixed-methods research in uncovering actionable insights that drive impactful product innovation.

Authors:Jonas Lau, Annie Tran
Title: UXR Point of View on Product Feature Prioritization Prior To Multi-Million Engineering Commitments
Abstract:
This paper discusses a popular UX research activity, feature prioritization, using the User Experience Research Point of View (UXR PoV) Playbook framework. We describe an application of multinomial logistic regression, frequently marketed as MaxDiff, for prioritizing product features in consumer product development, and show how it addresses the challenges of traditional surveying techniques. We propose a solution using MaxDiff to generate a reliable preference list with a reasonable sample size. We also adapt the MaxDiff method to cut the number of survey responses in half, making it less tedious from the survey takers' perspective. We present a case study using the adapted MaxDiff method for tablet feature prioritization research involving users with disabilities.
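As a hedged illustration of MaxDiff-style analysis (not the authors' adapted method, which fits a multinomial logistic regression), the common count-based score per feature is (#times picked best − #times picked worst) / #times shown:

```python
from collections import Counter

def maxdiff_scores(responses, features):
    """Count-based MaxDiff scoring (illustrative simplification).

    responses: list of (shown_items, best_pick, worst_pick) tuples,
    one per survey task. Returns a score in [-1, 1] per feature.
    """
    best, worst, shown = Counter(), Counter(), Counter()
    for shown_items, b, w in responses:
        best[b] += 1
        worst[w] += 1
        for f in shown_items:
            shown[f] += 1
    # Guard against features never shown to any respondent.
    return {f: (best[f] - worst[f]) / max(shown[f], 1) for f in features}
```

The regression-based estimate the paper uses is more statistically efficient, which is what allows a reliable preference list from fewer responses; the counts above only convey the intuition.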

Authors:Richa Gupta, Alexander Htet Kyaw
Title: Insights Informed Generative AI for Design: Incorporating Real-world Data for Text-to-Image Output
Abstract:
Generative AI, specifically text-to-image models, has revolutionized interior architectural design by enabling the rapid translation of conceptual ideas into visual representations from simple text prompts. While generative AI can produce visually appealing images, these often lack actionable data for designers. In this work, we propose a novel pipeline that integrates DALL-E 3 with a materials dataset to enrich AI-generated designs with sustainability metrics and material usage insights. After the model generates an interior design image, a post-processing module identifies the top ten materials present and pairs them with carbon dioxide equivalent (CO2e) values from a general materials dictionary. This approach allows designers to immediately evaluate environmental impacts and refine prompts accordingly. We evaluate the system through three user tests: (1) no mention of sustainability to the user prior to the prompting process with generative AI, (2) sustainability goals communicated to the user before prompting, and (3) sustainability goals communicated along with quantitative CO2e data included in the generative AI outputs. Our qualitative and quantitative analyses reveal that the introduction of sustainability metrics in the third test leads to more informed design decisions; however, it can also trigger decision fatigue and lower overall satisfaction. Nevertheless, the majority of participants reported incorporating sustainability principles into their workflows in the third test, underscoring the potential of integrated metrics to guide more ecologically responsible practices. Our findings showcase the importance of balancing design freedom with practical constraints, offering a clear path toward holistic, data-driven solutions in AI-assisted architectural design.

Authors:Varun Mannam, Zhenyu Shi
Title: Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis
Abstract:
Accurate video annotation plays a vital role in modern retail applications, including customer behavior analysis, product interaction detection, and in-store activity recognition. However, conventional annotation methods heavily rely on time-consuming manual labeling by human annotators, introducing non-robust frame selection and increasing operational costs. To address these challenges in the retail domain, we propose a deep learning-based approach that automates key-frame identification in retail videos and provides automatic annotations of products and customers. Our method leverages deep neural networks to learn discriminative features by embedding video frames and incorporating object detection-based techniques tailored for retail environments. Experimental results showcase the superiority of our approach over traditional methods, achieving accuracy comparable to human annotator labeling while enhancing the overall efficiency of retail video annotation. Remarkably, our approach roughly halves video annotation costs on average. By allowing human annotators to verify or adjust fewer than 5% of detected frames in the video dataset, while automating the annotation process for the remaining frames without reducing annotation quality, retailers can significantly reduce operational costs. The automation of key-frame detection enables substantial time and effort savings in retail video labeling tasks, proving highly valuable for diverse retail applications such as shopper journey analysis, product interaction detection, and in-store security monitoring.
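The paper's model is not spelled out in the abstract; as a toy sketch of the general idea of key-frame selection over learned frame embeddings, here is a greedy scheme that keeps a frame whenever its embedding drifts far enough from the last kept frame. The cosine criterion and threshold value are assumptions for illustration, not the authors' method.

```python
import numpy as np

def select_key_frames(embeddings: np.ndarray, threshold: float = 0.3) -> list[int]:
    """Greedy key-frame selection by embedding drift.

    embeddings: (T, d) array, one feature vector per frame.
    Keeps frame i when cosine distance to the last kept frame
    exceeds `threshold`; frame 0 is always kept.
    """
    keep = [0]
    last = embeddings[0]
    for i, e in enumerate(embeddings[1:], start=1):
        cos = e @ last / (np.linalg.norm(e) * np.linalg.norm(last) + 1e-9)
        if 1.0 - cos > threshold:  # frame content changed enough
            keep.append(i)
            last = e
    return keep
```

Annotators would then only verify the frames in `keep`, which is how such a pipeline reduces manual labeling effort.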

Authors:Peng Jiang, Vinicius Cezar Monteiro de Lira, Antonio Maiorino
Title: Impact of a Deployed LLM Survey Creation Tool through the IS Success Model
Abstract:
Surveys are a cornerstone of Information Systems (IS) research, yet creating high-quality surveys remains labor-intensive, requiring both domain expertise and methodological rigor. With the evolution of large language models (LLMs), new opportunities emerge to automate survey generation. This paper presents the real-world deployment of an LLM-powered system designed to accelerate data collection while maintaining survey quality. Deploying such systems in production introduces real-world complexity, including diverse user needs and quality control. We evaluate the system using the DeLone and McLean IS Success Model to understand how generative AI can reshape a core IS method. This study makes three key contributions. To our knowledge, this is the first application of the IS Success Model to a generative AI system for survey creation. In addition, we propose a hybrid evaluation framework combining automated and human assessments. Finally, we implement safeguards that mitigate post-deployment risks and support responsible integration into IS workflows.

Authors:Mohamed Masry, Mohamed Amen, Mohamed Elzyat, Mohamed Hamed, Norhan Magdy, Maram Khaled
Title: ETS: Open Vocabulary Electroencephalography-To-Text Decoding and Sentiment Classification
Abstract:
Decoding natural language from brain activity using non-invasive electroencephalography (EEG) remains a significant challenge in neuroscience and machine learning, particularly for open-vocabulary scenarios where traditional methods struggle with noise and variability. Previous studies have achieved high accuracy on small, closed vocabularies but still struggle on open vocabularies. In this study, we propose ETS, a framework that integrates EEG with synchronized eye-tracking data to address two critical tasks: (1) open-vocabulary text generation and (2) sentiment classification of perceived language. Our model achieves superior BLEU and ROUGE scores for EEG-to-text decoding and up to a 10% gain in F1 score on EEG-based ternary sentiment classification, significantly outperforming supervised baselines. Furthermore, we show that our proposed model can handle data from various subjects and sources, showing great potential for a high-performance open-vocabulary EEG-to-text system.

Authors:Jules Leguy, Pierre-Antoine Jean, Felipe Torres Figueroa, Sébastien Harispe
Title: WebXAII: an open-source web framework to study human-XAI interaction
Abstract:
This article introduces WebXAII, an open-source web framework designed to facilitate research on human interaction with eXplainable Artificial Intelligence (XAI) systems. The field of XAI is rapidly expanding, driven by the growing societal implications of the widespread adoption of AI (and in particular machine learning) across diverse applications. Researchers who study the interaction between humans and XAI techniques typically develop ad hoc interfaces in order to conduct their studies. These interfaces are usually not shared alongside the results of the studies, which limits their reusability and the reproducibility of experiments. In response, we design and implement WebXAII, a web-based platform that can embody full experimental protocols, meaning that it can present all aspects of the experiment to human participants and record their responses. The experimental protocols are translated into a composite architecture of generic views and modules, which offers considerable flexibility. The architecture is defined in a structured configuration file, so that protocols can be implemented with minimal programming skills. We demonstrate that WebXAII can effectively embody relevant protocols by reproducing the protocol of a state-of-the-art study from the literature.

Authors:Emanuel Moss, Elizabeth Watkins, Christopher Persaud, Passant Karunaratne, Dawn Nafus
Title: Controlling Context: Generative AI at Work in Integrated Circuit Design and Other High-Precision Domains
Abstract:
Generative AI tools have become more prevalent in engineering workflows, particularly through chatbots and code assistants. As the perceived accuracy of these tools improves, questions arise about whether and how those who work in high-precision domains might maintain vigilance for errors, and what other aspects of using such tools might trouble their work. This paper analyzes interviews with hardware and software engineers, and their collaborators, who work in integrated circuit design to identify the role accuracy plays in their use of generative AI tools and what other forms of trouble they face in using such tools. The paper inventories these forms of trouble, which are then mapped to elements of generative AI systems, to conclude that controlling the context of interactions between engineers and the generative AI tools is one of the largest challenges they face. The paper concludes with recommendations for mitigating this form of trouble by increasing the ability to control context interactively.

Authors:Griffin Pitts, Neha Rani, Weedguet Mildort, Eva-Marie Cook
Title: Students' Reliance on AI in Higher Education: Identifying Contributing Factors
Abstract:
The increasing availability and use of artificial intelligence (AI) tools in educational settings has raised concerns about students' overreliance on these technologies. Overreliance occurs when individuals accept incorrect AI-generated recommendations, often without critical evaluation, leading to flawed problem solutions and undermining learning outcomes. This study investigates potential factors contributing to patterns of AI reliance among undergraduate students, examining not only overreliance but also appropriate reliance (correctly accepting helpful and rejecting harmful recommendations) and underreliance (incorrectly rejecting helpful recommendations). Our approach combined pre- and post-surveys with a controlled experimental task where participants solved programming problems with an AI assistant that provided both accurate and deliberately incorrect suggestions, allowing direct observation of students' reliance patterns when faced with varying AI reliability. We find that appropriate reliance is significantly related to students' programming self-efficacy, programming literacy, and need for cognition, while showing negative correlations with post-task trust and satisfaction. Overreliance showed significant correlations with post-task trust and satisfaction with the AI assistant. Underreliance was negatively correlated with programming literacy, programming self-efficacy, and need for cognition. Overall, the findings provide insights for developing targeted interventions that promote appropriate reliance on AI tools, with implications for the integration of AI in curriculum and educational technologies.

Authors:Matthias Schonlau, Tiancheng Yang
Title: The Hammock Plot: Where Categorical and Numerical Data Relax Together
Abstract:
Effective methods for visualizing data involving multiple variables, including categorical ones, are limited. The hammock plot (Schonlau, 2003) visualizes both categorical and numerical variables using parallel coordinates. We introduce the Stata implementation hammock. We give numerous examples that explore highlighting, missing values, putting axes on the same scale, and tracing an observation across variables. Further, we introduce parallel univariate plots as an edge case of hammock plots. We also present and make publicly available a new dataset on the 2020 Tour de France.

Authors:David Grüning, Julia Kamin
Title: Prosocial Design in Trust and Safety
Abstract:
This chapter presents an overview of Prosocial Design, an approach to platform design and governance that recognizes design choices influence behavior and that those choices can or should be made toward supporting healthy interactions and other prosocial outcomes. The authors discuss several core principles of Prosocial Design and its relationship to Trust and Safety and other related fields. As a primary contribution, the chapter reviews relevant research to demonstrate how Prosocial Design can be an effective approach to reducing rule-breaking and other harmful behavior and how it can help to stem the spread of harmful misinformation. Prosocial Design is a nascent and evolving field and research is still limited. The authors hope this chapter will not only inspire more research and the adoption of a prosocial design approach, but that it will also provoke discussion about the principles of Prosocial Design and its potential to support Trust and Safety.

Authors:Ziheng Huang, Tal August, Hari Sundaram
Title: TermSight: Making Service Contracts Approachable
Abstract:
Terms of Service (ToS) are ubiquitous, legally binding contracts that govern consumers' digital interactions. However, ToS are not designed to be read: they are filled with pages of ambiguous and complex legal terminology that burden potential users. We introduce TermSight, an intelligent reading interface designed to make ToS more approachable. TermSight offers visual summaries that highlight the relevance and power balance of information in a ToS. TermSight also categorizes and simplifies information within the ToS into concise plain-language summaries. To aid in reading the original text, TermSight offers contextualized definitions and scenarios for unfamiliar phrases. Our within-subjects evaluation of TermSight (N=20) revealed that TermSight significantly reduced the difficulty of reading ToS and increased participants' willingness to do so. We also observed emerging strategies that participants took when interacting with AI-powered features that highlight the diverse ways that TermSight assisted ToS reading.

Authors:Paul van Schaik, Karen Renaud
Title: A Value-Driven Approach to the Online Consent Conundrum -- A Study with the Unemployed (Extended Version of Paper Presented at ICISSP, Porto, 20-22 February 2025)
Abstract:
Online services are required to gain informed consent from users to collect, store and analyse their personal data, both intentionally divulged and derived during their use of the service. There are many issues with these forms: they are too long, too complex, and demand the user's attention too frequently. Many users consent without reading, and so do not know what they are agreeing to. As such, granted consent is effectively uninformed. In this paper, we report on two studies we carried out to arrive at a value-driven approach to inform efforts to reduce the length of consent forms. The first study interviewed unemployed users to identify the values they want these forms to satisfy. The second, survey-based study helped us to quantify the values and value creators. To ensure that we understood the particular valuation of the unemployed, we compared their responses to those of an employed demographic and observed no significant differences between their prioritisation of any of the values. However, we did find substantial differences between values and value creators, with effort minimisation being most valued by our participants.

Authors:Olga Vechtomova, Jeff Bos
Title: Reimagining Dance: Real-time Music Co-creation between Dancers and AI
Abstract:
Dance performance traditionally follows a unidirectional relationship where movement responds to music. While AI has advanced in various creative domains, its application in dance has primarily focused on generating choreography from musical input. We present a system that enables dancers to dynamically shape musical environments through their movements. Our multi-modal architecture creates a coherent musical composition by intelligently combining pre-recorded musical clips in response to dance movements, establishing a bidirectional creative partnership where dancers function as both performers and composers. Through correlation analysis of performance data, we demonstrate emergent communication patterns between movement qualities and audio features. This approach reconceptualizes the role of AI in performing arts as a responsive collaborator that expands possibilities for both professional dance performance and improvisational artistic expression across broader populations.

Authors:Bernhard Rieder, Adrian Padilla, Oscar Coromina
Title: Forgetful by Design? A Critical Audit of YouTube's Search API for Academic Research
Abstract:
This paper critically audits the search endpoint of YouTube's Data API (v3), a common tool for academic research. Through systematic weekly searches over six months using eleven queries, we identify major limitations regarding completeness, representativeness, consistency, and bias. Our findings reveal substantial differences between ranking parameters like relevance and date in terms of video recall and precision, with relevance often retrieving numerous off-topic videos. We also find severe temporal decay, as the number of findable videos for a specific period dramatically decreases after just 20-60 days from the publication date, potentially hampering many different research designs. Furthermore, search results lack consistency, with identical queries yielding different video sets over time, compromising replicability. A case study on the European Parliament elections highlights how these issues impact research outcomes. While the paper offers several mitigation strategies, it concludes that the API's search function, potentially prioritizing "freshness" over comprehensive retrieval, is not adequate for robust academic research, especially concerning Digital Services Act requirements.

Authors:Yun Wang, Yan Lu
Title: Interaction, Process, Infrastructure: A Unified Architecture for Human-Agent Collaboration
Abstract:
As AI tools proliferate across domains, from chatbots and copilots to emerging agents, they increasingly support professional knowledge work. Yet despite their growing capabilities, these systems remain fragmented: they assist with isolated tasks but lack the architectural scaffolding for sustained, adaptive collaboration. We propose a layered framework for human-agent systems that integrates three interdependent dimensions: interaction, process, and infrastructure. Crucially, our architecture elevates process to a primary focus by making it explicit, inspectable, and adaptable, enabling humans and agents to align with evolving goals and coordinate over time. This model clarifies limitations of current tools, unifies emerging system design approaches, and reveals new opportunities for researchers and AI system builders. By grounding intelligent behavior in structured collaboration, we reimagine human-agent collaboration not as task-specific augmentation, but as a coherent, aligned system for real-world work.

Authors:Mengisti Berihu Girmay, Felix Möhrle
Title: Perspectives on Explanation Formats From Two Stakeholder Groups in Germany: Software Providers and Dairy Farmers
Abstract:
This paper examines the views of software providers in the German dairy industry with regard to dairy farmers' needs for explanation of digital decision support systems. The study is based on mastitis detection in dairy cows using a hypothetical herd management system. We designed four exemplary explanation formats for mastitis assessments with different types of presentation (textual, rule-based, herd comparison, and time series). In our previous study, 14 dairy farmers in Germany had rated these formats in terms of comprehensibility and the trust they would have in a system providing each format. In this study, we repeat the survey with 13 software providers active in the German dairy industry. We ask them how well they think the formats would be received by farmers. We hypothesized that there may be discrepancies between the views of both groups that are worth investigating, partly to find reasons for the reluctance to adopt digital systems. A comparison of the feedback from both groups supports the hypothesis and calls for further investigation. The results show that software providers tend to make assumptions about farmers' preferences that are not necessarily accurate. Our study, although not representative due to the small sample size, highlights the potential benefits of a thorough user requirements analysis (farmers' needs) to improve software adaptation and user acceptance.

Authors:Daniel Zielasko, Ben Rehling, Bernadette von Dawans, Gregor Domes
Title: Do Not Immerse and Drive? Prolonged Effects of Cybersickness on Physiological Stress Markers And Cognitive Performance
Abstract:
Extended exposure to virtual reality environments can induce motion sickness, often referred to as cybersickness, which may lead to physiological stress responses and impaired cognitive performance. This study investigates the aftereffects of VR-induced motion sickness with a focus on physiological stress markers and working memory performance. Using a carousel simulation to elicit cybersickness, we assessed subjective discomfort (SSQ, FMS), physiological stress (salivary cortisol, alpha-amylase, electrodermal activity, heart rate), and cognitive performance (n-Back task) over a 90-minute post-exposure period. Our findings demonstrate a significant increase in both subjective and physiological stress indicators following VR exposure, accompanied by a decline in working memory performance. Notably, delayed symptom progression was observed in a substantial proportion of participants, with some reporting peak symptoms up to 90 minutes post-stimulation. Salivary cortisol levels remained elevated throughout the observation period, indicating prolonged stress recovery. These results highlight the need for longer washout phases in XR research and raise safety concerns for professional applications involving post-exposure task performance.

Authors:Sandro Radovanović, Shuangyu Li
Title: Co-Designing a Chatbot for Culturally Competent Clinical Communication: Experience and Reflections
Abstract:
Clinical communication skills are essential for preparing healthcare professionals to provide equitable care across cultures. However, traditional training with simulated patients can be resource intensive and difficult to scale, especially in under-resourced settings. In this project, we explore the use of an AI-driven chatbot to support culturally competent communication training for medical students. The chatbot was designed to simulate realistic patient conversations and provide structured feedback based on the ACT Cultural Competence model. We piloted the chatbot with a small group of third-year medical students at a UK medical school in 2024. Although we did not follow a formal experimental design, our experience suggests that the chatbot offered useful opportunities for students to reflect on their communication, particularly around empathy and interpersonal understanding. More challenging areas included addressing systemic issues and historical context. Although this early version of the chatbot helped surface some interesting patterns, limitations were also clear, such as the absence of nonverbal cues and the tendency for virtual patients to be overly agreeable. In general, this reflection highlights both the potential and the current limitations of AI tools in communication training. More work is needed to better understand their impact and improve the learning experience.

Authors:Md Mynoddin, Troyee Dev, Rishita Chakma
Title: Brain2Vec: A Deep Learning Framework for EEG-Based Stress Detection Using CNN-LSTM-Attention
Abstract:
Mental stress has become a pervasive factor affecting cognitive health and overall well-being, necessitating the development of robust, non-invasive diagnostic tools. Electroencephalogram (EEG) signals provide a direct window into neural activity, yet their non-stationary and high-dimensional nature poses significant modeling challenges. Here we introduce Brain2Vec, a new deep learning tool that classifies stress states from raw EEG recordings using a hybrid architecture of convolutional, recurrent, and attention mechanisms. The model begins with a series of convolutional layers to capture localized spatial dependencies, followed by an LSTM layer to model sequential temporal patterns, and concludes with an attention mechanism to emphasize informative temporal regions. We evaluate Brain2Vec on the DEAP dataset, applying bandpass filtering, z-score normalization, and epoch segmentation as part of a comprehensive preprocessing pipeline. Compared to traditional CNN-LSTM baselines, our proposed model achieves an AUC score of 0.68 and a validation accuracy of 81.25%. These findings demonstrate Brain2Vec's potential for integration into wearable stress monitoring platforms and personalized healthcare systems.
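The authors do not publish code in the abstract; as an illustration of the final stage described above (an attention mechanism that emphasizes informative temporal regions), here is a minimal NumPy sketch of attention pooling over a sequence of recurrent features. The scoring vector `w` stands in for learned parameters and is an assumption, not the Brain2Vec implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(h: np.ndarray, w: np.ndarray):
    """Pool a sequence of hidden states by learned attention.

    h: (T, d) hidden states (e.g., LSTM outputs over EEG epochs).
    w: (d,) scoring vector (learned in a real model).
    Returns the (d,) context vector and the (T,) attention weights.
    """
    scores = h @ w            # unnormalized relevance per time step
    alpha = softmax(scores)   # weights over time, summing to 1
    return alpha @ h, alpha   # weighted sum emphasizes informative steps
```

In the full architecture, `h` would come from the CNN-LSTM front end and the context vector would feed the stress classifier head.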

Authors:Jonathan Grizou, Carlos de la Torre-Ortiz, Tuukka Ruotsalo
Title: Self-Calibrating BCIs: Ranking and Recovery of Mental Targets Without Labels
Abstract:
We consider the problem of recovering a mental target (e.g., an image of a face) that a participant has in mind from paired EEG (i.e., brain responses) and image (i.e., perceived faces) data collected during interactive sessions without access to labeled information. The problem has been previously explored with labeled data but not via self-calibration, where labeled data is unavailable. Here, we present the first framework and an algorithm, CURSOR, that learns to recover unknown mental targets without access to labeled data or pre-trained decoders. Our experiments on naturalistic images of faces demonstrate that CURSOR can (1) predict image similarity scores that correlate with human perceptual judgments without any label information, (2) use these scores to rank stimuli against an unknown mental target, and (3) generate new stimuli indistinguishable from the unknown mental target (validated via a user study, N=53).

Authors:Chirudeep Tupakula, Rittika Shamsuddin
Title: Perception-Driven Bias Detection in Machine Learning via Crowdsourced Visual Judgment
Abstract:
Machine learning systems are increasingly deployed in high-stakes domains, yet they remain vulnerable to bias: systematic disparities that disproportionately impact specific demographic groups. Traditional bias detection methods often depend on access to sensitive labels or rely on rigid fairness metrics, limiting their applicability in real-world settings. This paper introduces a novel, perception-driven framework for bias detection that leverages crowdsourced human judgment. Inspired by reCAPTCHA and other crowd-powered systems, we present a lightweight web platform that displays stripped-down visualizations of numeric data (for example, salary distributions across demographic clusters) and collects binary judgments on group similarity. We explore how users' visual perception, shaped by layout, spacing, and question phrasing, can signal potential disparities. User feedback is aggregated to flag data segments as biased, which are then validated through statistical tests and machine learning cross-evaluations. Our findings show that perceptual signals from non-expert users reliably correlate with known bias cases, suggesting that visual intuition can serve as a powerful, scalable proxy for fairness auditing. This approach offers a label-efficient, interpretable alternative to conventional fairness diagnostics, paving the way toward human-aligned, crowdsourced bias detection pipelines.

Authors:Barbara Oakley, Michael Johnston, Ken-Zen Chen, Eulho Jung, Terrence J. Sejnowski
Title: The Memory Paradox: Why Our Brains Need Knowledge in an Age of AI
Abstract:
In the age of generative AI and ubiquitous digital tools, human cognition faces a structural paradox: as external aids become more capable, internal memory systems risk atrophy. Drawing on neuroscience and cognitive psychology, this paper examines how heavy reliance on AI systems and discovery-based pedagogies may impair the consolidation of declarative and procedural memory -- systems essential for expertise, critical thinking, and long-term retention. We review how tools like ChatGPT and calculators can short-circuit the retrieval, error correction, and schema-building processes necessary for robust neural encoding. Notably, we highlight striking parallels between deep learning phenomena such as "grokking" and the neuroscience of overlearning and intuition. Empirical studies are discussed showing how premature reliance on AI during learning inhibits proceduralization and intuitive mastery. We argue that effective human-AI interaction depends on strong internal models -- biological "schemata" and neural manifolds -- that enable users to evaluate, refine, and guide AI output. The paper concludes with policy implications for education and workforce training in the age of large language models.

Authors:Rico H Herzog, Till Degkwitz, Trivik Verma
Title: The Urban Model Platform: A Public Backbone for Modeling and Simulation in Urban Digital Twins
Abstract:
Urban digital twins are increasingly perceived as a way to pool the growing digital resources of cities for the purpose of a more sustainable and integrated urban planning. Models and simulations are central to this undertaking: They enable "what if?" scenarios, create insights and describe relationships between the vast data that is being collected. However, the process of integrating and subsequently using models in urban digital twins is an inherently complex undertaking. It raises questions about how to represent urban complexity, how to deal with uncertain assumptions and modeling paradigms, and how to capture underlying power relations. Existing approaches in the domain largely focus on monolithic and centralized solutions in the tradition of neoliberal city-making, oftentimes prohibiting pluralistic and open interoperable models. Using a participatory design for participatory systems approach together with the City of Hamburg, Germany, we find that an open Urban Model Platform can function both as a public technological backbone for modeling and simulation in urban digital twins and as a socio-technical framework for a collaborative and pluralistic representation of urban processes. Such a platform builds on open standards, allows for a decentralized integration of models, enables communication between models and supports a multi-model approach to representing urban systems.

Authors:T. T. J. E. Arets, G. Perugia, M. Houben, W. A. IJsselsteijn
Title: The Role of Generative AI in Facilitating Social Interactions: A Scoping Review
Abstract:
Reduced social connectedness increasingly poses a threat to mental health, life expectancy, and general well-being. Generative AI (GAI) technologies, such as large language models (LLMs) and image generation tools, are increasingly integrated into applications aimed at enhancing human social experiences. Despite their growing presence, little is known about how these technologies influence social interactions. This scoping review investigates how GAI-based applications are currently designed to facilitate social interaction, what forms of social engagement they target, and which design and evaluation methodologies designers use to create and evaluate them. Through an analysis of 30 studies published since 2020, we identify key trends in application domains including storytelling, socio-emotional skills training, reminiscence, collaborative learning, music making, and general conversation. We highlight the role of participatory and co-design approaches in fostering both effective technology use and social engagement, while also examining socio-ethical concerns such as cultural bias and accessibility. This review underscores the potential of GAI to support dynamic and personalized interactions, but calls for greater attention to equitable design practices and inclusive evaluation strategies.

Authors:Dimitar Valkov, Pascal Kockwelp, Florian Daiber, Antonio Krüger
Title: Grasp Prediction based on Local Finger Motion Dynamics
Abstract:
The ability to predict the object the user intends to grasp offers essential contextual information and may help to leverage the effects of point-to-point latency in interactive environments. This paper explores the feasibility and accuracy of real-time recognition of uninstrumented objects based on hand kinematics during reach-to-grasp actions. In a data collection study, we recorded the hand motions of 16 participants while reaching out to grasp and then moving real and synthetic objects. Our results demonstrate that even a simple LSTM network can predict the time point at which the user grasps an object with a precision better than 21 ms and the current distance to this object with a precision better than 1 cm. The target's size can be determined in advance with an accuracy better than 97%. Our results have implications for designing adaptive and fine-grained interactive user interfaces in ubiquitous and mixed-reality environments.
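The paper's exact network is not reproduced here; as a rough illustration of the kind of model described ("even a simple LSTM network"), the sketch below runs a single numpy LSTM cell over a sequence of hand-kinematics feature frames and reads out a two-value regression head. All dimensions, weight initializations, and the `[time_to_grasp, distance]` head are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step over a frame of hand-kinematics features.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias;
    gate order: input, forget, cell candidate, output.
    """
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    g = np.tanh(z[2*H:3*H])      # cell candidate
    o = sigmoid(z[3*H:4*H])      # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Run a sequence of feature frames (e.g., fingertip positions/velocities)
# through the cell; a linear head on the final hidden state could regress
# the remaining time-to-grasp and current distance-to-target.
rng = np.random.default_rng(0)
D, H, T = 6, 8, 20                       # feature dim, hidden size, frames
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
pred = rng.normal(0, 0.1, (2, H)) @ h    # hypothetical [time, distance] head
```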

Authors:Luke Halpin, Phillip Benachour, Tracy Hall, Ann-Marie Houghton, Emily Winter
Title: Accessible Design in Integrated Development Environments: A Think Aloud Study Exploring the Experiences of Students with ADHD
Abstract:
Coding forms a key part of computer science education in universities. As part of this education, Integrated Development Environments (IDEs) are essential tools for coding. However, it is currently unknown how the design of an IDE's interface impacts students with Attention Deficit Hyperactivity Disorder (ADHD). In this study, we investigated the use of IDEs by students with ADHD. We conducted a think aloud study with nine university computing students, followed by qualitative observational interviews to analyse their learning and engagement with the Visual Studio Code IDE. The paper reports on these experiences and seeks to understand the role IDEs play in the educational setting. Our work also examines how digital accessibility and usability are considered in the current design of IDEs. We analysed the qualitative data using a thematic analysis and identified three primary themes: self-confidence, interaction, and learning, as well as various sub-themes. The themes and their sub-themes illustrate key areas of consideration when designing IDEs for students with ADHD. The primary findings highlight experiences of frustration and barriers in the current design and layout of IDEs. Through our participatory approach we provide a rare insight into ADHD user experiences around usability and accessibility, and describe the need for better design of development environments to ensure a positive learning experience for the students.

Authors:Andrea Gaggioli, Sabrina Bartolotta, Andrea Ubaldi, Katusha Gerardini, Eleonora Diletta Sarcinella, Alice Chirico
Title: Extended Creativity: A Conceptual Framework for Understanding Human-AI Creative Relations
Abstract:
Artificial Intelligence holds significant potential to enhance human creativity. However, achieving this vision requires a clearer understanding of how such enhancement can be effectively realized. Drawing on a relational and distributed cognition perspective, we identify three fundamental modes by which AI can support and shape creative processes: Support, where AI acts as a tool; Synergy, where AI and humans collaborate in complementary ways; and Symbiosis, where human and AI cognition become so integrated that they form a unified creative system. These modes are defined along two key dimensions: the level of technical autonomy exhibited by the AI system (i.e., its ability to operate independently and make decisions without human intervention), and the degree of perceived agency attributed to it (i.e., the extent to which the AI is experienced as an intentional or creative partner). We examine how each configuration influences different levels of creativity, from everyday problem solving to paradigm-shifting innovation, and discuss the implications for ethics, research, and the design of future human-AI creative systems.

Authors:C. Gautier, J. Delanoy, G. Gesquière
Title: Integrating multimedia documents in 3D city models for a better understanding of territories
Abstract:
Digital 3D representations of urban areas, through their growing availability, are a helpful tool to better understand a territory. However, they lack contextual information about, for example, the history or functionality of buildings. On the other hand, multimedia documents like images, videos or texts usually contain such information. Crossing these two types of data can therefore help in the analysis and understanding of the organization of our cities. This could also be used to develop document search based on spatial navigation, instead of the classical textual query. In this paper, we propose four approaches to integrate multimedia documents in a 3D urban scene, allowing the scene to be contextualized with any type of media. We combine these integration approaches with user guidance modes that allow us to guide the user through the consumption of these media and support their understanding of the territory. We demonstrate the usefulness of these techniques in the context of different projects within the Lyon area (France). The use of multimedia documents integrated into a digital tour allows, for example, iconic buildings to be contextualized, or the evolution of a territory through time to be understood.

Authors:Emma Kallina, Thomas Bohné, Jat Singh
Title: Stakeholder Participation for Responsible AI Development: Disconnects Between Guidance and Current Practice
Abstract:
Responsible AI (rAI) guidance increasingly promotes stakeholder involvement (SHI) during AI development. At the same time, SHI is already common in commercial software development, but with potentially different foci. This study clarifies the extent to which established SHI practices are able to contribute to rAI efforts as well as potential disconnects -- essential insights to inform and tailor future interventions that further shift industry practice towards rAI efforts. First, we analysed 56 rAI guidance documents to identify why SHI is recommended (i.e. its expected benefits for rAI) and uncovered goals such as redistributing power, improving socio-technical understandings, anticipating risks, and enhancing public oversight. To understand why and how SHI is currently practised in commercial settings, we then conducted an online survey (n=130) and semi-structured interviews (n=10) with AI practitioners. Our findings reveal that SHI in practice is primarily driven by commercial priorities (e.g. customer value, compliance) and several factors currently discourage more rAI-aligned SHI practices. This suggests that established SHI practices are largely not contributing to rAI efforts. To address this disconnect, we propose interventions and research opportunities to advance rAI development in practice.

Authors:Brooklyn J. Corbett, Jason M. Tangen
Title: AI Tutors vs. Tenacious Myths: Evidence from Personalised Dialogue Interventions in Education
Abstract:
Misconceptions in psychology and education persist despite clear contradictory evidence, resisting traditional correction methods. This study investigated whether personalised AI dialogue could effectively correct these stubborn beliefs. In a preregistered experiment (N = 375), participants holding strong psychology misconceptions engaged in one of three interventions: (1) personalised AI dialogue targeting their specific misconception, (2) generic textbook-style refutation, or (3) neutral AI dialogue (control). Results showed that personalised AI dialogue produced significantly larger immediate belief reductions compared to both textbook reading and neutral dialogue. This advantage persisted at 10-day follow-up but diminished by 2 months, where AI dialogue and textbook conditions converged while both remained superior to control. Both AI conditions generated significantly higher engagement and confidence than textbook reading, demonstrating the motivational benefits of conversational interaction. These findings demonstrate that AI dialogue can accelerate initial belief correction through personalised, interactive engagement that disrupts the cognitive processes maintaining misconceptions. However, the convergence of effects over time suggests brief interventions require reinforcement for lasting change. Future applications should integrate AI tutoring into structured educational programs with spaced reinforcement to sustain the initial advantages of personalised dialogue.

Authors:Erin Argo, Tanim Ahmed, Sarah Gable, Callie Hampton, Jeronimo Grandi, Regis Kopper
Title: Augmented Reality User Interfaces for First Responders: A Scoping Literature Review
Abstract:
During the past decade, there has been a significant increase in research focused on integrating AR User Interfaces into public safety applications, particularly for first responders in the domains of Emergency Medical Services, Firefighting, and Law Enforcement. This paper presents the results of a scoping review involving the application of AR user interfaces in the public safety domain and applies an established systematic review methodology to provide a comprehensive analysis of the current research landscape, identifying key trends, challenges, and gaps in the literature. This review includes peer-reviewed publications indexed by the major scientific databases up to April 2025. A basic keyword search retrieved 1,751 papers, of which 90 were deemed relevant for this review. An in-depth analysis of the literature allowed the development of a faceted taxonomy that categorizes AR user interfaces for public safety. This classification lays a solid foundation for future research, while also highlighting key design considerations, challenges, and gaps in the literature. This review serves as a valuable resource for researchers and developers, offering insights that can drive further advances in the field.

Authors:Mohammad Attar, Andrew Carse, Yeming Chen, Thomas Green, Jeong-Yeon Ha, Yanbai Jin, Amy McWilliams, Theirry Panggabean, Zhengyu Peng, Lujin Sun, Jing Ru, Jiacheng She, Jialin Wang, Zilun Wei, Jiayuan Zhu, Lachlan McGinness
Title: Particle Builder -- Learn about the Standard Model while playing against an AI
Abstract:
Particle Builder Online is a web-based education game designed for high school physics students. Students can play against an AI opponent or peers to familiarise themselves with the Standard Model of Particle Physics. The game is aimed at a high school level and tailored to the International Baccalaureate and the Australian Curriculum. Students from four schools in Canberra took pre/post-tests and a survey while completing a lesson where they played Particle Builder. Students' understanding of particle physics concepts improved significantly. Students found the game more enjoyable and effective than regular classroom lessons.

Authors:Petar Jakuš, Hrvoje Džapo
Title: Implementing Keyword Spotting on the MCXN947 Microcontroller with Integrated NPU
Abstract:
This paper presents a keyword spotting (KWS) system implemented on the NXP MCXN947 microcontroller with an integrated Neural Processing Unit (NPU), enabling real-time voice interaction on resource-constrained devices. The system combines MFCC feature extraction with a CNN classifier, optimized using Quantization Aware Training to reduce model size with minimal accuracy drop. Experimental results demonstrate a 59x speedup in inference time when leveraging the NPU compared to CPU-only execution, achieving 97.06% accuracy with a model size of 30.58 KB, demonstrating the feasibility of efficient, low-power voice interfaces on embedded platforms.
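The MFCC front end mentioned above follows a standard recipe: frame the audio, take the power spectrum, apply a triangular mel filterbank, take logs, and decorrelate with a DCT. The sketch below is a minimal numpy version of that generic pipeline, not the authors' implementation; frame sizes, the 40-band filterbank, and 13 coefficients are common defaults assumed here, and production front ends add pre-emphasis and liftering.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    """Minimal MFCC front end: frame -> power spectrum -> mel filterbank
    -> log -> DCT-II. Illustrative only."""
    # Frame the signal with a Hann window.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(signal[start:start + n_fft] * np.hanning(n_fft))
    frames = np.array(frames)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular mel filterbank spanning 0 .. sr/2.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II to decorrelate; keep the first n_mfcc coefficients.
    n = logmel.shape[1]
    dct = np.cos(np.pi / n * (np.arange(n)[None, :] + 0.5)
                 * np.arange(n_mfcc)[:, None])
    return logmel @ dct.T

feats = mfcc(np.random.default_rng(0).normal(size=16000))  # 1 s of audio
```

A CNN classifier like the paper's would consume `feats` as a 2D time-by-coefficient "image"; quantization-aware training would then be applied to that classifier, not to this front end.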

Authors:Flavio D'Intino, Hans-Peter Hutter
Title: Advancing STT for Low-Resource Real-World Speech
Abstract:
Swiss German is a low-resource language represented by diverse dialects that differ significantly from Standard German and from each other, lacking a standardized written form. As a result, transcribing Swiss German involves translating into Standard German. Existing datasets have been collected in controlled environments, yielding effective speech-to-text (STT) models, but these models struggle with spontaneous conversational speech. This paper, therefore, introduces the new SRB-300 dataset, a 300-hour annotated speech corpus featuring real-world long-audio recordings from 39 Swiss German radio and TV stations. It captures spontaneous speech across all major Swiss dialects recorded in various realistic environments and overcomes the limitation of prior sentence-level corpora. We fine-tuned multiple OpenAI Whisper models on the SRB-300 dataset, achieving notable enhancements over previous zero-shot performance metrics. Improvements in word error rate (WER) ranged from 19% to 33%, while BLEU scores increased between 8% and 40%. The best fine-tuned model, large-v3, achieved a WER of 17.1% and a BLEU score of 74.8. This advancement is crucial for developing effective and robust STT systems for Swiss German and other low-resource languages in real-world contexts.
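Word error rate, the headline metric above, is the Levenshtein (edit) distance between reference and hypothesis word sequences divided by the reference length. A minimal self-contained implementation of that standard definition (not the authors' evaluation code, which likely uses an established toolkit):

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                       # deletions
    for j in range(len(h) + 1):
        d[0][j] = j                       # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)
```

For example, one substituted word in a four-word reference gives a WER of 0.25; the paper's best model reaches 17.1% by this measure on Swiss German broadcast speech.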

Authors:Xinyue Niu, Akira Furui
Title: Towards Cross-Subject EMG Pattern Recognition via Dual-Branch Adversarial Feature Disentanglement
Abstract:
Cross-subject electromyography (EMG) pattern recognition faces significant challenges due to inter-subject variability in muscle anatomy, electrode placement, and signal characteristics. Traditional methods rely on subject-specific calibration data to adapt models to new users, an approach that is both time-consuming and impractical for large-scale, real-world deployment. This paper presents an approach to eliminate calibration requirements through feature disentanglement, enabling effective cross-subject generalization. We propose an end-to-end dual-branch adversarial neural network that simultaneously performs pattern recognition and individual identification by disentangling EMG features into pattern-specific and subject-specific components. The pattern-specific components facilitate robust pattern recognition for new users without model calibration, while the subject-specific components enable downstream applications such as task-invariant biometric identification. Experimental results demonstrate that the proposed model achieves robust performance on data from unseen users, outperforming various baseline methods in cross-subject scenarios. Overall, this study offers a new perspective for cross-subject EMG pattern recognition without model calibration and highlights the proposed model's potential for broader applications, such as task-independent biometric systems.
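Adversarial feature disentanglement of this kind is commonly built on a gradient reversal layer: the subject-identification head is trained normally, but the gradient flowing from its loss back into the (pattern-branch) encoder is sign-flipped, pushing the encoder toward subject-invariant features. The toy numpy example below demonstrates only that mechanism with a linear encoder and a logistic subject head; it is a conceptual sketch, not the paper's dual-branch network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grl_backward(grad, lam=1.0):
    """Gradient reversal: identity on the forward pass, -lambda * grad on
    the backward pass, so the encoder is trained to *confuse* the
    subject classifier."""
    return -lam * grad

rng = np.random.default_rng(0)
x = rng.normal(size=3)                    # one EMG feature vector
W_e = rng.normal(size=(4, 3))             # toy linear encoder
w_a = rng.normal(size=4)                  # adversarial subject head
y_subj = 1.0                              # "this sample is subject A"

# Forward: encode, then predict subject identity.
z = W_e @ x
p = sigmoid(w_a @ z)
loss = -(y_subj * np.log(p) + (1 - y_subj) * np.log(1 - p))

# Backward: the head sees the true gradient; the encoder sees it reversed.
dL_dz = (p - y_subj) * w_a                # gradient at the encoder output
dL_dz_enc = grl_backward(dL_dz)           # sign-flipped by the GRL
dL_dWe = np.outer(dL_dz_enc, x)           # encoder update moves *away* from
                                          # subject-discriminative features
```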

Authors:Thomas M. Kwok, Hilary HY Cheng, Wai Tuck Chow
Title: EMG-Driven Stiffness-Modulating Palpation for Telerehabilitation
Abstract:
In this work, we introduce HJ-Pal, a lightweight wearable haptic device that leverages EMG-driven honeycomb jamming to render muscle activation as kinesthetic feedback, enabling remote palpation for small muscle assessment in telerehabilitation.

Authors:Kieran J. Smith, Tristan C. Endsley, Torin K. Clark
Title: Predicting Situation Awareness from Physiological Signals
Abstract:
Situation awareness (SA)--comprising the ability to 1) perceive critical elements in the environment, 2) comprehend their meanings, and 3) project their future states--is critical for human operator performance. Due to the disruptive nature of gold-standard SA measures, researchers have sought physiological indicators to provide real-time information about SA. We extend prior work by using a multimodal suite of neurophysiological, psychophysiological, and behavioral signals, predicting all three levels of SA along a continuum, and predicting a comprehensive measure of SA in a complex multi-tasking simulation. We present a lab study in which 31 participants controlled an aircraft simulator task battery while wearing physiological sensors and responding to SA 'freeze-probe' assessments. We demonstrate the validity of task and assessment for measuring SA. Multimodal physiological models predict SA with greater predictive performance ($Q^2$ for levels 1-3 and total, respectively: 0.14, 0.00, 0.26, and 0.36) than models built with shuffled labels, demonstrating that multimodal physiological signals provide useful information in predicting all SA levels. Level 3 SA (projection) was best predicted, and level 2 SA (comprehension) was the most challenging to predict. Ablation analysis and single sensor models found EEG and eye-tracking signals to be particularly useful to predictions of level 3 and total SA. A reduced sensor fusion model showed that predictive performance can be maintained with a subset of sensors. This first rigorous cross-validation assessment of predictive performance demonstrates the utility of multimodal physiological signals for inferring complex, holistic, objective measures of SA at all levels, non-disruptively, and along a continuum.
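The $Q^2$ statistic reported above is the cross-validated analogue of $R^2$: one minus the predictive residual sum of squares (PRESS, computed on held-out predictions) over the total sum of squares. A minimal implementation of that standard definition (the paper's exact cross-validation protocol is not reproduced here):

```python
import numpy as np

def q2(y_true, y_pred_cv):
    """Predictive Q^2 = 1 - PRESS / SS_total, where y_pred_cv are
    cross-validated (held-out) predictions. Q^2 <= 0 means the model
    predicts no better than always guessing the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred_cv = np.asarray(y_pred_cv, dtype=float)
    press = np.sum((y_true - y_pred_cv) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - press / ss_tot
```

Under this reading, the reported $Q^2$ of 0.00 for level 2 SA means held-out predictions were no better than the mean, while 0.36 for total SA indicates a modest but real predictive signal.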

Authors:Maryam Teimouri, Filip Ginter, Tomi "bgt" Suovuo
Title: Interaction Analysis by Humans and AI: A Comparative Perspective
Abstract:
This paper explores how Mixed Reality (MR) and 2D video conferencing influence children's communication during a gesture-based guessing game. Finnish-speaking participants engaged in a short collaborative task using two different setups: Microsoft HoloLens MR and Zoom. Audio-video recordings were transcribed and analyzed using Large Language Models (LLMs), enabling iterative correction, translation, and annotation. Despite limitations in annotations' accuracy and agreement, automated approaches significantly reduced processing time and allowed non-Finnish-speaking researchers to participate in data analysis. Evaluations highlight both the efficiency and constraints of LLM-based analyses for capturing children's interactions across these platforms. Initial findings indicate that MR fosters richer interaction, evidenced by higher emotional expression during annotation, and heightened engagement, while Zoom offers simplicity and accessibility. This study underscores the potential of MR to enhance collaborative learning experiences for children in distributed settings.

Authors:Prarabdh Shukla, Wei Yin Chong, Yash Patel, Brennan Schaffner, Danish Pruthi, Arjun Bhagoji
Title: Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch
Abstract:
To meet the demands of content moderation, online platforms have resorted to automated systems. Newer forms of real-time engagement($\textit{e.g.}$, users commenting on live streams) on platforms like Twitch exert additional pressures on the latency expected of such moderation systems. Despite their prevalence, relatively little is known about the effectiveness of these systems. In this paper, we conduct an audit of Twitch's automated moderation tool ($\texttt{AutoMod}$) to investigate its effectiveness in flagging hateful content. For our audit, we create streaming accounts to act as siloed test beds, and interface with the live chat using Twitch's APIs to send over $107,000$ comments collated from $4$ datasets. We measure $\texttt{AutoMod}$'s accuracy in flagging blatantly hateful content containing misogyny, racism, ableism and homophobia. Our experiments reveal that a large fraction of hateful messages, up to $94\%$ on some datasets, $\textit{bypass moderation}$. Contextual addition of slurs to these messages results in $100\%$ removal, revealing $\texttt{AutoMod}$'s reliance on slurs as a moderation signal. We also find that contrary to Twitch's community guidelines, $\texttt{AutoMod}$ blocks up to $89.5\%$ of benign examples that use sensitive words in pedagogical or empowering contexts. Overall, our audit points to large gaps in $\texttt{AutoMod}$'s capabilities and underscores the importance for such systems to understand context effectively.
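The audit's two headline numbers reduce to simple rates over labeled test comments: the share of hateful messages that bypass moderation, and the share of benign messages that are wrongly blocked. A small helper showing that bookkeeping (the authors' actual pipeline, which drives Twitch's APIs, is not reproduced here):

```python
def audit_rates(results):
    """Summarize a moderation audit.

    results: iterable of (is_hateful, was_flagged) boolean pairs, one per
    test comment sent to the moderation system.
    Returns (bypass_rate, false_positive_rate): the fraction of hateful
    comments that passed unflagged, and the fraction of benign comments
    that were blocked anyway.
    """
    hateful = [flagged for hate, flagged in results if hate]
    benign = [flagged for hate, flagged in results if not hate]
    bypass = 1 - sum(hateful) / len(hateful) if hateful else 0.0
    false_pos = sum(benign) / len(benign) if benign else 0.0
    return bypass, false_pos
```

In the paper's terms, a bypass rate of up to 94% on some datasets and a false-positive rate of up to 89.5% on benign-but-sensitive examples are what these two quantities measure.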

Authors:Anna Yokokubo, Takeo Hamada, Tatsuya Ishizuka, Hiroaki Mori, Noboru Koshizuka
Title: Happiness Finder: Exploring the Role of AI in Enhancing Well-Being During Four-Leaf Clover Searches
Abstract:
A four-leaf clover (FLC) symbolizes luck and happiness worldwide, but it is hard to distinguish from the common three-leaf clover. While AI technology can assist in searching for FLC, it may not replicate the traditional search's sense of achievement. This study explores searchers' feelings when AI aids the FLC search. In this study, we developed a system called ``Happiness Finder'' that uses object detection algorithms on smartphones or tablets to support the search. We exhibited Happiness Finder at an international workshop, allowing participants to experience four-leaf clover searching using potted artificial clovers and the Happiness Finder app. This paper reports the findings from this demonstration.

Authors:Guanming Qiao, Partha Protim Paul
Title: Human Side of Smart Contract Fuzzing: An Empirical Study
Abstract:
Smart contract (SC) fuzzing is a critical technique for detecting vulnerabilities in blockchain applications. However, its adoption remains challenging for practitioners due to fundamental differences between SCs and traditional software systems. In this study, we investigate the challenges practitioners face when adopting SC fuzzing tools by conducting an inductive content analysis of 381 GitHub issues from two widely used SC fuzzers: Echidna and Foundry. Furthermore, we conducted a user study to examine how these challenges affect two practitioner groups, SC developers and traditional software security professionals, and to identify the strategies practitioners use to overcome them. We systematically categorize these challenges into a taxonomy based on their nature and occurrence within the SC fuzzing workflow. Our findings reveal domain-specific ease-of-use and usefulness challenges, including technical issues with blockchain emulation, and human issues with a lack of accessible documentation and process automation. Our results provide actionable insights for tool developers and researchers, guiding future improvements in SC fuzzer tool design.

Authors:Victor B. Santos, Cauã O. Jordão, Leonardo J. O. Ibiapina, Gabriel M. Silva, Mirella E. B. Santana, Matheus A. Garrido, Lucas R. C. Farias
Title: IDEIA: A Generative AI-Based System for Real-Time Editorial Ideation in Digital Journalism
Abstract:
This paper presents IDEIA (Intelligent Engine for Editorial Ideation and Assistance), a generative AI-powered system designed to optimize the journalistic ideation process by combining real-time trend analysis with automated content suggestion. Developed in collaboration with the Sistema Jornal do Commercio de Comunicação (SJCC), the largest media conglomerate in Brazil's North and Northeast regions, IDEIA integrates the Google Trends API for data-driven topic monitoring and the Google Gemini API for the generation of context-aware headlines and summaries. The system adopts a modular architecture based on Node.js, React, and PostgreSQL, supported by Docker containerization and a CI/CD pipeline using GitHub Actions and Vercel. Empirical results demonstrate a significant reduction in the time and cognitive effort required for editorial planning, with reported gains of up to 70\% in the content ideation stage. This work contributes to the field of computational journalism by showcasing how intelligent automation can enhance productivity while maintaining editorial quality. It also discusses the technical and ethical implications of incorporating generative models into newsroom workflows, highlighting scalability and future applicability across sectors beyond journalism.

Authors:Naseem Babu, Jimson Mathew, A. P. Vinod
Title: Large Language Models for EEG: A Comprehensive Survey and Taxonomy
Abstract:
The growing convergence between Large Language Models (LLMs) and electroencephalography (EEG) research is enabling new directions in neural decoding, brain-computer interfaces (BCIs), and affective computing. This survey offers a systematic review and structured taxonomy of recent advancements that utilize LLMs for EEG-based analysis and applications. We organize the literature into four domains: (1) LLM-inspired foundation models for EEG representation learning, (2) EEG-to-language decoding, (3) cross-modal generation including image and 3D object synthesis, and (4) clinical applications and dataset management tools. The survey highlights how transformer-based architectures adapted through fine-tuning, few-shot, and zero-shot learning have enabled EEG-based models to perform complex tasks such as natural language generation, semantic interpretation, and diagnostic assistance. By offering a structured overview of modeling strategies, system designs, and application areas, this work serves as a foundational resource for future work to bridge natural language processing and neural signal analysis through language models.

Authors:Eunhye Grace Ko, Shaini Nanayakkara, Earl W. Huff
Title: "We need to avail ourselves of GenAI to enhance knowledge distribution": Empowering Older Adults through GenAI Literacy
Abstract:
As generative AI (GenAI) becomes increasingly widespread, it is crucial to equip users, particularly vulnerable populations such as older adults (65 and older), with the knowledge to understand its benefits and potential risks. Older adults often exhibit greater reservations about adopting emerging technologies and require tailored literacy support. Using a mixed methods approach, this study examines strategies for delivering GenAI literacy to older adults through a chatbot named Litti, evaluating its impact on their AI literacy (knowledge, safety, and ethical use). The quantitative data indicated a trend toward improved AI literacy, though the results were not statistically significant. However, qualitative interviews revealed diverse levels of familiarity with generative AI and a strong desire to learn more. Findings also show that while Litti provided a positive learning experience, it did not significantly enhance participants' trust or sense of safety regarding GenAI. This exploratory case study highlights the challenges and opportunities in designing AI literacy education for the rapidly growing older adult population.

Authors:Tianyi Alex Qiu, Zhonghao He, Tejasveer Chugh, Max Kleiman-Weiner
Title: The Lock-in Hypothesis: Stagnation by Algorithm
Abstract:
The training and deployment of large language models (LLMs) create a feedback loop with human users: models learn human beliefs from data, reinforce these beliefs with generated content, reabsorb the reinforced beliefs, and feed them back to users again and again. This dynamic resembles an echo chamber. We hypothesize that this feedback loop entrenches the existing values and beliefs of users, leading to a loss of diversity and potentially the lock-in of false beliefs. We formalize this hypothesis and test it empirically with agent-based LLM simulations and real-world GPT usage data. Analysis reveals sudden but sustained drops in diversity after the release of new GPT iterations, consistent with the hypothesized human-AI feedback loop. Code and data available at https://thelockinhypothesis.com
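The "loss of diversity" the hypothesis predicts is typically operationalized as a drop in the entropy of the distribution of expressed beliefs or topics over successive time windows. As an illustrative sketch (the paper's own diversity metrics may differ), Shannon entropy over a sample of discrete belief labels:

```python
import math
from collections import Counter

def shannon_diversity(items):
    """Shannon entropy (bits) of a sample of discrete beliefs/topics.
    A sudden, sustained drop across time windows is the diversity-loss
    signal the lock-in hypothesis predicts after a new model release."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, a window where users voice two beliefs equally often scores 1 bit, while a window locked in to a single belief scores 0: tracking this quantity before and after each GPT release is the shape of the paper's empirical test.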

Authors:Eunhye Grace Ko, Soo Hyoung Joo
Title: (AI peers) are people learning from the same standpoint: Perception of AI characters in a Collaborative Science Investigation
Abstract:
While the complexity of 21st-century demands has promoted pedagogical approaches to foster complex competencies, a persistent gap remains between in-class learning activities and individualized learning or assessment practices. To address this, studies have explored the use of AI-generated characters in learning and assessment. One attempt is scenario-based assessment (SBA), a technique that not only measures but also fosters the development of competencies throughout the assessment process. SBA introduces simulated agents to provide an authentic social-interactional context, allowing for the assessment of competency-based constructs while mitigating the unpredictability of real-life interactions. Recent advancements in multimodal AI, such as text-to-video technology, allow these agents to be enhanced into AI-generated characters. This mixed-method study investigates how learners perceive AI characters taking the role of mentor and teammates in an SBA mirroring the context of a collaborative science investigation. Specifically, we examined the Likert scale responses of 56 high schoolers regarding trust, social presence, and effectiveness. We analyzed the relationships between these factors and their impact on the intention to adopt AI characters through PLS-SEM. Our findings indicated that learners' trust shaped their sense of social presence with the AI characters, enhancing perceived effectiveness. Qualitative analysis further highlighted factors that foster trust, such as material credibility and alignment with learning goals, as well as the pivotal role of social presence in creating a collaborative context. This paper was accepted as a full paper for AIED 2025.

Authors:Zackary Okun Dunivin, Paul E. Smaldino
Title: Recommender systems, stigmergy, and the tyranny of popularity
Abstract:
Scientific recommender systems, such as Google Scholar and Web of Science, are essential tools for discovery. The search algorithms that power them work through stigmergy, a collective intelligence mechanism that surfaces useful paths through repeated engagement. While generally effective, this "rich-get-richer" dynamic results in a small number of high-profile papers dominating visibility. This essay argues that these algorithms' over-reliance on popularity fosters intellectual homogeneity and exacerbates structural inequities, stifling the innovative and diverse perspectives critical for scientific progress. We propose an overhaul of search platforms to incorporate user-specific calibration, allowing researchers to manually adjust the weights of factors like popularity, recency, and relevance. We also advise platform developers on how text embeddings and LLMs could be implemented in ways that increase user autonomy. While our suggestions are particularly pertinent to aligning recommender systems with scientific values, these ideas are broadly applicable to information access systems in general. Designing platforms that increase user autonomy is an important step toward more robust and dynamic information access systems.
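The user-specific calibration proposed above amounts to a weighted combination of per-paper factor scores, with the weights under the researcher's control. A minimal sketch of that idea (factor names, score scales, and weight values are hypothetical, not from the essay):

```python
def calibrated_score(paper, weights):
    """User-calibrated ranking score: a weighted sum over normalized
    factor scores (each assumed in [0, 1]), so a researcher can
    down-weight popularity and up-weight recency or topical relevance.

    paper:   dict with 'popularity', 'recency', 'relevance' scores
    weights: dict with the same keys; normalized here to sum to 1
    """
    total = sum(weights.values())
    return sum(weights[k] / total * paper[k] for k in weights)

# A popularity-heavy default versus a novelty-seeking user calibration
# applied to the same (hypothetical) paper.
paper = {"popularity": 0.9, "recency": 0.2, "relevance": 0.6}
default = calibrated_score(
    paper, {"popularity": 0.7, "recency": 0.1, "relevance": 0.2})
novelty = calibrated_score(
    paper, {"popularity": 0.1, "recency": 0.6, "relevance": 0.3})
```

Under the default weights this heavily cited but older paper ranks high; under the novelty-seeking calibration its score drops, which is exactly the kind of user autonomy the essay advocates.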

Authors:Ruochen Ji, Lyu Tiangang
Title: Conversational Interfaces for Parametric Conceptual Architectural Design: Integrating Mixed Reality with LLM-driven Interaction
Abstract:
Mixed reality (MR) environments offer embodied spatial interaction, providing intuitive 3D manipulation capabilities that enhance the conceptual design process. Parametric modeling, a powerful and advanced architectural design method, enables the generation of complex, optimized geometries. However, its integration into MR environments remains limited due to precision constraints and unsuitable input modalities. Existing MR tools prioritize spatial interaction but lack the control and expressiveness required for parametric workflows, particularly for designers without formal programming backgrounds. We address this gap by introducing a novel conversational MR interface that combines speech input, gesture recognition, and a multi-agent large language model (LLM) system to support intuitive parametric modeling. Our system dynamically manages parameter states, resolves ambiguous commands through conversation and contextual prompting, and enables real-time model manipulation within immersive environments. We demonstrate how this approach reduces cognitive and operational barriers in early-stage design tasks, allowing users to refine and explore their design space. This work expands the role of MR to a generative design platform, supporting programmatic thinking in design tasks through natural, embodied interaction.

Authors:Aunam Quyoum, Mark Wong, Sebati Ghosh, Siamak F. Shahandashti
Title: Minoritised Ethnic People's Security and Privacy Concerns and Responses towards Essential Online Services
Abstract:
Minoritised ethnic people are marginalised in society, and therefore at a higher risk of adverse online harms, including those arising from the loss of security and privacy of personal data. Despite this, there has been very little research focused on minoritised ethnic people's security and privacy concerns, attitudes, and behaviours. In this work, we provide the results of one of the first studies in this regard. We explore minoritised ethnic people's experiences of using essential online services across three sectors: health, social housing, and energy, their security and privacy-related concerns, and responses towards these services. We conducted a thematic analysis of 44 semi-structured interviews with people of various reported minoritised ethnicities in the UK. Privacy concerns and lack of control over personal data emerged as a major theme, with many interviewees considering privacy as their most significant concern when using online services. Several creative tactics to exercise some agency were reported, including selective and inconsistent disclosure of personal data. A core concern about how data may be used was driven by a fear of repercussions, including penalisation and discrimination, influenced by prior experiences of institutional and online racism. The increased concern and potential for harm resulted in minoritised ethnic people grappling with a higher-stakes dilemma of whether to disclose personal information online or not. Furthermore, trust in institutions, or lack thereof, was found to be embedded throughout as a basis for adapting behaviour. We draw on our results to provide lessons learned for the design of more inclusive, marginalisation-aware, and privacy-preserving online services.

Authors:Gizem Öz, Christian Dindler, Sharon Lindberg
Title: The Turn to Practice in Design Ethics: Characteristics and Future Research Directions for HCI Research
Abstract:
As emerging technologies continue to shape society, there is a growing emphasis on the need to engage with design ethics as it unfolds in practice to better capture the complexities of ethical considerations embedded in day-to-day work. Positioned within the broader "turn to practice" in HCI, the review characterizes this body of work in terms of its motivations, conceptual frameworks, methodologies, and contributions across a range of design disciplines and academic databases. The findings reveal a shift away from static and abstract ethical frameworks toward an understanding of ethics as an evolving, situated, and inherent aspect of design activities, one that can be cultivated and fostered collaboratively. This review proposes six future directions for establishing common research priorities and fostering the field's growth. While the review promotes cross-disciplinary dialogue, we argue that HCI research, given its cumulative experience with practice-oriented research, is well-equipped to guide this emerging strand of work on design ethics.

Authors:Ziqun Hua, Ao Jiang, Haoling Yang, Hao Fan, Huizhong Hu, Bernard Foing
Title: Regenerating Daily Routines for Young Adults with Depression through User-Led Indoor Environment Modifications Using Local Natural Materials
Abstract:
Young adults with depression often experience prolonged indoor stays, limiting their access to natural environments and exacerbating mental health challenges. While nature therapy is recognized for its psychological benefits, existing interventions frequently require outdoor engagement, which may not be accessible for all individuals. This study explores the potential of user-led indoor modifications using local natural materials as a mental health intervention. A qualitative approach was employed to assess emotional and environmental connectedness. Participants engaged in material exploration, collection, and crafting, integrating natural elements into their living spaces. Findings indicate improved mood, increased environmental awareness, and a stronger sense of agency over personal space. The standardized intervention steps suggest the feasibility of a self-help toolkit, enabling broader implementation. This research contributes to sustainable, user-driven mental health interventions, bridging the gap between nature therapy and practical indoor applications.

Authors:Mohammed Almutairi, Diego Gómez-Zará
Title: Towards Effective Multidisciplinary Health and HCI Teams based on AI Framework
Abstract:
As a Ph.D. student with a diverse background in both public and private sectors, I have encountered numerous challenges in cross-disciplinary and multi-stakeholder team projects. My research focuses on developing team compositions that involve multidisciplinary members from fields including education, academia, and health. Along with my advisor, we are focused on exploring how HCI can help individuals assemble more effective teams. This effort involves developing socio-technical systems that guide and inform individuals about the potential teams that they can assemble. We employ state-of-the-art algorithms that prioritize inclusion among team members from diverse areas of expertise and familiarity between the team members. Our goal for attending this workshop is to engage in meaningful dialogues with scholars and researchers, leveraging these interactions to refine our approach to building an AI-driven team composition system to foster effective, interdisciplinary collaboration in health-focused HCI research.

Authors:Kacper Sokol, James Fackler, Julia E Vogt
Title: Artificial Intelligence Should Genuinely Support Clinical Reasoning and Decision Making To Bridge the Translational Gap
Abstract:
Artificial intelligence promises to revolutionise medicine, yet its impact remains limited because of the pervasive translational gap. We posit that the prevailing technology-centric approaches underpin this challenge, rendering such systems fundamentally incompatible with clinical practice, specifically diagnostic reasoning and decision making. Instead, we propose a novel sociotechnical conceptualisation of data-driven support tools designed to complement doctors' cognitive and epistemic activities. Crucially, it prioritises real-world impact over superhuman performance on inconsequential benchmarks.

Authors:Camilla Mannino, Pierpaolo Sorrentino, Mario Chavez, Marie-Costance Corsi
Title: Neuronal avalanches as a predictive biomarker of BCI performance: towards a tool to guide tailored training program
Abstract:
Brain-Computer Interfaces (BCIs) based on motor imagery (MI) hold promise for restoring control in individuals with motor impairments. However, up to 30% of users remain unable to effectively use BCIs -- a phenomenon termed ''BCI inefficiency.'' This study addresses a major limitation in current BCI training protocols: the use of fixed-length training paradigms that ignore individual learning variability. We propose a novel approach that leverages neuronal avalanches -- spatiotemporal cascades of brain activity -- as biomarkers to characterize and predict user-specific learning mechanisms. Using electroencephalography (EEG) data collected across four MI-BCI training sessions in 20 healthy participants, we extracted two features: avalanche length and activations. These features revealed significant training and task-condition effects, particularly in later sessions. Crucially, changes in these features across sessions ($\Delta$avalanche length and $\Delta$activations) correlated significantly with BCI performance and enabled prediction of future BCI success via longitudinal Support Vector Regression and Classification models. Predictive accuracy reached up to 91%, with notable improvements after spatial filtering based on selected regions of interest. These findings demonstrate the utility of neuronal avalanche dynamics as robust biomarkers for BCI training, supporting the development of personalized protocols aimed at mitigating BCI inefficiency.
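The longitudinal prediction step can be sketched with synthetic data. The paper uses Support Vector Regression; an ordinary least-squares fit stands in here to keep the sketch dependency-free, and all feature values are simulated.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: per-participant change in avalanche features between
# sessions (delta avalanche length, delta activations) vs. final BCI accuracy.
n = 20
d_length = rng.normal(0.0, 1.0, n)
d_activ = rng.normal(0.0, 1.0, n)
accuracy = 0.6 + 0.05 * d_length + 0.03 * d_activ + rng.normal(0, 0.01, n)

# Longitudinal prediction: regress future accuracy on the feature deltas.
# (The paper uses Support Vector Regression; ordinary least squares is a
# stand-in here to avoid extra dependencies.)
X = np.column_stack([np.ones(n), d_length, d_activ])
w, *_ = np.linalg.lstsq(X, accuracy, rcond=None)
pred = X @ w

r = np.corrcoef(pred, accuracy)[0, 1]
print(f"fit correlation: {r:.2f}")
```

A real analysis would also cross-validate across participants before claiming predictive accuracy.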

Authors:Mukesh Rajmohan, Smit Desai, Sanchari Das
Title: Multi-Tool Analysis of User Interface & Accessibility in Deployed Web-Based Chatbots
Abstract:
In this work, we present a multi-tool evaluation of 106 deployed web-based chatbots across domains like healthcare, education, and customer service, comprising both standalone applications and embedded widgets, using automated tools (Google Lighthouse, PageSpeed Insights, SiteImprove Accessibility Checker) and manual audits (Microsoft Accessibility Insights). Our analysis reveals that over 80% of chatbots exhibit at least one critical accessibility issue, and 45% suffer from missing semantic structures or ARIA role misuse. Furthermore, we found that accessibility scores correlate strongly across tools (e.g., Lighthouse vs PageSpeed Insights, r = 0.861), but performance scores do not (r = 0.436), underscoring the value of a multi-tool approach. We offer replicable evaluation insights and actionable recommendations to support the development of user-friendly conversational interfaces.

Authors:Ekaterina Fedorova, Madeline Kitch, Chara Podimata
Title: User Altruism in Recommendation Systems
Abstract:
Users of social media platforms based on recommendation systems (RecSys) (e.g. TikTok, X, YouTube) strategically interact with platform content to influence future recommendations. On some such platforms, users have been documented to form large-scale grassroots movements encouraging others to purposefully interact with algorithmically suppressed content in order to "boost" its recommendation; we term this behavior user altruism. To capture this behavior, we study a game between users and a RecSys, where users provide the RecSys (potentially manipulated) preferences over the contents available to them, and the RecSys -- limited by data and computation constraints -- creates a low-rank approximation preference matrix, and ultimately provides each user her (approximately) most-preferred item. We compare the users' social welfare under truthful preference reporting and under a class of strategies capturing user altruism. In our theoretical analysis, we provide sufficient conditions to ensure strict increases in user social welfare under user altruism, and provide an algorithm to find an effective altruistic strategy. Interestingly, we show that for commonly assumed recommender utility functions, effectively altruistic strategies also improve the utility of the RecSys! We show that our results are robust to several model misspecifications, thus strengthening our conclusions. Our theoretical analysis is complemented by empirical results of effective altruistic strategies on the GoodReads dataset, and an online survey on how real-world users behave altruistically in RecSys. Overall, our findings serve as a proof-of-concept of the reasons why traditional RecSys may incentivize users to form collectives and/or follow altruistic strategies when interacting with them.
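The low-rank mechanism behind this boosting effect can be sketched in a few lines: because the RecSys compresses reported preferences into a rank-k factorization, coordinated inflation of one item's ratings can shift the approximation served to everyone. The preference matrix below is random toy data, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 6-user x 5-item preference matrix (rows: users, cols: items).
P = rng.uniform(0, 5, size=(6, 5))

def low_rank_approx(M, k):
    """Best rank-k approximation of M in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The RecSys recommends each user her highest-scoring item in the approximation.
P2 = low_rank_approx(P, 2)
recs_truthful = P2.argmax(axis=1)

# "Altruistic" users 0-2 inflate their reported preference for item 4,
# boosting its weight in the low-rank factors shared by all users.
P_altruistic = P.copy()
P_altruistic[:3, 4] = 5.0
recs_altruistic = low_rank_approx(P_altruistic, 2).argmax(axis=1)

print(recs_truthful, recs_altruistic)
```

Whether such manipulation actually raises social welfare depends on the conditions the paper derives; this sketch only shows the coupling between one user's report and everyone's recommendations.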

Authors:Tapio Pitkäranta, Leena Pitkäranta
Title: HADA: Human-AI Agent Decision Alignment Architecture
Abstract:
We present HADA (Human-AI Agent Decision Alignment), a protocol- and framework-agnostic reference architecture that keeps both large language model (LLM) agents and legacy algorithms aligned with organizational targets and values. HADA wraps any algorithm or LLM in role-specific stakeholder agents -- business, data-science, audit, ethics, and customer -- each exposing conversational APIs so that technical and non-technical actors can query, steer, audit, or contest every decision across strategic, tactical, and real-time horizons. Alignment objectives, KPIs, and value constraints are expressed in natural language and are continuously propagated, logged, and versioned while thousands of heterogeneous agents run on different orchestration stacks. A cloud-native proof of concept packages a production credit-scoring model (getLoanDecision) and deploys it on Docker/Kubernetes/Python; five scripted retail-bank scenarios show how target changes, parameter tweaks, explanation requests, and ethics triggers flow end to end through the architecture. Evaluation followed the Design-Science Research Methodology. Walkthrough observation and log inspection demonstrated complete coverage of six predefined objectives: every role could invoke conversational control, trace KPIs and value constraints, detect and mitigate ZIP-code bias, and reproduce full decision lineage, independent of the underlying LLM or agent library. Contributions: (1) an open-source HADA architecture, (2) a mid-range design theory for human-AI alignment in multi-agent systems, and (3) empirical evidence that framework-agnostic, protocol-compliant stakeholder agents improve accuracy, transparency, and ethical compliance in real-world decision pipelines.

Authors:Michel Adam, Patrice Frison, Moncef Daoud, Sabine Letellier Zarshenas
Title: Design of a visual environment for programming by direct data manipulation
Abstract:
The use of applications on computers, smartphones, and tablets has been considerably simplified thanks to interactive and dynamic graphical interfaces coupled with the mouse and touch screens. It is no longer necessary to be a computer specialist to use them. Paradoxically, the development of computer programs generally requires writing lines of code in a programming language whose syntax is particularly strict. This process poses many difficulties for programmers. We propose an original tool in which arbitrary programs (Turing-complete) can be developed in a completely visual manner by direct manipulation of the data, without writing a line of code. The user can thus develop an algorithm by directly visualizing the result of actions taken on the data. A method for constructing iterations is associated with the tool. It proposes to create each part, including the loop body, in a non-linear manner under visual control of the state of the data. In addition, the tool supports the production of lines of code in several languages including Python, C, Java, that correspond to the actions performed. In this article, we present the tool, the design choices, the problems to be solved, and the limits and the contributions of the direct-data-manipulation approach.
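The idea of emitting code from recorded manipulation actions can be sketched with a hypothetical action vocabulary; the tool's real action set and its C/Java backends are richer than this Python-only illustration.

```python
# Sketch: translate a sequence of recorded direct-manipulation actions
# (hypothetical action vocabulary) into equivalent Python statements.
actions = [
    ("create_list", "xs", [3, 1, 2]),
    ("sort", "xs"),
    ("append", "xs", 7),
]

def emit_python(actions):
    """Map each abstract action to one line of generated Python."""
    lines = []
    for act in actions:
        if act[0] == "create_list":
            lines.append(f"{act[1]} = {act[2]!r}")
        elif act[0] == "sort":
            lines.append(f"{act[1]}.sort()")
        elif act[0] == "append":
            lines.append(f"{act[1]}.append({act[2]!r})")
    return "\n".join(lines)

code = emit_python(actions)
print(code)

# The generated program is itself runnable, mirroring the tool's claim that
# manipulation sequences correspond to executable code.
scope = {}
exec(code, scope)
```

Emitting to other languages amounts to swapping the per-action templates, which is why the same action trace can target Python, C, or Java.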

Authors:Xiaotian Su, April Wang
Title: The Stress of Improvisation: Instructors' Perspectives on Live Coding in Programming Classes
Abstract:
Live coding is a pedagogical technique in which an instructor writes and executes code in front of students to impart skills like incremental development and debugging. Although live coding offers many benefits, instructors face many challenges in the classroom, like cognitive challenges and psychological stress, most of which have yet to be formally studied. To understand the obstacles faced by instructors in CS classes, we conducted (1) a formative interview with five teaching assistants in exercise sessions and (2) a contextual inquiry study with four lecturers for large-scale classes. We found that the improvisational and unpredictable nature of live coding makes it difficult for instructors to manage their time and keep students engaged, resulting in more mental stress than presenting static slides. We discussed opportunities for augmenting existing IDEs and presentation setups to help enhance live coding experience.

Authors:Madhu Babu Sikha, Lalith Appari, Gurudatt Nanjanagudu Ganesh, Amay Bandodkar, Imon Banerjee
Title: Multi-Analyte, Swab-based Automated Wound Monitor with AI
Abstract:
Diabetic foot ulcers (DFUs), a class of chronic wounds, affect ~750,000 individuals every year in the US alone, and identifying early the non-healing DFUs that develop into chronic wounds can drastically reduce treatment costs and minimize risks of amputation. There is therefore a pressing need for diagnostic tools that can detect non-healing DFUs early. We develop low-cost, multi-analyte, 3D-printed assays seamlessly integrated on swabs that can identify non-healing DFUs, and a Wound Sensor iOS App -- an innovative mobile application developed for the controlled acquisition and automated analysis of wound sensor data. By comparing the original base image (before exposure to the wound) with the wound-exposed image, we developed automated computer vision techniques to compare density changes between the two assay images, which allow us to automatically determine the severity of the wound. The iOS app ensures accurate data collection and presents actionable insights, despite challenges such as variations in camera configurations and ambient conditions. The proposed integrated sensor and iOS app will allow healthcare professionals to monitor wound conditions in real time, track healing progress, and assess critical parameters related to wound care.
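The density-comparison step can be illustrated with a minimal sketch; the arrays, the normalized-change measure, and the 0.2 threshold are all hypothetical stand-ins for the app's calibrated computer-vision pipeline.

```python
import numpy as np

# Sketch of the density-change idea: compare a colorimetric assay patch
# before and after wound exposure (arrays stand in for cropped images).
base = np.full((8, 8), 200.0)      # bright patch before exposure
exposed = np.full((8, 8), 140.0)   # darker patch after reacting with analyte

# Normalized drop in mean intensity as a proxy for analyte concentration.
density_change = (base.mean() - exposed.mean()) / base.mean()

# Hypothetical decision threshold; the real app must also correct for
# camera configuration and ambient lighting before thresholding.
severity = "non-healing" if density_change > 0.2 else "healing"
print(f"{density_change:.2f} -> {severity}")
```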

Authors:Kimaya Lecamwasam, Tishya Ray Chaudhuri
Title: Exploring listeners' perceptions of AI-generated and human-composed music for functional emotional applications
Abstract:
This work investigates how listeners perceive and evaluate AI-generated as compared to human-composed music in the context of emotional resonance and regulation. Across a mixed-methods design, participants were exposed to both AI and human music under various labeling conditions (music correctly labeled as AI- or human-origin, music incorrectly labeled as AI- or human-origin, and unlabeled music) and emotion cases (Calm and Upbeat), and were asked to rate preference, efficacy of target emotion elicitation, and emotional impact. Participants were significantly more likely to rate human-composed music, regardless of labeling, as more effective at eliciting target emotional states, though quantitative analyses revealed no significant differences in emotional response. However, participants were significantly more likely to indicate preference for AI-generated music, yielding further questions regarding the impact of emotional authenticity and perceived authorship on musical appraisal. Qualitative data underscored this, with participants associating humanness with qualities such as imperfection, flow, and 'soul.' These findings challenge the assumption that preference alone signals success in generative music systems. Rather than positioning AI tools as replacements for human creativity or emotional expression, they point toward a more careful design ethos that acknowledges the limits of replication and prioritizes human values such as authenticity, individuality, and emotion regulation in wellness and affective technologies.

Authors:Zhengyang Li, Hailin Deng
Title: Cognitive Load-Driven VR Memory Palaces: Personalizing Focus and Recall Enhancement
Abstract:
Cognitive load, which varies across individuals, can significantly affect focus and memory performance. This study explores the integration of Virtual Reality (VR) with memory palace techniques, aiming to optimize VR environments tailored to individual cognitive load levels to improve focus and memory. We utilized EEG devices, together with the Oculus Quest 2, to monitor Beta wave activity in 10 participants. By modeling their cognitive load profiles through polynomial regression, we dynamically adjusted spatial variables within a VR environment using Grasshopper, creating personalized experiences. Results indicate that 8 participants showed a notable increase in Beta wave activity, demonstrating improved focus and cognitive performance in the customized VR settings. These findings underscore the potential of VR-based memory environments, driven by cognitive load considerations, and provide valuable insights for advancing VR memory research.
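The per-participant modeling step can be sketched as follows; the sample values, the degree-2 choice, and the target-load heuristic are illustrative assumptions, not the study's actual parameters.

```python
import numpy as np

# Hypothetical per-participant cognitive-load samples: Beta-band power
# measured at successive task difficulty levels (values are illustrative).
difficulty = np.array([1, 2, 3, 4, 5, 6], dtype=float)
beta_power = np.array([0.9, 1.4, 2.2, 3.1, 4.5, 6.0])

# Model the individual load profile with a degree-2 polynomial.
coeffs = np.polyfit(difficulty, beta_power, deg=2)
profile = np.poly1d(coeffs)

# Pick the difficulty whose predicted load is closest to a hypothetical
# target Beta level; this value would then drive the spatial parameters
# adjusted in Grasshopper.
target_load = 3.0
grid = np.linspace(1, 6, 101)
best_difficulty = grid[np.argmin(np.abs(profile(grid) - target_load))]
print(round(best_difficulty, 2))
```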

Authors:Arisa Sugino, Takayuki Itoh
Title: Visualization for interactively adjusting the de-bias effect of word embedding
Abstract:
Word embedding, which converts words into numerical values, is an important and widely used natural language processing technique. One of the serious problems of word embedding is that if the dataset used for pre-training contains bias, that bias will be learned and will affect the model. On the other hand, indiscriminate removal of bias from word embeddings may result in the loss of information, even if the bias is undesirable to us. As a result, the risk of model performance degradation due to bias removal becomes another problem. As a solution, we focus on gender bias in Japanese and propose an interactive visualization method to adjust the degree of debiasing for each word category. Specifically, we visualize the accuracy in a category classification task after debiasing, and allow the user to adjust the parameters based on the visualization results, so that the debiasing can be adjusted according to the user's objectives. In addition, considering the trade-off between debiasing and preventing degradation of model performance, and that different people perceive gender bias differently, we developed a mechanism to present multiple choices of debiasing configurations by applying an optimization scheme. This paper presents the results of an experiment in which we removed the gender bias from word embeddings learned from the Japanese version of Wikipedia. We classified words into five categories based on a news corpus, and observed that the degree of influence of debiasing differed greatly among the categories. We then adjusted the degree of debiasing for each category based on the visualization results.
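One common way to realize adjustable debiasing is to remove only a fraction of each vector's component along an estimated gender direction (in the style of hard debiasing); the toy embeddings and the single definitional pair below are illustrative, not the paper's actual setup.

```python
import numpy as np

# Toy embeddings; a gender direction is estimated from definitional pairs
# (here a single hypothetical pair for brevity).
rng = np.random.default_rng(1)
emb = {w: rng.normal(size=8) for w in ["he", "she", "nurse", "doctor"]}

# Gender direction from a definitional pair, normalized.
g = emb["he"] - emb["she"]
g /= np.linalg.norm(g)

def debias(v, direction, strength=1.0):
    """Remove a fraction `strength` of v's component along `direction`.
    strength=1 is full hard debiasing; intermediate values correspond to
    the per-category adjustable debiasing an interactive interface exposes."""
    return v - strength * (v @ direction) * direction

v = debias(emb["nurse"], g, strength=1.0)
print(abs(v @ g))  # component along the gender direction after debiasing
```

Setting a different `strength` per word category is then a direct knob for trading off debiasing against preserved information.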

Authors:Stephan Druskat, Sabine Theis
Title: Challenges in designing research infrastructure software in multi-stakeholder contexts
Abstract:
This study investigates the challenges in designing research infrastructure software for automated software publication in multi-stakeholder environments, focusing specifically on the HERMES system. Through two quantitative surveys of research software engineers (RSEs) and infrastructure facility staff (IFs), it examines technical, organizational, and social requirements across these stakeholder groups. The study reveals significant differences in how RSEs and IFs prioritize various system features. While RSEs highly value compatibility with existing infrastructure, IFs prioritize user-focused aspects like system usability and documentation. The research identifies two main challenges in designing research infrastructure software: (1) the existence of multiple stakeholder groups with differing requirements, and (2) the internal heterogeneity within each stakeholder group across dimensions such as technical experience. The study also highlights that only half of RSE respondents actively practice software publication, pointing to potential cultural or technical barriers. Additionally, the research reveals discrepancies in how stakeholders view organizational aspects, with IFs consistently rating factors like responsibility structures and quality assurance as more important than RSEs do. These findings contribute to a better understanding of the complexities involved in designing research infrastructure software and emphasize the need for systems that can accommodate diverse user groups while maintaining usability across different technical expertise levels.

Authors:Aditya Naik, Jovi Thomas, Teja Sree, Himavant Reddy
Title: Artificial Empathy: AI based Mental Health
Abstract:
Many people suffer from mental health problems but not everyone seeks professional help or has access to mental health care. AI chatbots have increasingly become a go-to for individuals who either have mental disorders or simply want someone to talk to. This paper presents a study on participants who have previously used chatbots and a scenario-based testing of large language model (LLM) chatbots. Our findings indicate that AI chatbots were primarily utilized as a "Five minute therapist" or as a non-judgmental companion. Participants appreciated the anonymity and lack of judgment from chatbots. However, there were concerns about privacy and the security of sensitive information. The scenario-based testing of LLM chatbots highlighted additional issues. Some chatbots were consistently reassuring, used emojis and names to add a personal touch, and were quick to suggest seeking professional help. However, there were limitations such as inconsistent tone, occasional inappropriate responses (e.g., casual or romantic), and a lack of crisis sensitivity, particularly in recognizing red flag language and escalating responses appropriately. These findings can inform both the technology and mental health care industries on how to better utilize AI chatbots to support individuals during challenging emotional periods.

Authors:An Vu, Jonas Oppenlaender
Title: Prompt Engineer: Analyzing Skill Requirements in the AI Job Market
Abstract:
The rise of large language models (LLMs) has created a new job role: the Prompt Engineer. Despite growing interest in this position, we still do not fully understand what skills this new job role requires or how common these jobs are. We analyzed 20,662 job postings on LinkedIn, including 72 prompt engineer positions, to learn more about this emerging role. We found that prompt engineering is still rare (less than 0.5% of sampled job postings) but has a unique skill profile. Prompt engineers need AI knowledge (22.8%), prompt design skills (18.7%), good communication (21.9%), and creative problem-solving (15.8%) skills. These requirements significantly differ from those of established roles, such as data scientists and machine learning engineers, showing that prompt engineering is becoming its own profession. Our findings help job seekers, employers, and educational institutions in better understanding the emerging field of prompt engineering.

Authors:Zhijun Pan, Antonios Andronis, Eva Hayek, Oscar AP Wilkinson, Ilya Lasy, Annette Parry, Guy Gadney, Tim J. Smith, Mick Grierson
Title: Guiding Generative Storytelling with Knowledge Graphs
Abstract:
Large Language Models (LLMs) have shown great potential in automated story generation, but challenges remain in maintaining long-form coherence and providing users with intuitive and effective control. Retrieval-Augmented Generation (RAG) has proven effective in reducing hallucinations in text generation; however, the use of structured data to support generative storytelling remains underexplored. This paper investigates how knowledge graphs (KGs) can enhance LLM-based storytelling by improving narrative quality and enabling user-driven modifications. We propose a KG-assisted storytelling pipeline and evaluate its effectiveness through a user study with 15 participants. Participants created their own story prompts, generated stories, and edited knowledge graphs to shape their narratives. Through quantitative and qualitative analysis, our findings demonstrate that knowledge graphs significantly enhance story quality in action-oriented and structured narratives within our system settings. Additionally, editing the knowledge graph increases users' sense of control, making storytelling more engaging, interactive, and playful.
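The graph-editing idea can be sketched minimally: user edits to a triple store change the grounding context serialized into the LLM prompt. The triples, predicate names, and serialization format below are hypothetical, not the paper's actual pipeline.

```python
# Minimal sketch: serialize a small knowledge graph of story facts into a
# textual context block that would be prepended to an LLM story prompt.
kg = [
    ("Mira", "occupation", "cartographer"),
    ("Mira", "ally_of", "Tomas"),
    ("Tomas", "guards", "the northern pass"),
]

def kg_to_context(triples):
    """Render subject-predicate-object triples as prompt-ready bullet lines."""
    return "\n".join(f"- {s} {p.replace('_', ' ')} {o}" for s, p, o in triples)

# A user edit to the graph (e.g. turning an ally into a rival) directly
# changes the grounding context the model sees on the next generation.
kg[1] = ("Mira", "rival_of", "Tomas")
print(kg_to_context(kg))
```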

Authors:Katalin Feher, Marton Demeter
Title: Generative Knowledge Production Pipeline Driven by Academic Influencers
Abstract:
Generative AI transforms knowledge production, validation, and dissemination, raising academic integrity and credibility concerns. This study examines 53 academic influencer videos that reached 5.3 million viewers to identify an emerging, structured, implementation-ready pipeline balancing originality, ethical compliance, and human-AI collaboration despite the disruptive impacts. Findings highlight generative AI's potential to automate publication workflows and democratize participation in knowledge production while challenging traditional scientific norms. Academic influencers emerge as key intermediaries in this paradigm shift, connecting bottom-up practices with institutional policies to improve adaptability. Accordingly, the study proposes a generative publication production pipeline and a policy framework for co-intelligence adaptation and reinforcing credibility-centered standards in AI-powered research. These insights support scholars, educators, and policymakers in understanding AI's transformative impact by advocating responsible and innovation-driven knowledge production. Additionally, they reveal pathways for automating best practices, optimizing scholarly workflows, and fostering creativity in academic research and publication.

Authors:David I. Gonzalez-Aguirre, Javier Felip Leon, Javier Felix-Rendon, Roderico Garcia-Leal, Julio C. Zamora Esquivel
Title: Towards Tangible Immersion for Cobot Programming-by-Demonstration: Visual, Tactile and Haptic Interfaces for Mixed-Reality Cobot Automation in Semiconductor Manufacturing
Abstract:
Sensor-based reactive and hybrid approaches have proven a promising line of study to address imperfect knowledge in grasping and manipulation. However, reactive approaches are usually tightly coupled to a particular embodiment, making transfer of knowledge difficult. This paper proposes a paradigm for modeling and execution of reactive manipulation actions, which makes knowledge transfer to different embodiments possible while retaining the reactive capabilities of the embodiments. The proposed approach extends the idea of control primitives coordinated by a state machine by introducing an embodiment-independent layer of abstraction. Abstract manipulation primitives constitute a vocabulary of atomic, embodiment-independent actions, which can be coordinated using state machines to describe complex actions. To obtain embodiment-specific models, the abstract state machines are automatically translated to embodiment-specific models, such that the full capabilities of each platform can be utilized. The strength of the manipulation primitives paradigm is demonstrated by developing a set of corresponding embodiment-specific primitives for object transport, including a complex reactive grasping primitive. The robustness of the approach is experimentally studied in emptying of a box filled with several unknown objects. The embodiment independence is studied by performing a manipulation task on two different platforms using the same abstract description.

Authors:Truong Jack Luu, Binny M. Samuel
Title: Exposing the Impact of GenAI for Cybercrime: An Investigation into the Dark Side
Abstract:
In recent years, the rapid advancement and democratization of generative AI models have sparked significant debate over safety, ethical risks, and dual-use concerns, particularly in the context of cybersecurity. While anecdotally known, this paper provides empirical evidence regarding generative AI's association with malicious internet-related activities and cybercrime by examining the phenomenon through psychological frameworks of technological amplification and affordance theory. Using a quasi-experimental design with interrupted time series analysis, we analyze two datasets, one general and one cryptocurrency-focused, to empirically assess generative AI's role in cybercrime. The findings contribute to ongoing discussions about AI governance by balancing control and fostering innovation, underscoring the need for strategies to guide policymakers, inform AI developers and cybersecurity professionals, and educate the public to maximize AI's benefits while mitigating its risks.
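An interrupted time series design of the kind described fits a segmented regression with a level and slope change at the intervention point; the synthetic weekly counts and effect sizes below are illustrative only, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical weekly counts of malicious activity; the "interruption" is
# the public release of a generative AI model at week 50.
t = np.arange(100, dtype=float)
post = (t >= 50).astype(float)
true_level_jump, true_slope_change = 20.0, 0.5
y = (100 + 0.2 * t + true_level_jump * post
     + true_slope_change * post * (t - 50) + rng.normal(0, 2, t.size))

# Segmented (interrupted time series) regression:
#   y = b0 + b1*t + b2*post + b3*post*(t - 50)
# b2 estimates the immediate level change, b3 the change in trend.
X = np.column_stack([np.ones_like(t), t, post, post * (t - 50)])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated level jump: {b[2]:.1f}, slope change: {b[3]:.2f}")
```

A quasi-experimental analysis would additionally check autocorrelation and seasonality before interpreting b2 and b3 causally.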

Authors:Jonathan Smith, Siddartha Khastgir
Title: Self-driving technologies need the help of the public: A narrative review of the evidence
Abstract:
If public trust is lost in a new technology early in its life cycle, it can take much more time for the benefits of that technology to be realised. Eventually, tens of millions of people will collectively have the power to determine the success or failure of self-driving technology, driven by their perceptions of risk, data handling, safety, governance, accountability, benefits to their lives, and more. This paper reviews the evidence on safety-critical technology covering trust, engagement, and acceptance. The paper takes a narrative review approach, concluding with a scalable model for self-driving technology education and engagement. The paper finds that if a mismatch between the public's perceptions and expectations about self-driving systems emerges, it can lead to misuse, disuse, or abuse of the system. Furthermore, we find from the evidence that industry experts often misunderstand what matters to the public, users, and stakeholders. However, we find that engagement programmes that develop approaches to delivering the right information at the right time, in the right format, oriented around what matters to the public, create the potential for ever more sophisticated conversations, greater trust, and a shift of the public into a progressively more active role of critique and advocacy. This work has been undertaken as part of the Partners for Automated Vehicle Education (PAVE) United Kingdom programme.

Authors:Sapolnach Prompiengchai, Charith Narreddy, Steve Joordens
Title: A Practical Guide for Supporting Formative Assessment and Feedback Using Generative AI
Abstract:
Formative assessment is a cornerstone of effective teaching and learning, providing students with feedback to guide their learning. While there has been an exponential growth in the application of generative AI in scaling various aspects of formative assessment, ranging from automatic question generation to intelligent tutoring systems and personalized feedback, few have directly addressed the core pedagogical principles of formative assessment. Here, we critically examined how generative AI, especially large-language models (LLMs) such as ChatGPT, can support key components of formative assessment: helping students, teachers, and peers understand "where learners are going," "where learners currently are," and "how to move learners forward" in the learning process. With the rapid emergence of new prompting techniques and LLM capabilities, we also provide guiding principles for educators to effectively leverage cost-free LLMs in formative assessments while remaining grounded in pedagogical best practices. Furthermore, we reviewed the role of LLMs in generating feedback, highlighting limitations in current evaluation metrics that inadequately capture the nuances of formative feedback, such as distinguishing feedback at the task, process, and self-regulatory levels. Finally, we offer practical guidelines for educators and researchers, including concrete classroom strategies and future directions such as developing robust metrics to assess LLM-generated feedback, leveraging LLMs to overcome systemic and cultural barriers to formative assessment, and designing AI-aware assessment strategies that promote transferable skills while mitigating overreliance on LLM-generated responses. By structuring the discussion within an established formative assessment framework, this review provides a comprehensive foundation for integrating LLMs into formative assessment in a pedagogically informed manner.

Authors:Anke Fischer-Janzen, Thomas M. Wendt, Daniel Görlich, Kristof Van Laerhoven
Title: Eye-tracking-Driven Shared Control for Robotic Arms: Wizard of Oz Studies to Assess Design Choices
Abstract:
Advances in eye-tracking control for assistive robotic arms provide intuitive interaction opportunities for people with physical disabilities. Shared control has gained interest in recent years by improving user satisfaction through partial automation of robot control. We present an eye-tracking-guided shared control design based on insights from state-of-the-art literature. A Wizard of Oz setup was used in which automation was simulated by an experimenter to evaluate the concept without requiring full implementation. This approach allowed for rapid exploration of user needs and expectations to inform future iterations. Two studies were conducted to assess user experience, identify design challenges, and find improvements to ensure usability and accessibility. The first study involved people with disabilities by providing a survey, and the second study used the Wizard of Oz design in person to gain technical insights, leading to a comprehensive picture of findings.

Authors:Yuval Samoilov-Kats, Matan Noach, Noam Beer, Yuval Efrati, Adam Zaidel
Title: An open-source Modular Online Psychophysics Platform (MOPP)
Abstract:
In recent years, there has been a growing need and opportunity to use online platforms for psychophysics research. Online experiments make it possible to evaluate large and diverse populations remotely and quickly, complementing laboratory-based research. However, developing and running online psychophysics experiments poses several challenges: i) a high barrier to entry for researchers, who often need to learn complex code-based platforms, ii) an uncontrolled experimental environment, and iii) questionable credibility of the participants. Here, we introduce an open-source Modular Online Psychophysics Platform (MOPP) to address these challenges. Through MOPP's simple web-based interface, researchers can build modular experiments, share them with others, and copy or modify tasks from each other's environments. MOPP provides built-in features to calibrate for viewing distance and to measure visual acuity. It also includes email-based and IP-based authentication, and reCAPTCHA verification. We developed five example psychophysics tasks that come preloaded in the environment, and ran a pilot experiment hosted on the AWS (Amazon Web Services) cloud. Pilot data collected for these tasks yielded results similar to those reported in laboratory settings. MOPP can thus help researchers collect large psychophysics datasets online, with reduced turnaround time, and in a standardized manner.

Authors:RoshikNagaSai Patibandla, Ross Greer
Title: Evaluating Driver Perceptions of Integrated Safety Monitoring Systems for Alcohol Impairment and Distraction
Abstract:
The increasing number of accidents caused by alcohol-impaired driving has prompted the development of integrated safety systems in vehicles to monitor driver behavior and prevent crashes. This paper explores how drivers perceive these systems, focusing on their comfort, trust, privacy concerns, and willingness to adopt the technology. Through a survey of 115 U.S. participants, the study reveals a preference for non-intrusive systems, such as those monitoring eye movements, over more restrictive technologies like alcohol detection devices. Privacy emerged as a major concern, with many participants preferring local data processing and anonymity. Trust in these systems was crucial for acceptance, as drivers are more likely to adapt their behavior when they believe the system is accurate and reliable. To encourage adoption, it is important to address concerns about privacy and balance the benefits of safety with personal freedom. By improving transparency, ensuring reliability, and increasing public awareness, these systems could play a significant role in reducing road accidents and improving safety.

Authors:Md Ehtesham-Ul-Haque, Syed Masum Billah
Title: ToPSen: Task-Oriented Priming and Sensory Alignment for Comparing Coding Strategies Between Sighted and Blind Programmers
Abstract:
This paper examines how the coding strategies of sighted and blind programmers differ when working with audio feedback alone. The goal is to identify challenges in mixed-ability collaboration, particularly when sighted programmers work with blind peers or teach programming to blind students. To overcome limitations of traditional blindness simulation studies, we proposed Task-Oriented Priming and Sensory Alignment (ToPSen), a design framework that reframes sensory constraints as technical requirements rather than as a disability. Through a study of 12 blind and 12 sighted participants coding non-visually, we found that expert blind programmers maintain more accurate mental models and process more information in working memory than sighted programmers using ToPSen. Our analysis revealed that blind and sighted programmers process structural information differently, exposing gaps in current IDE designs. These insights inform our guidelines for improving the accessibility of programming tools and fostering effective mixed-ability collaboration.

Authors:Christopher Knievel, Alexander Bernhardt, Christian Bernhardt
Title: AITEE -- Agentic Tutor for Electrical Engineering
Abstract:
Intelligent tutoring systems combined with large language models offer a promising approach to address students' diverse needs and promote self-efficacious learning. While large language models possess good foundational knowledge of electrical engineering basics, they remain insufficiently capable of addressing specific questions about electrical circuits. In this paper, we present AITEE, an agent-based tutoring system for electrical engineering designed to accompany students throughout their learning process, offer individualized support, and promote self-directed learning. AITEE supports both hand-drawn and digital circuits through an adapted circuit reconstruction process, enabling natural interaction with students. Our novel graph-based similarity measure identifies relevant context from lecture materials through a retrieval augmented generation approach, while parallel Spice simulation further enhances accuracy in applying solution methodologies. The system implements a Socratic dialogue to foster learner autonomy through guided questioning. Experimental evaluations demonstrate that AITEE significantly outperforms baseline approaches in domain-specific knowledge application, with even medium-sized LLM models showing acceptable performance. Our results highlight the potential of agentic tutors to deliver scalable, personalized, and effective learning environments for electrical engineering education.

Authors:Jennifer Turliuk, Alejandro Sevilla, Daniela Gorza, Tod Hynes
Title: Enhancing Selection of Climate Tech Startups with AI -- A Case Study on Integrating Human and AI Evaluations in the ClimaTech Great Global Innovation Challenge
Abstract:
This case study examines the ClimaTech Great Global Innovation Challenge's approach to selecting climate tech startups by integrating human and AI evaluations. The competition aimed to identify top startups and enhance the accuracy and efficiency of the selection process through a hybrid model. Research shows data-driven approaches help VC firms reduce bias and improve decision-making. Machine learning models have outperformed human investors in deal screening, helping identify high-potential startups. Incorporating AI aimed to ensure more equitable and objective evaluations. The methodology included three phases: an initial AI review, semi-finals judged by humans, and finals using a hybrid weighting. In phase one, 57 applications were scored by an AI tool built with StackAI and OpenAI's GPT-4o, and the top 36 advanced. In the semi-finals, human judges, unaware of AI scores, evaluated startups on team quality, market potential, and technological innovation. Each score - human or AI - was weighted equally, resulting in 75 percent human and 25 percent AI influence. In the finals, with five human judges, the weighting shifted to 83.3 percent human and 16.7 percent AI. There was a moderate positive correlation between AI and human scores - Spearman's ρ = 0.47 - indicating general alignment with key differences. Notably, the final four startups, selected mainly by humans, were among those rated highest by the AI. This highlights the complementary nature of AI and human judgment. The study shows that hybrid models can streamline and improve startup assessments. The ClimaTech approach offers a strong framework for future competitions by combining human expertise with AI capabilities.
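The weighting scheme described above (each score, human or AI, counted equally) can be sketched as follows. This is a hypothetical helper for illustration, not the competition's actual scoring code; the function name and signature are assumptions.

```python
def hybrid_score(human_scores, ai_score, ai_weight_share=1.0):
    """Average scores where the AI counts as `ai_weight_share` judges'
    worth of influence alongside the human judges.

    With 3 human judges and one equally weighted AI score, the AI's
    influence is 1/4 = 25% (the semi-final split); with 5 human judges
    it is 1/6 ≈ 16.7% (the final split).
    """
    total = sum(human_scores) + ai_weight_share * ai_score
    return total / (len(human_scores) + ai_weight_share)


# Example: three human judges score 4, the AI scores 8.
# The AI contributes 25% of the weight, pulling the mean up to 5.0.
print(hybrid_score([4, 4, 4], 8))  # → 5.0
```

Equal per-score weighting reproduces the reported 75/25 and 83.3/16.7 splits simply as a consequence of the judge count, with no separate tuning knob.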

Authors:Ibrahim Shoer, Engin Erzin
Title: Learning Annotation Consensus for Continuous Emotion Recognition
Abstract:
In affective computing, datasets often contain multiple annotations from different annotators, which may lack full agreement. Typically, these annotations are merged into a single gold standard label, potentially losing valuable inter-rater variability. We propose a multi-annotator training approach for continuous emotion recognition (CER) that seeks a consensus across all annotators rather than relying on a single reference label. Our method employs a consensus network to aggregate annotations into a unified representation, guiding the main arousal-valence predictor to better reflect collective inputs. Tested on the RECOLA and COGNIMUSE datasets, our approach outperforms traditional methods that unify annotations into a single label. This underscores the benefits of fully leveraging multi-annotator data in emotion recognition and highlights its applicability across various fields where annotations are abundant yet inconsistent.
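The paper's consensus network is a learned aggregator; as a minimal stand-in, consensus over per-annotator arousal/valence traces can be illustrated as a weighted average, where uniform weights reduce to the conventional mean gold standard. The function below is an assumption for illustration only, not the authors' architecture.

```python
import numpy as np

def consensus_labels(annotations, weights=None):
    """Aggregate per-annotator traces (n_annotators x n_frames) into a
    single consensus trace via a normalized weighted average.

    Uniform (default) weights reproduce the usual single gold-standard
    mean; non-uniform weights mimic trusting some annotators more, the
    role a learned consensus network would play.
    """
    annotations = np.asarray(annotations, dtype=float)
    n = annotations.shape[0]
    if weights is None:
        weights = np.full(n, 1.0 / n)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so weights sum to 1
    return weights @ annotations


# Two annotators, two frames: the uniform consensus is the frame-wise mean.
print(consensus_labels([[0.0, 1.0], [2.0, 3.0]]))  # → [1. 2.]
```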

Authors:Sherry Mason, Tawfiq Ammari
Title: Racism, Resistance, and Reddit: How Popular Culture Sparks Online Reckonings
Abstract:
This study examines how Reddit users engaged with the racial narratives of Lovecraft Country and Watchmen, two television series that reimagine historical racial trauma. Drawing on narrative persuasion and multistep flow theory, we analyze 3,879 Reddit comments using topic modeling and critical discourse analysis. We identify three dynamic social roles - advocates, adversaries, and adaptives - and explore how users move between them in response to racial discourse. Findings reveal how Reddit's pseudonymous affordances shape role fluidity, opinion leadership, and moral engagement. While adversaries minimized or rejected racism as exaggerated, advocates shared standpoint experiences and historical resources to challenge these claims. Adaptive users shifted perspectives over time, demonstrating how online publics can foster critical racial learning. This research highlights how popular culture and participatory platforms intersect in shaping collective meaning-making around race and historical memory.

Authors:Jingchao Fang, Mina Lee
Title: What Shapes Writers' Decisions to Disclose AI Use?
Abstract:
Have you ever read a blog or social media post and suspected that it was written--at least in part--by artificial intelligence (AI)? While transparently acknowledging contributors to writing is generally valued, why some writers choose to disclose or withhold AI involvement remains unclear. In this work, we ask what factors shape writers' decisions to disclose their AI use as a starting point to effectively advocate for transparency. To shed light on this question, we synthesize study findings and theoretical frameworks in human-AI interaction and behavioral science. Concretely, we identify and curate a list of factors that could affect writers' decisions regarding disclosure for human-AI co-created content.

Authors:P. S. Kesavan, Pontus Nordenfelt
Title: Reconceptualizing Smart Microscopy: From Data Collection to Knowledge Creation by Multi-Agent Integration
Abstract:
Smart microscopy represents a paradigm shift in biological imaging, moving from passive observation tools to active collaborators in scientific inquiry. Enabled by advances in automation, computational power, and artificial intelligence, these systems are now capable of adaptive decision-making and real-time experimental control. Here, we introduce a theoretical framework that reconceptualizes smart microscopy as a partner in scientific investigation. Central to our framework is the concept of the 'epistemic-empirical divide' in cellular investigation: the gap between what is observable (empirical domain) and what must be understood (epistemic domain). We propose six core design principles: epistemic-empirical awareness, hierarchical context integration, an evolution from detection to perception, adaptive measurement frameworks, narrative synthesis capabilities, and cross-contextual reasoning. Together, these principles guide a multi-agent architecture designed to align empirical observation with the goals of scientific understanding. Our framework provides a roadmap for building microscopy systems that go beyond automation to actively support hypothesis generation, insight discovery, and theory development, redefining the role of scientific instruments in the process of knowledge creation.

Authors:Antoni Gomila, Vincent C. Müller
Title: Challenges for artificial cognitive systems
Abstract:
The declared goal of this paper is to fill this gap: "... cognitive systems research needs questions or challenges that define progress. The challenges are not (yet more) predictions of the future, but a guideline to what are the aims and what would constitute progress." -- the quotation being from the project description of EUCogII, the project for the European Network for Cognitive Systems within which this formulation of the 'challenges' was originally developed (http://www.eucognition.org). So, we stick out our neck and formulate the challenges for artificial cognitive systems. These challenges are articulated in terms of a definition of what a cognitive system is: a system that learns from experience and uses its acquired knowledge (both declarative and practical) in a flexible manner to achieve its own goals.

Authors:Qingyu Liang, Jaime Banks
Title: On the Same Page: Dimensions of Perceived Shared Understanding in Human-AI Interaction
Abstract:
Shared understanding plays a key role in effective communication and performance in human-human interactions. With the increasingly common integration of AI into human contexts, the future of personal and workplace interactions will likely see human-AI interaction (HAII) in which the perception of shared understanding (PSU) is important. Existing literature has addressed the processes and effects of PSU in human-human interactions, but the construct remains underexplored in HAII. To better understand PSU in HAII, we conducted an online survey to collect user reflections on interactions with a large language model when its understanding of a situation was thought to be similar to or different from the participant's. Through inductive thematic analysis, we identified eight dimensions comprising PSU in human-AI interactions: fluency, aligned operation, fluidity, outcome satisfaction, contextual awareness, lack of humanlike abilities, computational limits, and suspicion.

Authors:Maciej Swiechowski, Dominik Slezak
Title: The Many Challenges of Human-Like Agents in Virtual Game Environments
Abstract:
Human-like agents are an increasingly important topic in games and beyond. Believable non-player characters enhance the gaming experience by improving immersion and providing entertainment. They also offer players the opportunity to engage with AI entities that can function as opponents, teachers, or cooperating partners. Additionally, in games where bots are prohibited -- and even more so in non-game environments -- there is a need for methods capable of identifying whether digital interactions occur with bots or humans. This leads to two fundamental research questions: (1) how to model and implement human-like AI, and (2) how to measure its degree of human likeness. This article offers two contributions. The first one is a survey of the most significant challenges in implementing human-like AI in games (or any virtual environment featuring simulated agents, although this article specifically focuses on games). Thirteen such challenges, both conceptual and technical, are discussed in detail. The second is an empirical study performed in a tactical video game that addresses the research question: "Is it possible to distinguish human players from bots (AI agents) based on empirical data?" A machine-learning approach using a custom deep recurrent convolutional neural network is presented. We hypothesize that the more challenging it is to create human-like AI for a given game, the easier it becomes to develop a method for distinguishing humans from AI-driven players.

Authors:Baichuan Li, Larry Powell, Tracy Hammond
Title: It's Not Just Labeling -- A Research on LLM Generated Feedback Interpretability and Image Labeling Sketch Features
Abstract:
The quality of training data is critical to the performance of machine learning applications in domains like transportation, healthcare, and robotics. Accurate image labeling, however, often relies on time-consuming, expert-driven methods with limited feedback. This research introduces a sketch-based annotation approach supported by large language models (LLMs) to reduce technical barriers and enhance accessibility. Using a synthetic dataset, we examine how sketch recognition features relate to LLM feedback metrics, aiming to improve the reliability and interpretability of LLM-assisted labeling. We also explore how prompting strategies and sketch variations influence feedback quality. Our main contribution is a sketch-based virtual assistant that simplifies annotation for non-experts and advances LLM-driven labeling tools in terms of scalability, accessibility, and explainability.

Authors:Min Hun Lee, Martyn Zhe Yu Tok
Title: Towards Uncertainty Aware Task Delegation and Human-AI Collaborative Decision-Making
Abstract:
Despite the growing promise of artificial intelligence (AI) in supporting decision-making across domains, fostering appropriate human reliance on AI remains a critical challenge. In this paper, we investigate the utility of exploring distance-based uncertainty scores for task delegation to AI and describe how these scores can be visualized through embedding representations for human-AI decision-making. After developing an AI-based system for physical stroke rehabilitation assessment, we conducted a study with 19 health professionals and 10 students in medicine/health to understand the effect of exploring distance-based uncertainty scores on users' reliance on AI. Our findings showed that distance-based uncertainty scores outperformed traditional probability-based uncertainty scores in identifying uncertain cases. In addition, after exploring confidence scores for task delegation and reviewing embedding-based visualizations of distance-based uncertainty scores, participants achieved an 8.20% higher rate of correct decisions, a 7.15% higher rate of changing their decisions to correct ones, and a 7.14% lower rate of incorrect changes after reviewing AI outputs than those reviewing probability-based uncertainty scores ($p<0.01$). Our findings highlight the potential of distance-based uncertainty scores to enhance decision accuracy and appropriate reliance on AI while discussing ongoing challenges for human-AI collaborative decision-making.
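A distance-based uncertainty score of the kind compared above can be sketched as the distance from a case's embedding to its nearest training embeddings: cases far from everything the model has seen are flagged as uncertain and are candidates for delegation to the human. This is an illustrative sketch under that assumption, not the authors' implementation; the function name and the choice of Euclidean distance are hypothetical.

```python
import numpy as np

def distance_uncertainty(query, reference_embeddings, k=1):
    """Uncertainty proxy: mean Euclidean distance from a query embedding
    to its k nearest reference (e.g., training-set) embeddings.

    A larger value means the case lies further from previously seen
    data, so the model's prediction is treated as less certain and the
    case is a candidate for delegation to a human expert.
    """
    refs = np.asarray(reference_embeddings, dtype=float)
    dists = np.linalg.norm(refs - np.asarray(query, dtype=float), axis=1)
    return float(np.sort(dists)[:k].mean())


# The query (6, 8) is distance 5 from (3, 4) and 10 from the origin,
# so its 1-nearest-neighbor uncertainty is 5.0.
print(distance_uncertainty([6.0, 8.0], [[0.0, 0.0], [3.0, 4.0]]))
```

Unlike a softmax probability, such a score can stay high for inputs unlike any training example, which is one intuition for why it identified uncertain cases better in the study.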

Authors:Jiaying Fu, Yiyang Lu, Zehua Yang, Fiona Nah, RAY LC
Title: Cracking Aegis: An Adversarial LLM-based Game for Raising Awareness of Vulnerabilities in Privacy Protection
Abstract:
Traditional methods for raising awareness of privacy protection often fail to engage users or provide hands-on insights into how privacy vulnerabilities are exploited. To address this, we incorporate an adversarial mechanic in the design of the dialogue-based serious game Cracking Aegis. Leveraging LLMs to simulate natural interactions, the game challenges players to impersonate characters and extract sensitive information from an AI agent, Aegis. A user study (n=22) revealed that players employed diverse deceptive linguistic strategies, including storytelling and emotional rapport, to manipulate Aegis. After playing, players reported connecting in-game scenarios with real-world privacy vulnerabilities, such as phishing and impersonation, and expressed intentions to strengthen privacy control, such as avoiding oversharing personal information with AI systems. This work highlights the potential of LLMs to simulate complex relational interactions in serious games, while demonstrating how an adversarial game strategy provides unique insights for designs for social good, particularly privacy protection.

Authors:Mai Lee Chang, Kim Baraka, Greg Trafton, Zach Lalu Vazhekatt, Andrea Lockerd Thomaz
Title: Fairness and Efficiency in Human-Agent Teams: An Iterative Algorithm Design Approach
Abstract:
When agents interact with people as part of a team, fairness becomes an important factor. Prior work has proposed fairness metrics based on teammates' capabilities for task allocation within human-agent teams. However, most metrics only consider teammate capabilities from a third-person point of view (POV). In this work, we extend these metrics to include task preferences and consider a first-person POV. We leverage an iterative design method consisting of simulation data and human data to design a task allocation algorithm that balances task efficiency and fairness based on both capabilities and preferences. We first show that these metrics may not align with people's perceived fairness from a first-person POV. In light of this result, we propose a new fairness metric, fair-equity, and the Fair-Efficient Algorithm (FEA). Our findings suggest that an agent teammate who balances efficiency and fairness based on equity will be perceived to be fairer and preferred by human teammates in various human-agent team types. We suggest that the perception of fairness may also depend on a person's POV.

Authors:Brett Binst, Lien Michiels, Annelien Smets
Title: What Is Serendipity? An Interview Study to Conceptualize Experienced Serendipity in Recommender Systems
Abstract:
Serendipity has been associated with numerous benefits in the context of recommender systems, e.g., increased user satisfaction and consumption of long-tail items. Despite this, serendipity in the context of recommender systems has thus far remained conceptually ambiguous. This conceptual ambiguity has led to inconsistent operationalizations between studies, making it difficult to compare and synthesize findings. In this paper, we conceptualize the user's experience of serendipity. To this effect, we interviewed 17 participants and analyzed the data following the grounded theory paradigm. Based on these interviews, we conceptualize experienced serendipity as "a user experience in which a user unintentionally encounters content that feels fortuitous, refreshing, and enriching". We find that all three components -- fortuitous, refreshing and enriching -- are necessary and together are sufficient to classify a user's experience as serendipitous. However, these components can be satisfied through a variety of conditions. Our conceptualization unifies previous definitions of serendipity within a single framework, resolving inconsistencies by identifying distinct flavors of serendipity. It highlights underexposed flavors, offering new insights into how users experience serendipity in the context of recommender systems. By clarifying the components and conditions of experienced serendipity in recommender systems, this work can guide the design of recommender systems that stimulate experienced serendipity in their users, and lays the groundwork for developing a standardized operationalization of experienced serendipity in its many flavors, enabling more consistent and comparable evaluations.

Authors:Prasanna Parasurama, Panos Ipeirotis
Title: Algorithmic Hiring and Diversity: Reducing Human-Algorithm Similarity for Better Outcomes
Abstract:
Algorithmic tools are increasingly used in hiring to improve fairness and diversity, often by enforcing constraints such as gender-balanced candidate shortlists. However, we show theoretically and empirically that enforcing equal representation at the shortlist stage does not necessarily translate into more diverse final hires, even when there is no gender bias in the hiring stage. We identify a crucial factor influencing this outcome: the correlation between the algorithm's screening criteria and the human hiring manager's evaluation criteria -- higher correlation leads to lower diversity in final hires. Using a large-scale empirical analysis of nearly 800,000 job applications across multiple technology firms, we find that enforcing equal shortlists yields limited improvements in hire diversity when the algorithmic screening closely mirrors the hiring manager's preferences. We propose a complementary algorithmic approach designed explicitly to diversify shortlists by selecting candidates likely to be overlooked by managers, yet still competitive according to their evaluation criteria. Empirical simulations show that this approach significantly enhances gender diversity in final hires without substantially compromising hire quality. These findings highlight the importance of algorithmic design choices in achieving organizational diversity goals and provide actionable guidance for practitioners implementing fairness-oriented hiring algorithms.

Authors:Ulrike Kuhl, Annika Bush
Title: When Bias Backfires: The Modulatory Role of Counterfactual Explanations on the Adoption of Algorithmic Bias in XAI-Supported Human Decision-Making
Abstract:
Although the integration of artificial intelligence (AI) into everyday tasks improves efficiency and objectivity, it also risks transmitting bias to human decision-making. In this study, we conducted a controlled experiment that simulated hiring decisions to examine how biased AI recommendations - augmented with or without counterfactual explanations - influence human judgment over time. Participants, acting as hiring managers, completed 60 decision trials divided into a baseline phase without AI, followed by a phase with biased (X)AI recommendations (favoring either male or female candidates), and a final post-interaction phase without AI. Our results indicate that the participants followed the AI recommendations 70% of the time when the qualifications of the given candidates were comparable. Yet, only a fraction of participants detected the gender bias (8 out of 294). Crucially, exposure to biased AI altered participants' inherent preferences: in the post-interaction phase, participants' independent decisions aligned with the bias when no counterfactual explanations were provided before, but reversed the bias when explanations were given. Reported trust did not differ significantly across conditions. Confidence varied throughout the study phases after exposure to male-biased AI, indicating nuanced effects of AI bias on decision certainty. Our findings point to the importance of calibrating XAI to avoid unintended behavioral shifts in order to safeguard equitable decision-making and prevent the adoption of algorithmic bias.

Authors:Vedanshi Chetan Shah, Ab Mosca
Title: What is Visualization for Communication? Analyzing Four Years of VisComm Papers
Abstract:
With the introduction of the Visualization for Communication workshop (VisComm) at IEEE VIS, and in light of the COVID-19 pandemic, there has been renewed interest in studying visualization as a medium of communication. However, the characteristics and definition of this line of study tend to vary from paper to paper and person to person. In this work, we examine the 37 papers accepted to VisComm from 2018 through 2022. Using grounded theory, we identify nuances in how VisComm defines visualization, common themes in the work in this area, and a noticeable gap in DEI practices.

Authors:Eloise Minder, Sylvain Fleury, Solène Neyret, Jean-Rémy Chardonnet
Title: The Virtual Reality Koinos Method: Analyzing Virtual Reality Collaboration from the perspective of communication models
Abstract:
Understanding which factors influence co-presence in Virtual Reality could help develop higher-quality social interactions, or social interactions that generate sensations, emotions, and feelings similar to those generated during face-to-face interactions. Co-presence has been studied since the beginning of Virtual Reality (VR); however, no consensus has been reached on which factors influence it, beyond the consensus on the definition of "being there together" inside the Virtual Environment. In this paper, we introduce the Koinos method to explain social interactions in VR through communication models, (i) theoretically, and (ii) in two VR experiments that vary the virtual partner's social and physical representations. These analyses lead us to propose an equation to predict and help manage the sense of co-presence in VR.

Authors:Owais Mujtaba Khanday, Pablo Rodroguez San Esteban, Zubair Ahmad Lone, Marc Ouellet, Jose Andres Gonzalez Lopez
Title: Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings
Abstract:
Understanding how neural activity encodes speech and language production is a fundamental challenge in neuroscience and artificial intelligence. This study investigates whether embeddings from large-scale, self-supervised language and speech models can effectively reconstruct high-gamma neural activity characteristics, key indicators of cortical processing, recorded during speech production. We leverage pre-trained embeddings from deep learning models trained on linguistic and acoustic data to represent high-level speech features and map them onto these high-gamma signals. We analyze the extent to which these embeddings preserve the spatio-temporal dynamics of brain activity. Reconstructed neural signals are evaluated against high-gamma ground-truth activity using correlation metrics and signal reconstruction quality assessments. The results indicate that high-gamma activity can be effectively reconstructed using large language and speech model embeddings in all study participants, generating Pearson's correlation coefficients ranging from 0.79 to 0.99.
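The mapping from pre-trained embeddings to high-gamma activity, evaluated by Pearson correlation, can be sketched as a regularized linear regression. The dimensions, ridge penalty, and synthetic data below are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: T time windows, D embedding dims, E electrodes.
T, D, E = 500, 32, 8
embeddings = rng.normal(size=(T, D))               # stand-in for LM/speech embeddings
high_gamma = embeddings @ rng.normal(size=(D, E))  # synthetic "neural" target
high_gamma += 0.1 * rng.normal(size=(T, E))        # measurement noise

# Closed-form ridge regression from embeddings to high-gamma activity.
lam = 1.0
W = np.linalg.solve(embeddings.T @ embeddings + lam * np.eye(D),
                    embeddings.T @ high_gamma)
reconstructed = embeddings @ W

def pearson_per_electrode(a, b):
    """Pearson correlation between reconstruction and ground truth, per electrode."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / (
        np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0))

r = pearson_per_electrode(reconstructed, high_gamma)
```

On this synthetic linear data the per-electrode correlations are near 1; real neural data is far noisier, which is what makes the paper's 0.79 to 0.99 range notable.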

Authors:Sebastian Zepf, Mark Colley
Title: Human Authenticity and Flourishing in an AI-Driven World: Edmund's Journey and the Call for Mindfulness
Abstract:
Humans have always dreamed of possessing superpowers, and the rapid development of AI-based features promises to bring these dreams (closer) to reality. However, these advancements come with significant risks. This paper advocates for challenging existing methods and approaches in design and evaluation for more responsible AI. We stimulate reflection through a futuristic user journey illustrating the AI-driven life of Edmund in 2035. Subsequently, we discuss four AI-based superpowers: extended perception, cognitive offloading, externalized memory, and enhanced presence. We then discuss implications for HCI and AI, emphasizing the need for preserving intrinsic human superpowers, identifying meaningful use cases for AI, and evaluating AI's impact on human abilities. This paper advocates for responsible and reflective AI integration and proposes a pathway towards the idea of a Human Flourishing Benchmark.

Authors:Ryan Bowers, Richard Agbeyibor, Jack Kolb, Karen Feigh
Title: Model Cards for AI Teammates: Comparing Human-AI Team Familiarization Methods for High-Stakes Environments
Abstract:
We compare three methods of familiarizing a human with an artificial intelligence (AI) teammate ("agent") prior to operation in a collaborative, fast-paced intelligence, surveillance, and reconnaissance (ISR) environment. In a between-subjects user study (n=60), participants either read documentation about the agent, trained alongside the agent prior to the mission, or were given no familiarization. Results showed that the most valuable information about the agent included details of its decision-making algorithms and its relative strengths and weaknesses compared to the human. This information allowed the familiarization groups to form sophisticated team strategies more quickly than the control group. Documentation-based familiarization led to the fastest adoption of these strategies, but also biased participants towards risk-averse behavior that prevented high scores. Participants familiarized through direct interaction were able to infer much of the same information through observation, and were more willing to take risks and experiment with different control modes, but reported weaker understanding of the agent's internal processes. Significant differences were seen between individual participants' risk tolerance and methods of AI interaction, which should be considered when designing human-AI control interfaces. Based on our findings, we recommend a human-AI team familiarization method that combines AI documentation, structured in-situ training, and exploratory interaction.

Authors:Seongsil Heo, Calvin Murdock, Michael Proulx, Christi Miller
Title: Gaze-Enhanced Multimodal Turn-Taking Prediction in Triadic Conversations
Abstract:
Turn-taking prediction is crucial for seamless interactions. This study introduces a novel, lightweight framework for accurate turn-taking prediction in triadic conversations without relying on computationally intensive methods. Unlike prior approaches that either disregard gaze or treat it as a passive signal, our model integrates gaze with speaker localization, structuring it within a spatial constraint to transform it into a reliable predictive cue. Leveraging egocentric behavioral cues, our experiments demonstrate that incorporating gaze data from a single user significantly improves prediction performance, while gaze data from multiple users further enhances it by capturing richer conversational dynamics. This study presents a lightweight and privacy-conscious approach to support adaptive, directional sound control, enhancing speech intelligibility in noisy environments, particularly for hearing assistance in smart glasses.

Authors:Tim Pearce, David Souto, Douglas Barrett, Benjamin Lok, Mateusz Bocian, Artur Soczawa-Stronczyk, Giasemi Vavoula, Paul Long, Avinash Bhangaonkar, Stephanie Bowry, Michaela Butter, David Coke, Kate Loveman, Rosemary Sweet, Lars Tharp, Jeremy Webster, Hongji Yang, Robin Green, Andrew Hugill
Title: Sight, Sound and Smell in Immersive Experiences of Urban History: Virtual Vauxhall Gardens Case Study
Abstract:
We explore the integration of multisensory elements in virtual reality reconstructions of historical spaces through a case study of the Virtual Vauxhall Gardens project. While visual and auditory components have become standard in digital heritage experiences, the addition of olfactory stimuli remains underexplored, despite its powerful connection to memory and emotional engagement. This research investigates how multisensory experiences involving olfaction can be effectively integrated into VR reconstructions of historical spaces to enhance presence and engagement with cultural heritage. In the context of a VR reconstruction of London's eighteenth-century Vauxhall Pleasure Gardens, we developed a networked portable olfactory display capable of synchronizing specific scents with visual and auditory elements at pivotal moments in the virtual experience. Our evaluation methodology assesses both technical implementation and user experience, measuring presence and usability metrics across diverse participant groups. Our results show that integrating synchronized olfactory stimuli into the VR experience can enhance user engagement and be perceived positively, contributing to a unique and immersive encounter with historical settings. While presence questionnaires indicated a strong sense of auditory presence and control, with other sensory factors rated moderately, user experience of attractiveness was exceptionally high; qualitative feedback suggested heightened sensory awareness and engagement influenced by the inclusion and anticipation of smell. Our results suggest that evaluating multisensory VR heritage experiences requires a nuanced approach, as standard usability metrics may be ill-suited and 'realism' might be less critical than creating an evocative, historically informed, and emotionally resonant experience.

Authors:Frédéric Tran Minh, Laure Gonnord, Julien Narboux
Title: Proof Assistants for Teaching: a Survey
Abstract:
In parallel to the ever-growing usage of mechanized proofs in diverse areas of mathematics and computer science, proof assistants are used more and more for education. This paper surveys previous work related to the use of proof assistants for (mostly undergraduate) teaching. This includes works where the authors report on their experiments using proof assistants to teach logic, mathematics or computer science, as well as designs or adaptations of proof assistants for teaching. We provide an overview of both tutoring systems that have been designed for teaching proof and proving, or general-purpose proof assistants that have been adapted for education, adding user interfaces and/or dedicated input or output languages.

Authors:Roberto Pugliese, George Kourousias, Francesco Venier, Grazia Garlatti Costa
Title: Agentic Publications: An LLM-Driven Framework for Interactive Scientific Publishing, Supplementing Traditional Papers with AI-Powered Knowledge Systems
Abstract:
The exponential growth of scientific literature presents significant challenges for researchers navigating the complex knowledge landscape. We propose "Agentic Publications", a novel LLM-driven framework complementing traditional publishing by transforming papers into interactive knowledge systems. Our architecture integrates structured data with unstructured content through retrieval-augmented generation and multi-agent verification. The framework offers interfaces for both humans and machines, combining narrative explanations with machine-readable outputs while addressing ethical considerations through automated validation and transparent governance. Key features include continuous knowledge updates, automatic integration of new findings, and customizable detail levels. Our proof-of-concept demonstrates multilingual interaction, API accessibility, and structured knowledge representation through vector databases, knowledge graphs, and verification agents. This approach enhances scientific communication across disciplines, improving efficiency and collaboration while preserving traditional publishing pathways, particularly valuable for interdisciplinary fields where knowledge integration remains challenging.

Authors:Rebecca Westhäußer, Frederik Berenz, Wolfgang Minker, Sebastian Zepf
Title: CAIM: Development and Evaluation of a Cognitive AI Memory Framework for Long-Term Interaction with Intelligent Agents
Abstract:
Large language models (LLMs) have advanced the field of artificial intelligence (AI) and are a powerful enabler for interactive systems. However, they still face challenges in long-term interactions that require adaptation towards the user as well as contextual knowledge and understanding of the ever-changing environment. To overcome these challenges, holistic memory modeling is required to efficiently retrieve and store relevant information across interaction sessions for suitable responses. Cognitive AI, which aims to simulate the human thought process in a computerized model, highlights interesting aspects, such as thoughts, memory mechanisms, and decision-making, that can contribute towards improved memory modeling for LLMs. Inspired by these cognitive AI principles, we propose our memory framework CAIM. CAIM consists of three modules: (1) the Memory Controller, the central decision unit; (2) the Memory Retrieval, which filters relevant data for interaction upon request; and (3) the Post-Thinking, which maintains the memory storage. We compare CAIM against existing approaches, focusing on metrics such as retrieval accuracy, response correctness, contextual coherence, and memory storage. The results demonstrate that CAIM outperforms baseline frameworks across different metrics, highlighting its context-awareness and potential to improve long-term human-AI interactions.

Authors:Himel Ghosh, Ahmed Mosharafa, Georg Groh
Title: To Bias or Not to Bias: Detecting bias in News with bias-detector
Abstract:
Media bias detection is a critical task in ensuring fair and balanced information dissemination, yet it remains challenging due to the subjectivity of bias and the scarcity of high-quality annotated data. In this work, we perform sentence-level bias classification by fine-tuning a RoBERTa-based model on the expert-annotated BABE dataset. Using McNemar's test and the 5x2 cross-validation paired t-test, we show statistically significant improvements in performance when comparing our model to a domain-adaptively pre-trained DA-RoBERTa baseline. Furthermore, attention-based analysis shows that our model avoids common pitfalls like oversensitivity to politically charged terms and instead attends more meaningfully to contextually relevant tokens. For a comprehensive examination of media bias, we present a pipeline that combines our model with an already-existing bias-type classifier. Our method exhibits good generalization and interpretability, despite being constrained to sentence-level analysis and a limited dataset size due to the lack of larger and more advanced bias corpora. We discuss context-aware modeling, bias neutralization, and advanced bias-type classification as potential future directions. Our findings contribute to building more robust, explainable, and socially responsible NLP systems for media bias detection.
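The significance test used for the model comparison can be sketched in a few lines. The correctness flags below are invented inputs, and this is the standard continuity-corrected chi-square form of McNemar's test, not necessarily the paper's exact configuration.

```python
def mcnemar_statistic(correct_a, correct_b):
    """Continuity-corrected McNemar chi-square statistic computed from
    two models' per-sentence correctness flags (booleans). Only the
    discordant pairs (one model right, the other wrong) matter."""
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if not x and y)
    if b + c == 0:
        return 0.0  # the models never disagree
    return (abs(b - c) - 1) ** 2 / (b + c)

# Invented example: model A is correct on 8 sentences B misses, B on 2 A misses.
flags_a = [True] * 8 + [False] * 2 + [True] * 90
flags_b = [False] * 8 + [True] * 2 + [True] * 90
stat = mcnemar_statistic(flags_a, flags_b)  # (|8-2|-1)^2 / 10 = 2.5
```

The statistic is then compared against a chi-square distribution with one degree of freedom to decide significance.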

Authors:Atsuya Kusui, Susumu Hirai, Asuka Takai
Title: Development of a non-wearable support robot capable of reproducing natural standing-up movements
Abstract:
To reproduce natural standing-up motion, recent studies have emphasized the importance of coordination between the assisting robot and the human. However, many non-wearable assistive devices have struggled to replicate natural motion trajectories. While wearable devices offer better coordination with the human body, they present challenges in completely isolating mechanical and electrical hazards. To address this, we developed a novel standing-assist robot that integrates features of both wearable and non-wearable systems, aiming to achieve high coordination while maintaining safety. The device employs a four-link mechanism aligned with the human joint structure, designed to reproduce the S-shaped trajectory of the hip and the arc trajectory of the knee during natural standing-up motion. Subject-specific trajectory data were obtained using a gyroscope, and the link lengths were determined to drive the seat along the optimal path. A feedforward speed control using a stepping motor was implemented, and the reproducibility of the trajectory was evaluated based on the geometric constraints of the mechanism. A load-bearing experiment with weights fixed to the seat was conducted to assess the trajectory accuracy under different conditions. Results showed that the reproduction errors for the hip and knee trajectories remained within approximately 4 percent of the seat's total displacement, demonstrating high fidelity to the target paths. In addition, durability testing, thermal safety evaluation, and risk assessment confirmed the reliability and safety of the system for indoor use. These findings suggest that the proposed design offers a promising approach for developing assistive technologies that adapt to individual physical characteristics, with potential applications in elderly care and rehabilitation.

Authors:Manari Hirose, Masato Uchida
Title: Decoding the Mind of Large Language Models: A Quantitative Evaluation of Ideology and Biases
Abstract:
The widespread integration of Large Language Models (LLMs) across various sectors has highlighted the need for empirical research to understand their biases, thought patterns, and societal implications to ensure ethical and effective use. In this study, we propose a novel framework for evaluating LLMs, focusing on uncovering their ideological biases through a quantitative analysis of 436 binary-choice questions, many of which have no definitive answer. By applying our framework to ChatGPT and Gemini, findings revealed that while LLMs generally maintain consistent opinions on many topics, their ideologies differ across models and languages. Notably, ChatGPT exhibits a tendency to change its opinion to match the questioner's. Both models also exhibited problematic biases and made unethical or unfair claims that might have negative societal impacts. These results underscore the importance of addressing both ideological and ethical considerations when evaluating LLMs. The proposed framework offers a flexible, quantitative method for assessing LLM behavior, providing valuable insights for the development of more socially aligned AI systems.

Authors:Kwong Chiu Fung, Wai Ho Mow
Title: TrainBo: An Interactive Robot-assisted Scenario Training System for Older Adults with Dementia
Abstract:
Dementia is an overall decline in memory and cognitive skills severe enough to reduce an elder's ability to perform everyday activities. There is an increasing need for accessible technologies for cognitive training to slow down cognitive decline. With the ability to provide instant feedback and assistance, social robotic systems have been proven effective in enhancing learning abilities across various age groups. This study focuses on the design of TrainBo, an interactive robot-assisted scenario training system grounded in self-determination theory; design requirements are derived through formative and formal studies, and the system's usability is also evaluated. A pilot test was conducted with seven older adults with dementia in an elderly care center in Hong Kong over four weeks. Our findings show that older adults with dementia exhibit improvements in behavioural engagement, emotional engagement, and intrinsic motivation after using TrainBo. These findings can provide valuable insights into the development of more captivating interactive robots for extensive training purposes.

Authors:Patryk Bartkowiak, Michal Podstawski
Title: EdgeWisePersona: A Dataset for On-Device User Profiling from Natural Language Interactions
Abstract:
This paper introduces a novel dataset and evaluation benchmark designed to assess and improve small language models deployable on edge devices, with a focus on user profiling from multi-session natural language interactions in smart home environments. At the core of the dataset are structured user profiles, each defined by a set of routines - context-triggered, repeatable patterns of behavior that govern how users interact with their home systems. Using these profiles as input, a large language model (LLM) generates corresponding interaction sessions that simulate realistic, diverse, and context-aware dialogues between users and their devices. The primary task supported by this dataset is profile reconstruction: inferring user routines and preferences solely from interaction history. To assess how well current models can perform this task under realistic conditions, we benchmarked several state-of-the-art compact language models and compared their performance against large foundation models. Our results show that while small models demonstrate some capability in reconstructing profiles, they still fall significantly short of large models in accurately capturing user behavior. This performance gap poses a major challenge, particularly because on-device processing offers critical advantages, such as preserving user privacy, minimizing latency, and enabling personalized experiences without reliance on the cloud. By providing a realistic, structured testbed for developing and evaluating behavioral modeling under these constraints, our dataset represents a key step toward enabling intelligent, privacy-respecting AI systems that learn and adapt directly on user-owned devices.

Authors:Jenny Xiyu Fu, Brennan Antone, Kowe Kadoma, Malte Jung
Title: Large Language Model Use Impact Locus of Control
Abstract:
As AI tools increasingly shape how we write, they may also quietly reshape how we perceive ourselves. This paper explores the psychological impact of co-writing with AI on people's locus of control. Through an empirical study with 462 participants, we found that employment status plays a critical role in shaping users' reliance on AI and their locus of control. The results showed that employed participants displayed higher reliance on AI and a shift toward internal control, while unemployed users tended to experience a reduction in personal agency. Through quantitative results and qualitative observations, this study opens a broader conversation about AI's role in shaping personal agency and identity.

Authors:Koki Iwai, Yusuke Kumagae, Yuki Koyama, Masahiro Hamasaki, Masataka Goto
Title: Constrained Preferential Bayesian Optimization and Its Application in Banner Ad Design
Abstract:
Preferential Bayesian optimization (PBO) is a variant of Bayesian optimization that observes relative preferences (e.g., pairwise comparisons) instead of direct objective values, making it especially suitable for human-in-the-loop scenarios. However, real-world optimization tasks often involve inequality constraints, which existing PBO methods have not yet addressed. To fill this gap, we propose constrained preferential Bayesian optimization (CPBO), an extension of PBO that incorporates inequality constraints for the first time. Specifically, we present a novel acquisition function for this purpose. Our technical evaluation shows that our CPBO method successfully identifies optimal solutions by focusing on exploring feasible regions. As a practical application, we also present a designer-in-the-loop system for banner ad design using CPBO, where the objective is the designer's subjective preference, and the constraint ensures a target predicted click-through rate. We conducted a user study with professional ad designers, demonstrating the potential benefits of our approach in guiding creative design under real-world constraints.
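The shape of a constrained acquisition function can be illustrated with the classic expected-improvement-times-feasibility form; the paper proposes its own novel acquisition function for the preferential setting, so treat this only as a generic sketch of the underlying idea.

```python
import math

def constrained_ei(mu, sigma, best, feasible_prob):
    """Generic constrained acquisition sketch: Gaussian expected
    improvement of the (latent) preference objective over the incumbent,
    weighted by the probability that the inequality constraint
    (e.g., a target predicted click-through rate) is satisfied."""
    if sigma <= 0.0:
        return 0.0
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * cdf + pdf) * feasible_prob
```

A candidate design with a high predicted preference but a low chance of meeting the constraint is thereby down-weighted toward zero, steering exploration toward feasible regions.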

Authors:Go Fukino, Kanta Tachibana
Title: A Convolution-Based Gait Asymmetry Metric for Inter-Limb Synergistic Coordination
Abstract:
This study focuses on the velocity patterns of various body parts during walking and proposes a method for evaluating gait symmetry. Traditional motion analysis studies have assessed gait symmetry based on differences in electromyographic (EMG) signals or acceleration between the left and right sides. In contrast, this paper models intersegmental coordination using an LTI system and proposes a dissimilarity metric to evaluate symmetry. The method was tested on five subjects with both symmetric and asymmetric gait.
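A minimal convolution-style dissimilarity along these lines, assuming z-scored limb velocity traces and scoring the best cross-correlation lag, might look as follows; the paper's LTI-based formulation is richer than this sketch.

```python
import numpy as np

def gait_dissimilarity(left_vel, right_vel):
    """Cross-correlate the two limbs' z-scored velocity patterns over all
    lags (for symmetric gait the patterns match at a half-cycle shift)
    and return 1 minus the best normalized correlation: ~0 means the two
    sides share the same velocity pattern up to a time shift."""
    l = (left_vel - left_vel.mean()) / left_vel.std()
    r = (right_vel - right_vel.mean()) / right_vel.std()
    xcorr = np.correlate(l, r, mode="full") / len(l)
    return 1.0 - xcorr.max()

# Toy traces: identical patterns vs. mismatched oscillation frequencies.
t = np.linspace(0.0, 20.0 * np.pi, 1000)
symmetric = gait_dissimilarity(np.sin(t), np.sin(t))
asymmetric = gait_dissimilarity(np.sin(t), np.sin(1.5 * t))
```

Identical patterns score near zero while mismatched ones score clearly higher, which is the qualitative behavior a symmetry metric needs.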

Authors:Jiaheng Wang, Zhenyu Wang, Tianheng Xu, Yuan Si, Ang Li, Ting Zhou, Xi Zhao, Honglin Hu
Title: Bridging BCI and Communications: A MIMO Framework for EEG-to-ECoG Wireless Channel Modeling
Abstract:
As a method to connect the human brain and external devices, brain-computer interfaces (BCIs) are receiving extensive research attention. Recently, the integration of communication theory with BCI has emerged as a popular trend, offering potential to enhance system performance and shape next-generation communications. A key challenge in this field is modeling the brain wireless communication channel between intracranial electrocorticography (ECoG) emitting neurons and extracranial electroencephalography (EEG) receiving electrodes. However, the complex physiology of the brain challenges the application of traditional channel modeling methods, leaving relevant research in its infancy. To address this gap, we propose a frequency-division multiple-input multiple-output (MIMO) estimation framework leveraging simultaneous macaque EEG and ECoG recordings, while employing neurophysiology-informed regularization to suppress noise interference. This approach reveals profound similarities between neural signal propagation and multi-antenna communication systems. Experimental results show improved estimation accuracy over conventional methods while highlighting a trade-off between frequency resolution and temporal stability determined by signal duration. This work establishes a conceptual bridge between neural interfacing and communication theory, accelerating synergistic developments in both fields.
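The per-frequency estimation step can be sketched as regularized least squares on simultaneous recordings. Here plain Tikhonov regularization stands in for the neurophysiology-informed regularizer, and the array sizes and noise level are invented.

```python
import numpy as np

def estimate_mimo_channel(X, Y, lam=1e-2):
    """Estimate H in Y ~ H X by regularized least squares.
    X: (n_tx, T) source signals (ECoG side); Y: (n_rx, T) observations
    (EEG side); lam is a Tikhonov penalty that keeps the Gram-matrix
    inverse from amplifying noise."""
    n_tx = X.shape[0]
    gram = X @ X.conj().T + lam * np.eye(n_tx)
    return Y @ X.conj().T @ np.linalg.inv(gram)

# Synthetic check: recover a known channel from noisy observations.
rng = np.random.default_rng(1)
H_true = rng.normal(size=(4, 6))   # 4 "receive" EEG channels, 6 ECoG sources
X = rng.normal(size=(6, 5000))
Y = H_true @ X + 0.01 * rng.normal(size=(4, 5000))
H_hat = estimate_mimo_channel(X, Y)
```

With long recordings and low noise the estimate converges to the true channel; the paper's trade-off between frequency resolution and temporal stability appears when the usable signal duration T shrinks.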

Authors:Brandon Lepine, Gawesha Weerantunga, Juho Kim, Pamela Mishkin, Matthew Beane
Title: Evaluations at Work: Measuring the Capabilities of GenAI in Use
Abstract:
Current AI benchmarks miss the messy, multi-turn nature of human-AI collaboration. We present an evaluation framework that decomposes real-world tasks into interdependent subtasks, letting us track both LLM performance and users' strategies across a dialogue. Complementing this framework, we develop a suite of metrics, including a composite usage score derived from semantic similarity, word overlap, and numerical matches; structural coherence; intra-turn diversity; and a novel measure of the "information frontier" reflecting the alignment between AI outputs and users' working knowledge. We demonstrate our methodology in a financial valuation task that mirrors real-world complexity. Our empirical findings reveal that while greater integration of LLM-generated content generally enhances output quality, its benefits are moderated by factors such as response incoherence, excessive subtask diversity, and the distance of provided information from users' existing knowledge. These results suggest that proactive dialogue strategies designed to inject novelty may inadvertently undermine task performance. Our work thus advances a more holistic evaluation of human-AI collaboration, offering both a robust methodological framework and actionable insights for developing more effective AI-augmented work processes.
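One way to read the composite usage metric is as a weighted blend of three signals. The weights, the token-containment stand-in for semantic similarity (the paper uses embedding similarity), and the regexes below are all assumptions for illustration.

```python
import re

def composite_usage(ai_text, user_text, weights=(0.5, 0.3, 0.2)):
    """Hypothetical composite usage score blending a semantic-similarity
    stand-in (containment of AI tokens in the user's text), word overlap
    (Jaccard), and agreement on the numbers mentioned."""
    tokens = lambda s: set(re.findall(r"[a-z']+", s.lower()))
    numbers = lambda s: set(re.findall(r"\d+(?:\.\d+)?", s))
    a, b = tokens(ai_text), tokens(user_text)
    semantic = len(a & b) / max(len(a), 1)         # "semantic" stand-in
    overlap = len(a & b) / max(len(a | b), 1)      # Jaccard word overlap
    na, nb = numbers(ai_text), numbers(user_text)
    numeric = len(na & nb) / max(len(na | nb), 1)  # shared numeric values
    w_sem, w_ov, w_num = weights
    return w_sem * semantic + w_ov * overlap + w_num * numeric
```

Scoring a user's draft against each AI turn then yields a per-turn usage trajectory that can be tracked across the dialogue.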

Authors:Julian Wolter, Amr Gomaa
Title: Predicting Human Behavior in Autonomous Systems: A Collaborative Machine Teaching Approach for Reducing Transfer of Control Events
Abstract:
As autonomous systems become integral to various industries, effective strategies for fault handling are essential to ensure reliability and efficiency. Transfer of Control (ToC), a traditional approach for interrupting automated processes during faults, is often triggered unnecessarily in non-critical situations. To address this, we propose a data-driven method that uses human interaction data to train AI models capable of preemptively identifying and addressing issues or assisting users in resolution. Using an interactive tool simulating an industrial vacuum cleaner, we collected data and developed an LSTM-based model to predict user behavior. Our findings reveal that even data from non-experts can effectively train models to reduce unnecessary ToC events, enhancing the system's robustness. This approach highlights the potential of AI to learn directly from human problem-solving behaviors, complementing sensor data to improve industrial automation and human-AI collaboration.

Authors:Yonghyun Kim, Sangheon Park, Marcus Parker, Donghoon Seu, Alexandria Smith
Title: NeoLightning: A Modern Reimagination of Gesture-Based Sound Design
Abstract:
This paper introduces NeoLightning, a modern reinterpretation of the Buchla Lightning. NeoLightning preserves the innovative spirit of Don Buchla's "Buchla Lightning" (introduced in the 1990s) while making its gesture-based interaction accessible to contemporary users. While the original Buchla Lightning and many other historical instruments were groundbreaking in their time, they are now largely unsupported, limiting user interaction to indirect experiences. To address this, NeoLightning leverages MediaPipe for deep learning-based gesture recognition and employs Max/MSP and Processing for real-time multimedia processing. The redesigned system offers precise, low-latency gesture recognition and immersive 3D interaction. By merging the creative spirit of the original Lightning with modern advancements, NeoLightning redefines gesture-based musical interaction, expanding possibilities for expressive performance and interactive sound design.

Authors:Yun Ho, Romain Nith, Peili Jiang, Shan-Yuan Teng, Pedro Lopes
Title: Generative Muscle Stimulation: Physical Assistance by Constraining Multimodal-AI with Biomechanical Knowledge
Abstract:
Decades of interactive electrical-muscle-stimulation (EMS) revealed its promise as a wearable interface for physical assistance: EMS directly demonstrates movements through the users' body (e.g., shaking a spray-can before painting). However, interactive EMS-systems are highly-specialized because their feedback is (1) fixed (e.g., one program executes spray-can instructions, another executes piano instructions) and (2) non-contextual (e.g., using a spray-can while cooking likely involves cooking oil, not paint, and thus shaking is unnecessary). To address this, we explored a more flexible approach and engineered a system that generates muscle-stimulation-instructions given the user's context. Through our examples, we show that such a system is flexible: it enables unprecedented EMS-interactions (e.g., opening a child-proof pill bottle cap) but also replicates existing systems (e.g., shaking a spray can), all without requiring task-specific programming. To achieve this, our system takes in the user's spoken requests and images from their point of view. It uses computer vision (e.g., detect objects/handedness) and large-language-models (e.g., reason about objects/situations) to generate textual-instructions. Finally, these instructions are then constrained by biomechanical-knowledge (e.g., joint limits, kinematic-chain, EMS capabilities) to produce suitable muscle-stimulation gestures. We believe our concept marks a shift toward more general-purpose EMS-interfaces, enabling more flexible and context-aware assistance.

Authors:Manisha Mehta, Fausto Giunchiglia
Title: Understanding Gen Alpha Digital Language: Evaluation of LLM Safety Systems for Content Moderation
Abstract:
This research offers a unique evaluation of how AI systems interpret the digital language of Generation Alpha (Gen Alpha, born 2010-2024). As the first cohort raised alongside AI, Gen Alpha faces new forms of online risk due to immersive digital engagement and a growing mismatch between their evolving communication and existing safety tools. Their distinct language, shaped by gaming, memes, and AI-driven trends, often conceals harmful interactions from both human moderators and automated systems. We assess four leading AI models (GPT-4, Claude, Gemini, and Llama 3) on their ability to detect masked harassment and manipulation within Gen Alpha discourse. Using a dataset of 100 recent expressions from gaming platforms, social media, and video content, the study reveals critical comprehension failures with direct implications for online safety. This work contributes: (1) a first-of-its-kind dataset capturing Gen Alpha expressions; (2) a framework to improve AI moderation systems for youth protection; (3) a multi-perspective evaluation including AI systems, human moderators, and parents, with direct input from Gen Alpha co-researchers; and (4) an analysis of how linguistic divergence increases youth vulnerability. Findings highlight the urgent need to redesign safety systems attuned to youth communication, especially given Gen Alpha's reluctance to seek help when adults fail to understand their digital world. This study combines the insight of a Gen Alpha researcher with systematic academic analysis to address critical digital safety challenges.

Authors:Anh Tuan Ha, Hoang Khang Phan, Thai Minh Tien Ngo, Anh Phan Truong, Nhat Tan Le
Title: SOS: A Shuffle Order Strategy for Data Augmentation in Industrial Human Activity Recognition
Abstract:
In the realm of Human Activity Recognition (HAR), obtaining high-quality, high-variance data remains a persistent challenge due to high costs and the inherent variability of real-world activities. This study introduces a dataset generated with deep learning approaches (Attention Autoencoder and conditional Generative Adversarial Networks). Data heterogeneity is another critical challenge; one solution is to shuffle the data to homogenize its distribution. Experimental results demonstrate that the random sequence strategy significantly improves classification performance, achieving an accuracy of up to 0.70 $\pm$ 0.03 and a macro F1 score of 0.64 $\pm$ 0.01. Disrupting temporal dependencies through random sequence reordering compels the model to focus on instantaneous recognition, thereby improving robustness against activity transitions. This approach not only broadens the effective training dataset but also offers promising avenues for enhancing HAR systems in complex, real-world scenarios.
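The shuffle-order strategy itself is simple to sketch; the window/label representation below is an assumption about how the segmented sensor data is stored.

```python
import random

def shuffle_order(windows, labels, seed=0):
    """Shuffle-order strategy sketch: jointly reorder the segmented
    (window, label) pairs so the classifier cannot exploit the order in
    which activities occurred, encouraging instantaneous recognition."""
    rng = random.Random(seed)
    order = list(range(len(windows)))
    rng.shuffle(order)
    return [windows[i] for i in order], [labels[i] for i in order]

# Toy segmented stream: each window is a feature vector with an activity label.
windows = [[0.1], [0.2], [0.3], [0.4]]
labels = ["walk", "walk", "sit", "stand"]
shuffled_w, shuffled_l = shuffle_order(windows, labels)
```

The window-label pairing is preserved; only the sequence order changes, which is what breaks the temporal dependencies between consecutive activities.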

Authors:Muzhe Wu, Yanzhi Zhao, Shuyi Han, Michael Xieyang Liu, Hong Shen
Title: AI LEGO: Scaffolding Cross-Functional Collaboration in Industrial Responsible AI Practices during Early Design Stages
Abstract:
Responsible AI (RAI) efforts increasingly emphasize the importance of addressing potential harms early in the AI development lifecycle through social-technical lenses. However, in cross-functional industry teams, this work is often stalled by a persistent knowledge handoff challenge: the difficulty of transferring high-level, early-stage technical design rationales from technical experts to non-technical or user-facing roles for ethical evaluation and harm identification. Through literature review and a co-design study with 8 practitioners, we unpack how this challenge manifests -- technical design choices are rarely handed off in ways that support meaningful engagement by non-technical roles; collaborative workflows lack shared, visual structures to support mutual understanding; and non-technical practitioners are left without scaffolds for systematic harm evaluation. Existing tools like JIRA or Google Docs, while useful for product tracking, are ill-suited for supporting joint harm identification across roles, often requiring significant extra effort to align understanding. To address this, we developed AI LEGO, a web-based prototype that supports cross-functional AI practitioners in effectively facilitating knowledge handoff and identifying harmful design choices in the early design stages. Technical roles use interactive blocks to draft development plans, while non-technical roles engage with those blocks through stage-specific checklists and LLM-driven persona simulations to surface potential harms. In a study with 18 cross-functional practitioners, AI LEGO increased the volume and likelihood of harms identified compared to baseline worksheets. Participants found that its modular structure and persona prompts made harm identification more accessible, fostering clearer and more collaborative RAI practices in early design.

Authors:Kayo Mimizuka, Megan A Brown, Kai-Cheng Yang, Josephine Lukito
Title: Post-Post-API Age: Studying Digital Platforms in Scant Data Access Times
Abstract:
Over the past decade, data provided by digital platforms has informed substantial research in HCI to understand online human interaction and communication. Following the closure of major social media APIs that previously provided free access to large-scale data (the "post-API age"), emerging data access programs required by the European Union's Digital Services Act (DSA) have sparked optimism about increased platform transparency and renewed opportunities for comprehensive research on digital platforms, leading to the "post-post-API age." However, it remains unclear whether platforms provide adequate data access in practice. To assess how platforms make data available under the DSA, we conducted a comprehensive survey followed by in-depth interviews with 19 researchers to understand their experiences with data access in this new era. Our findings reveal significant challenges in accessing social media data, with researchers facing multiple barriers including complex API application processes, difficulties obtaining credentials, and limited API usability. These challenges have exacerbated existing institutional, regional, and financial inequities in data access. Based on these insights, we provide actionable recommendations for platforms, researchers, and policymakers to foster more equitable and effective data access, while encouraging broader dialogue within the CSCW community around interdisciplinary and multi-stakeholder solutions.

Authors:Nasif Zaman, Venkatesh Potluri, Brandon Biggs, James M. Coughlan
Title: WhatsAI: Transforming Meta Ray-Bans into an Extensible Generative AI Platform for Accessibility
Abstract:
Multi-modal generative AI models integrated into wearable devices have shown significant promise in enhancing the accessibility of visual information for blind or visually impaired (BVI) individuals, as evidenced by the rapid uptake of Meta Ray-Bans among BVI users. However, the proprietary nature of these platforms hinders disability-led innovation of visual accessibility technologies. For instance, OpenAI showcased the potential of live, multi-modal AI as an accessibility resource in 2024, yet none of the presented applications have reached BVI users, despite the technology being available since then. To promote the democratization of visual access technology development, we introduce WhatsAI, a prototype extensible framework that empowers BVI enthusiasts to leverage Meta Ray-Bans to create personalized wearable visual accessibility technologies. Our system is the first to offer a fully hackable template that integrates with WhatsApp, facilitating robust Accessible Artificial Intelligence Implementations (AAII) that enable blind users to conduct essential visual assistance tasks, such as real-time scene description, object detection, and Optical Character Recognition (OCR), utilizing standard machine learning techniques and cutting-edge visual language models. The extensible nature of our framework aspires to cultivate a community-driven approach, led by BVI hackers and innovators to tackle the complex challenges associated with visual accessibility.

Authors:Gino Carmona-Díaz, William Jiménez-Leal, María Alejandra Grisales, Chandra Sripada, Santiago Amaya, Michael Inzlicht, Juan Pablo Bermúdez
Title: An AI-Powered Research Assistant in the Lab: A Practical Guide for Text Analysis Through Iterative Collaboration with LLMs
Abstract:
Analyzing texts such as open-ended responses, headlines, or social media posts is a time- and labor-intensive process highly susceptible to bias. LLMs are promising tools for text analysis, using either a predefined (top-down) or a data-driven (bottom-up) taxonomy, without sacrificing quality. Here we present a step-by-step tutorial to efficiently develop, test, and apply taxonomies for analyzing unstructured data through an iterative and collaborative process between researchers and LLMs. Using personal goals provided by participants as an example, we demonstrate how to write prompts to review datasets and generate a taxonomy of life domains, evaluate and refine the taxonomy through prompt and direct modifications, test the taxonomy and assess intercoder agreements, and apply the taxonomy to categorize an entire dataset with high intercoder reliability. We discuss the possibilities and limitations of using LLMs for text analysis.

Authors:Hussaini Zubairu, Abdelrahaman Abdou, Ashraf Matrawy
Title: Evaluation Metrics for Misinformation Warning Interventions: Challenges and Prospects
Abstract:
Misinformation has become a widespread issue in the 21st century, impacting numerous areas of society and underscoring the need for effective intervention strategies. Among these strategies, user-centered interventions, such as warning systems, have shown promise in reducing the spread of misinformation. Many studies have used various metrics to evaluate the effectiveness of these warning interventions. However, no systematic review has thoroughly examined these metrics in all studies. This paper provides a comprehensive review of existing metrics for assessing the effectiveness of misinformation warnings, categorizing them into four main groups: behavioral impact, trust and credulity, usability, and cognitive and psychological effects. Through this review, we identify critical challenges in measuring the effectiveness of misinformation warnings, including inconsistent use of cognitive and attitudinal metrics, the lack of standardized metrics for affective and emotional impact, variations in user trust, and the need for more inclusive warning designs. We present an overview of these metrics and propose areas for future research.

Authors:Seitaro Kaneko, Hiroki Ishizuka, Hidenori Yoshimura, Hiroyuki Kajimoto
Title: Utilization of Skin Color Change for Image-based Tactile Sensing
Abstract:
Measurement of the pressure distribution applied to a fingertip is crucial for the teleoperation of robots and for human-computer interfaces. Previous studies have acquired pressure distribution by affixing a sensor array to the fingertip or by optically recording the deformation of an object. However, these existing methods prevent the fingertip from directly contacting the texture, and the pressure applied to the fingertip is measured indirectly. In this study, we propose a method to measure pressure distribution when directly touching a transparent object, focusing on the change in skin color induced by the applied pressure, which is caused by blood flow. We evaluated the relationship between pressure and skin color change when local pressure is applied and found a correlation between the pressure and the color change. However, the contact area and the color-change area did not align perfectly. We further explored the factor causing the spatial non-uniformity of the color change by accounting for the stress distribution using finite element analysis. These results suggest that the proposed measurement method can be used to measure the internal stress distribution, and it is anticipated to serve as a simple sensor in the field of human-computer interfaces.

Authors:Hyunyoung Han, Jongwon Jang, Kitaeg Shim, Sang Ho Yoon
Title: AfforDance: Personalized AR Dance Learning System with Visual Affordance
Abstract:
We propose AfforDance, an augmented reality (AR)-based dance learning system that generates personalized learning content and enhances learning through visual affordances. Our system converts user-selected dance videos into interactive learning experiences by integrating 3D reference avatars, audio synchronization, and adaptive visual cues that guide movement execution. This work contributes to personalized dance education by offering an adaptable, user-centered learning interface.

Authors:Alexander P. Ryjov, Alina A. Egorova
Title: A Note on Semantic Diffusion
Abstract:
This paper provides an in-depth examination of the concept of semantic diffusion as a complementary instrument to large language models (LLMs) for design applications. Conventional LLMs and diffusion models fail to induce a convergent, iterative refinement process: each invocation of the diffusion mechanism spawns a new stochastic cycle, so successive outputs do not relate to prior ones and convergence toward a desired design is not guaranteed. The proposed hybrid framework - "LLM + semantic diffusion" - resolves this limitation by enforcing an approximately convergent search procedure, thereby formally addressing the problem of localized design refinement.

Authors:Yu Lun Hsu, Yun-Rung Chou, Chiao-Ju Chang, Yu-Cheng Chang, Zer-Wei Lee, Rokas Gipiškis, Rachel Li, Chih-Yuan Shih, Jen-Kuei Peng, Hsien-Liang Huang, Jaw-Shiun Tsai, Mike Y. Chen
Title: PreCare: Designing AI Assistants for Advance Care Planning (ACP) to Enhance Personal Value Exploration, Patient Knowledge, and Decisional Confidence
Abstract:
Advance Care Planning (ACP) allows individuals to specify their preferred end-of-life life-sustaining treatments before they become incapacitated by injury or terminal illness (e.g., coma, cancer, dementia). While online ACP offers high accessibility, it lacks key benefits of clinical consultations, including personalized value exploration and immediate clarification of decision consequences. To bridge this gap, we conducted two formative studies: 1) shadowed and interviewed 3 ACP teams consisting of physicians, nurses, and social workers (18 patients total), and 2) interviewed 14 users of ACP websites. Building on these insights, we designed PreCare in collaboration with 6 ACP professionals. PreCare is a website with 3 AI-driven assistants designed to guide users through exploring personal values, gaining ACP knowledge, and supporting informed decision-making. A usability study (n=12) showed that PreCare achieved a System Usability Scale (SUS) rating of excellent. A comparative evaluation (n=12) showed that PreCare's AI assistants significantly improved exploration of personal values, knowledge, and decisional confidence, and were preferred by 92% of participants.

Authors:Parth Arora, Ethan Kimmel, Katherine Huang, Tyler Kwok, Yukun Song, Sofia Vempala, Georgianna Lin, Ozan Cakmakci, Thad Starner
Title: Positioning Monocular Optical See Through Head Worn Displays in Glasses for Everyday Wear
Abstract:
Head-worn displays for everyday wear in the form of regular eyeglasses are technically feasible with recent advances in waveguide technology. One major design decision is determining where in the user's visual field to position the display. Centering the display in the principal point of gaze (PPOG) allows the user to switch attentional focus between the virtual and real images quickly, and best performance often occurs when the display is centered in PPOG or is centered vertically below PPOG. However, these positions are often undesirable in that they are considered interruptive or are associated with negative social perceptions by users. Offsetting the virtual image may be preferred when tasks involve driving, walking, or social interaction. This paper consolidates findings from recent studies on monocular optical see-through HWDs (OST-HWDs), focusing on potential for interruption, comfort, performance, and social perception. For text-based tasks, which serve as a proxy for many monocular OST-HWD tasks, we recommend a 15° horizontal field of view (FOV) with the virtual image in the right lens vertically centered but offset to +8.7° to +23.7° toward the ear. Glanceable content can be offset up to +30° for short interactions.

Authors:Vladimír Lazárik, Marco Agus, Barbora Kozlíková, Pere-Pau Vázquez
Title: VizCV: AI-assisted visualization of researchers' publications tracks
Abstract:
Analyzing how the publication records of scientists and research groups have evolved over the years is crucial for assessing their expertise and can support the management of academic environments by assisting with career planning and evaluation. We introduce VizCV, a novel web-based end-to-end visual analytics framework that enables the interactive exploration of researchers' scientific trajectories. It incorporates AI-assisted analysis and supports automated reporting of career evolution. Our system models career progression through three key dimensions: a) research topic evolution to detect and visualize shifts in scholarly focus over time, b) publication record and the corresponding impact, and c) collaboration dynamics depicting the growth and transformation of a researcher's co-authorship network. AI-driven insights provide automated explanations of career transitions, detecting significant shifts in research direction, impact surges, or collaboration expansions. The system also supports comparative analysis between researchers, allowing users to compare topic trajectories and impact growth. Our interactive, multi-tab and multi-view system allows for the exploratory analysis of career milestones under different perspectives, such as the most impactful articles, emerging research themes, or a detailed analysis of a researcher's contribution to a subfield. The key contributions include AI/ML techniques for: a) topic analysis, b) dimensionality reduction for visualizing patterns and trends, and c) the interactive creation of textual descriptions of facets of the data through configurable prompt generation and large language models, including key indicators, to help users understand the career development of individuals or groups.

Authors:Marco Maida, Alberto Crescini, Marco Perronet, Elena Camuffo
Title: Claycode: Stylable and Deformable 2D Scannable Codes
Abstract:
This paper introduces Claycode, a novel 2D scannable code designed for extensive stylization and deformation. Unlike traditional matrix-based codes (e.g., QR codes), Claycodes encode their message in a tree structure. During the encoding process, bits are mapped into a topology tree, which is then depicted as a nesting of color regions drawn within the boundaries of a target polygon shape. When decoding, Claycodes are extracted and interpreted in real-time from a camera stream. We detail the end-to-end pipeline and show that Claycodes allow for extensive stylization without compromising their functionality. We then empirically demonstrate Claycode's high tolerance to heavy deformations, outperforming traditional 2D scannable codes in scenarios where they typically fail.
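The core idea of encoding a message as nested regions rather than a matrix can be illustrated with a toy bit-to-tree mapping. The scheme below is hypothetical, not Claycode's actual specification: it reads the bit string as a balanced-parentheses walk, where 1 opens a new child region and 0 closes the current one.

```python
def bits_to_tree(bits):
    """Toy encoder (not Claycode's real spec): 1 = open a new nested
    region (child node), 0 = close the current region. A node is
    represented simply as the list of its children."""
    root = []
    stack = [root]
    for b in bits:
        if b == 1:
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif len(stack) > 1:  # ignore a stray close at the root
            stack.pop()
    return root

def tree_to_bits(node):
    """Toy decoder: walk the tree and re-emit the open/close bits."""
    out = []
    for child in node:
        out.append(1)
        out.extend(tree_to_bits(child))
        out.append(0)
    return out
```

Because the message lives in the nesting topology rather than in a fixed grid, the drawn regions can be stretched, recolored, or deformed without changing the decoded tree, which is the intuition behind Claycode's deformation tolerance.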

Authors:Alpay Sabuncuoglu, Christopher Burr, Carsten Maple
Title: Justified Evidence Collection for Argument-based AI Fairness Assurance
Abstract:
It is well recognised that ensuring fair AI systems is a complex sociotechnical challenge, which requires careful deliberation and continuous oversight across all stages of a system's lifecycle, from defining requirements to model deployment and deprovisioning. Dynamic argument-based assurance cases, which present structured arguments supported by evidence, have emerged as a systematic approach to evaluating and mitigating safety risks and hazards in AI-enabled system development and have also been extended to deal with broader normative goals such as fairness and explainability. This paper introduces a systems-engineering-driven framework, supported by software tooling, to operationalise a dynamic approach to argument-based assurance in two stages. In the first stage, during the requirements planning phase, a multi-disciplinary and multi-stakeholder team define goals and claims to be established (and evidenced) by conducting a comprehensive fairness governance process. In the second stage, a continuous monitoring interface gathers evidence from existing artefacts (e.g. metrics from automated tests), such as model, data, and use case documentation, to support these arguments dynamically. The framework's effectiveness is demonstrated through an illustrative case study in finance, with a focus on supporting fairness-related arguments.

Authors:S. E Emedem, I. E Onyenwe, E. G Onyedinma
Title: Development of a WAZOBIA-Named Entity Recognition System
Abstract:
Named Entity Recognition (NER) is crucial for various natural language processing applications, including information extraction, machine translation, and sentiment analysis. Despite the ever-increasing interest in African languages within computational linguistics, existing NER systems focus mainly on English, European, and a few other global languages, leaving a significant gap for under-resourced languages. This research presents the development of a WAZOBIA-NER system tailored for the three most prominent Nigerian languages: Hausa, Yoruba, and Igbo. The research begins with a comprehensive compilation of annotated datasets for each language, addressing data scarcity and linguistic diversity challenges. Exploring the state-of-the-art machine learning technique of Conditional Random Fields (CRF) and deep learning models such as Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT) fine-tuned with a Recurrent Neural Network (RNN), the study evaluates the effectiveness of these approaches in recognizing three entity types: persons, organizations, and locations. The system utilizes optical character recognition (OCR) technology to convert textual images into machine-readable text, enabling the Wazobia system to accept both input text and textual images for extraction. The system achieved a precision of 0.9511, a recall of 0.9400, an F1-score of 0.9564, and an accuracy of 0.9301. Evaluation was conducted across all three languages, with precision, recall, F1-score, and accuracy as the key assessment metrics. The Wazobia-NER system demonstrates that it is feasible to build robust NER tools for under-resourced African languages using current NLP frameworks and transfer learning.

Authors:Kyriaki Syrigou, Marina Stoforou, Panagiotis Kourtesis
Title: Time Perception in Virtual Reality: Effects of Emotional Valence and Stress Level
Abstract:
Background & Objective: Emotional states and stress distort time perception, yet findings are inconsistent, particularly in immersive media. Integrating the Attentional Gate Model (AGM) and Internal Clock Model (ICM), we examined how emotional valence and stress alter perceived duration in Virtual Reality (VR). This study assesses the effects of valence (calming, neutral, stressful) and stress (low/high) on prospective time estimation, mood, and arousal. Methods: Fifty-four adults (18-39 years) explored three custom VR environments: (1) a tranquil Japanese garden, (2) an affectively neutral room, and (3) a threatening underground sewer. Active navigation promoted presence; a distraction task separated conditions. Valence and arousal were assessed with the Visual Analog Mood Scales (VAMS), stress with the Perceived Stress Scale-10 (PSS-10), and perceived duration with a verbal estimation task. Mixed-model ANOVAs evaluated main and interaction effects. Results: Valence reliably shaped perceived duration: calming VR led to overestimation, stressful VR to underestimation, and neutral VR to intermediate timing. Baseline stress level, as measured by the PSS-10, neither altered timing nor interacted with valence. Nevertheless, the VR environments affected the VAMS mood metrics: calming environments elevated mood and reduced perceived stress, whereas stressful environments lowered mood and heightened stress. Conclusions: Findings support the AGM (attentionally demanding negative environments shorten perceived time) and the ICM (valence-linked arousal speeds or slows the pacemaker). Contrary to classical predictions, baseline stress did not distort duration in VR, suggesting that valence-driven attentional allocation outweighs pre-exposure stress levels. VR offers a controllable platform for dissecting time-perception mechanisms and advancing interventions that target emotion-related temporal distortions.

Authors:Zihan Gao, Justin Cranshaw, Jacob Thebault-Spieker
Title: A Turing Test for ''Localness'': Conceptualizing, Defining, and Recognizing Localness in People and Machines
Abstract:
As digital platforms increasingly mediate interactions tied to place, ensuring genuine local participation is essential for maintaining trust and credibility in location-based services, community-driven platforms, and civic engagement systems. However, localness is a social and relational identity shaped by knowledge, participation, and community recognition. Drawing on the German philosopher Heidegger's concept of dwelling -- which extends beyond physical presence to encompass meaningful connection to place -- we investigate how people conceptualize and evaluate localness in both human and artificial agents. Using a chat-based interaction paradigm inspired by Turing's Imitation Game and Von Ahn's Games With A Purpose, we engaged 230 participants in conversations designed to examine the cues people rely on to assess local presence. Our findings reveal a multi-dimensional framework of localness, highlighting differences in how locals and nonlocals emphasize various aspects of local identity. We show that people are significantly more accurate in recognizing locals than nonlocals, suggesting that localness is an affirmative status requiring active demonstration rather than merely the absence of nonlocal traits. Additionally, we identify conditions under which artificial agents are perceived as local and analyze participants' sensemaking strategies in evaluating localness. Through predictive modeling, we determine key factors that drive accurate localness judgments. By bridging theoretical perspectives on human-place relationships with practical challenges in digital environments, our work informs the design of location-based services that foster meaningful local engagement. Our findings contribute to a broader understanding of localness as a dynamic and relational construct, reinforcing the importance of dwelling as a process of belonging, recognition, and engagement with place.

Authors:Brandon S. Byers, Eleftherios Triantafyllidis, Thibaut Menny, Martin Schulte, Catherine De Wolf
Title: Assessing the User Experience of Extended Reality Devices for (Dis)Assembly: A Classroom Study
Abstract:
Despite the current rise and promising capabilities of Extended Reality (XR) technologies, the architecture, engineering, and construction industry lacks informed guidance when choosing between these technologies, especially for complex processes like assembly and disassembly tasks. This research compares the user experience across different XR devices for (dis)assembly utilizing the NASA Task Load Index and System Usability Scale metrics. Through a workshop and surveys with graduate civil engineering and architecture students, the study found that Augmented Reality scored highest in usability, followed closely by Mixed Reality. However, Mixed Reality showed the best task load index score, indicating low cognitive demand. The findings presented in this research may aid academics and practitioners in making informed decisions when selecting XR systems in practical, real-world assembly scenarios. Moreover, this study suggests opportunities and guidelines for more detailed XR system comparisons and exploration of XR's further role in circular construction practices.

Authors:Senhao Yang, Qiwen Cheng, Ruiqi Ma, Liangzhe Zhao, Zhenying Wu, Guangqiang Yu
Title: The Wisdom of Agent Crowds: A Human-AI Interaction Innovation Ignition Framework
Abstract:
With the widespread application of large AI models in various fields, the automation level of multi-agent systems has continuously improved. However, in high-risk decision-making scenarios such as healthcare and finance, human participation and the alignment of intelligent systems with human intentions remain crucial. This paper focuses on the financial scenario and constructs a multi-agent brainstorming framework based on the BDI theory. A human-computer collaborative multi-agent financial analysis process is built using Streamlit. The system plans tasks according to user intentions, reduces users' cognitive load through real-time structured text summaries and the interactive Cothinker module, and integrates general-purpose and reasoning large models to enhance the handling of complex problems. The system is evaluated comprehensively by designing an LLM-based quantitative analysis algorithm for the sentiment tendency of interview content and a method, based on k-means clustering and information entropy, for evaluating the diversity of ideas generated by LLMs during brainstorming. Human factors testing shows that the system performs well in usability and user experience. Although there is still room for improvement, it can effectively support users in completing complex financial tasks. The research shows that the system significantly improves the efficiency of human-computer interaction and the quality of decision-making in financial scenarios, providing a new direction for the development of related fields.
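The k-means-plus-entropy diversity measure mentioned above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the embedding input, the choice of `k`, and the normalization by `log k` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def idea_diversity(embeddings, k=3, seed=0):
    """Sketch of a diversity score for brainstormed ideas: cluster the
    idea embeddings with k-means, then take the normalized Shannon
    entropy of the cluster-size distribution. Returns a value in [0, 1];
    1 means ideas are spread evenly across all k clusters, values near 0
    mean most ideas collapse into a single cluster."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embeddings)
    counts = np.bincount(labels, minlength=k).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]                       # drop empty clusters (log 0)
    entropy = -(p * np.log(p)).sum()   # Shannon entropy in nats
    return entropy / np.log(k)         # normalize to [0, 1]
```

In practice the embeddings would come from a sentence encoder applied to each generated idea; any encoder producing fixed-length vectors would fit this interface.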

Authors:Dima Alattal, Asal Khoshravan Azar, Puja Myles, Richard Branson, Hatim Abdulhussein, Allan Tucker
Title: Integrating Explainable AI in Medical Devices: Technical, Clinical and Regulatory Insights and Recommendations
Abstract:
There is a growing demand for the use of Artificial Intelligence (AI) and Machine Learning (ML) in healthcare, particularly as clinical decision support systems to assist medical professionals. However, the complexity of many of these models, often referred to as black box models, raises concerns about their safe integration into clinical settings as it is difficult to understand how they arrived at their predictions. This paper discusses insights and recommendations derived from an expert working group convened by the UK Medicine and Healthcare products Regulatory Agency (MHRA). The group consisted of healthcare professionals, regulators, and data scientists, with a primary focus on evaluating the outputs from different AI algorithms in clinical decision-making contexts. Additionally, the group evaluated findings from a pilot study investigating clinicians' behaviour and interaction with AI methods during clinical diagnosis. Incorporating AI methods is crucial for ensuring the safety and trustworthiness of medical AI devices in clinical settings. Adequate training for stakeholders is essential to address potential issues, and further insights and recommendations for safely adopting AI systems in healthcare settings are provided.

Authors:Nisan Chhetri, Arpan Sainju
Title: PromptIQ: Who Cares About Prompts? Let System Handle It -- A Component-Aware Framework for T2I Generation
Abstract:
Generating high-quality images without prompt engineering expertise remains a challenge for text-to-image (T2I) models, which often misinterpret poorly structured prompts, leading to distortions and misalignments. While humans easily recognize these flaws, metrics like CLIP fail to capture structural inconsistencies, exposing a key limitation in current evaluation methods. To address this, we introduce PromptIQ, an automated framework that refines prompts and assesses image quality using our novel Component-Aware Similarity (CAS) metric, which detects and penalizes structural errors. Unlike conventional methods, PromptIQ iteratively generates and evaluates images until the user is satisfied, eliminating trial-and-error prompt tuning. Our results show that PromptIQ significantly improves generation quality and evaluation accuracy, making T2I models more accessible for users with little to no prompt engineering expertise.

Authors:Alexiy Buynitsky, Sina Ehsani, Bhanu Pallakonda, Pragyana Mishra
Title: Camera Control at the Edge with Language Models for Scene Understanding
Abstract:
In this paper, we present Optimized Prompt-based Unified System (OPUS), a framework that utilizes a Large Language Model (LLM) to control Pan-Tilt-Zoom (PTZ) cameras, providing contextual understanding of natural environments. To achieve this goal, the OPUS system improves cost-effectiveness by generating keywords from a high-level camera control API and transferring knowledge from larger closed-source language models to smaller ones through Supervised Fine-Tuning (SFT) on synthetic data. This enables efficient edge deployment while maintaining performance comparable to larger models like GPT-4. OPUS enhances environmental awareness by converting data from multiple cameras into textual descriptions for language models, eliminating the need for specialized sensory tokens. In benchmark testing, our approach significantly outperformed both traditional language model techniques and more complex prompting methods, achieving a 35% improvement over advanced techniques and a 20% higher task accuracy compared to closed-source models like Gemini Pro. The system demonstrates OPUS's capability to simplify PTZ camera operations through an intuitive natural language interface. This approach eliminates the need for explicit programming and provides a conversational method for interacting with camera systems, representing a significant advancement in how users can control and utilize PTZ camera technology.

Authors:Johndayll Lewis Arizala, Joshua Permito, Steven Errol Escopete, John Kovie Niño, Jordan Aiko Deja
Title: Designing RoutScape: Geospatial Prototyping with XR for Flood Evacuation Planning
Abstract:
Flood response planning in local communities is often hindered by fragmented communication across Disaster Risk Reduction and Management (DRRM) councils. In this work, we explore how extended reality (XR) can support more effective planning through narrative-driven design. We present Routscape, an XR prototype for visualizing flood scenarios and evacuation routes, developed through iterative prototyping and user-centered design with DRRM officers. By grounding the system in real-world experiences and localized narratives, we highlight how XR can aid in fostering shared understanding and spatial sensemaking in disaster preparedness efforts.

Authors:Alba María Mármol-Romero, Manuel García-Vega, Miguel Ángel García-Cumbreras, Arturo Montejo-Ráez
Title: An empathic GPT-based chatbot to talk about mental disorders with Spanish teenagers
Abstract:
This paper presents a chatbot-based system to engage young Spanish people in the awareness of certain mental disorders through a self-disclosure technique. The study was carried out in a population of teenagers aged between 12 and 18 years. The dialogue engine mixes closed and open conversations, so certain controlled messages are sent to focus the chat on a specific disorder, which will change over time. Once a set of trial questions is answered, the system can initiate the conversation on the disorder under the focus according to the user's sensibility to that disorder, in an attempt to establish a more empathetic communication. Then, an open conversation based on the GPT-3 language model is initiated, allowing the user to express themselves with more freedom. The results show that these systems are of interest to young people and could help them become aware of certain mental disorders.

Authors:Nikita Boguslavskii, Lorena Maria Genua, Zhi Li
Title: Human-Robot Collaboration for the Remote Control of Mobile Humanoid Robots with Torso-Arm Coordination
Abstract:
Humanoid robots are increasingly deployed in various facilities, including hospitals and assisted living environments, where they are often remotely controlled by human operators. Their kinematic redundancy enhances reachability and manipulability, enabling them to navigate complex, cluttered environments and perform a wide range of tasks. However, this redundancy also presents significant control challenges, particularly in coordinating the movements of the robot's macro-micro structure (torso and arms). Therefore, we propose various human-robot collaborative (HRC) methods for coordinating the torso and arm of remotely controlled mobile humanoid robots, aiming to balance autonomy and human input to enhance system efficiency and task execution. The proposed methods include human-initiated approaches, where users manually control torso movements, and robot-initiated approaches, which autonomously coordinate torso and arm based on factors such as reachability, task goal, or inferred human intent. We conducted a user study with N=17 participants to compare the proposed approaches in terms of task performance, manipulability, and energy efficiency, and analyzed which methods were preferred by participants.

Authors:Ahdiyeh Alipour, Tilo Hartmann, Maryam Alimardani
Title: Would You Rely on an Eerie Agent? A Systematic Review of the Impact of the Uncanny Valley Effect on Trust in Human-Agent Interaction
Abstract:
Trust is a fundamental component of human-agent interaction. With the increasing presence of artificial agents in daily life, it is essential to understand how people perceive and trust these agents. One of the key challenges affecting this perception is the Uncanny Valley Effect (UVE), where increasingly human-like artificial beings can be perceived as eerie or repelling. Despite growing interest in trust and the UVE, existing research varies widely in terms of how these concepts are defined and operationalized. This inconsistency raises important questions about how and under what conditions the UVE influences trust in agents. A systematic understanding of their relationship is currently lacking. This review aims to examine the impact of the UVE on human trust in agents and to identify methodological patterns, limitations, and gaps in the existing empirical literature. Following PRISMA guidelines, a systematic search identified 53 empirical studies that investigated both UVE-related constructs and trust or trust-related outcomes. Studies were analyzed based on a structured set of categories, including types of agents and interactions, methodological and measurement approaches, and key findings. The results of our systematic review reveal that most studies rely on static images or hypothetical scenarios with limited real-time interaction, and the majority use subjective trust measures. This review offers a novel framework for classifying trust measurement approaches with regard to the best-practice criteria for empirically investigating the UVE. As the first systematic attempt to map the intersection of UVE and trust, this review contributes to a deeper understanding of their interplay and offers a foundation for future research. Keywords: the uncanny valley effect, trust, human-likeness, affinity response, human-agent interaction

Authors:Shaja Arul Selvamani, Nia D'Souza Ganapathy
Title: A Multi-Agent AI Framework for Immersive Audiobook Production through Spatial Audio and Neural Narration
Abstract:
This research introduces an innovative AI-driven multi-agent framework specifically designed for creating immersive audiobooks. Leveraging neural text-to-speech synthesis with FastSpeech 2 and VALL-E for expressive narration and character-specific voices, the framework employs advanced language models to automatically interpret textual narratives and generate realistic spatial audio effects. These sound effects are dynamically synchronized with the storyline through sophisticated temporal integration methods, including Dynamic Time Warping (DTW) and recurrent neural networks (RNNs). Diffusion-based generative models combined with higher-order ambisonics (HOA) and scattering delay networks (SDN) enable highly realistic 3D soundscapes, substantially enhancing listener immersion and narrative realism. This technology significantly advances audiobook applications, providing richer experiences for educational content, storytelling platforms, and accessibility solutions for visually impaired audiences. Future work will address personalization, ethical management of synthesized voices, and integration with multi-sensory platforms.

Authors:Myriam Metzulat, Barbara Metz, Aaron Edelmann, Alexandra Neukum, Wilfried Kunde
Title: Sick of being driven? -- Prevalence and modulating factors of carsickness in the European population in context of automated driving
Abstract:
As automated driving turns the driver into a passenger, carsickness might reduce comfort for susceptible individuals. Insights into the prevalence of carsickness and its modulating factors are useful for developing automated vehicles that mitigate or prevent its occurrence. An online survey was conducted with N = 3999 participants in Spain, Sweden, Poland, and Germany. 30% of participants reported having already experienced carsickness as adults. The frequency of carsickness was modulated not only by demographic factors (country, gender, age), but also by the frequency of being a passenger, the type of non-driving related task, road type, and seating position in the car. Furthermore, the efficiency of applied countermeasures, temporal aspects of carsickness development, and the relation of carsickness to the acceptability of automated driving and its effect on subjective fitness to drive were investigated. The results are discussed with a focus on automated driving.

Authors:Christopher Flöter, Sergej Geringer, Guido Reina, Daniel Weiskopf, Timo Ropinski
Title: Evaluating Foveated Frame Rate Reduction in Virtual Reality for Head-Mounted Displays
Abstract:
Foveated rendering methods usually reduce spatial resolution in the periphery of the users' view. However, using foveated rendering to reduce temporal resolution, i.e., rendering frame rate, seems less explored. In this work, we present the results of a user study investigating the perceptual effects of foveated temporal resolution reduction, where only the temporal resolution (frame rate) is reduced in the periphery without affecting spatial quality (pixel density). In particular, we investigated the perception of temporal resolution artifacts caused by reducing the frame rate dependent on the eccentricity of the user's gaze. Our user study with 15 participants was conducted in a virtual reality setting using a head-mounted display. Our results indicate that it was possible to reduce average rendering costs, i.e., the number of rendered pixels, to a large degree before participants consistently reported perceiving temporal artifacts.

Authors:Lucas Anastasiou, Anna De Liddo
Title: BCause: Human-AI collaboration to improve hybrid mapping and ideation in argumentation-grounded deliberation
Abstract:
Public deliberation, as in open discussion of issues of public concern, often suffers from scattered and shallow discourse, poor sensemaking, and a disconnect from actionable policy outcomes. This paper introduces BCause, a discussion system leveraging generative AI and human-machine collaboration to transform unstructured dialogue around public issues (such as urban living, policy changes, and current socio-economic transformations) into structured, actionable democratic processes. We present three innovations: (i) importing and transforming unstructured transcripts into argumentative discussions, (ii) geo-deliberated problem-sensing via a Telegram bot for local issue reporting, and (iii) smart reporting with customizable widgets (e.g., summaries, topic modelling, policy recommendations, clustered arguments). The system's human-AI partnership preserves critical human participation to ensure ethical oversight, contextual relevance, and creative synthesis.

Authors:Samuel Pantze, Jean-Yves Tinevez, Matthew McGinity, Ulrik Günther
Title: manvr3d: A Platform for Human-in-the-loop Cell Tracking in Virtual Reality
Abstract:
We propose manvr3d, a novel VR-ready platform for interactive human-in-the-loop cell tracking. We utilize VR controllers and eye-tracking hardware to facilitate rapid ground truth generation and proofreading for deep learning-based cell tracking models. Life scientists reconstruct the developmental history of organisms on the cellular level by analyzing 3D time-lapse microscopy images acquired at high spatio-temporal resolution. The reconstruction of such cell lineage trees traditionally involves tracking individual cells through all recorded time points, manually annotating their positions, and then linking them over time to create complete trajectories. Deep learning-based algorithms accelerate this process, yet depend heavily on manually-annotated high-quality ground truth data and curation. Visual representation of the image data in this process still relies primarily on 2D renderings, which greatly limits spatial understanding and navigation. In this work, we bridge the gap between deep learning-based cell tracking software and 3D/VR visualization to create a human-in-the-loop cell tracking system. We lift the incremental annotation, training and proofreading loop of the deep learning model into the 3rd dimension and apply natural user interfaces like hand gestures and eye tracking to accelerate the cell tracking workflow for life scientists.

Authors:Juan Ahmad, Jonas Hellgren, Alan Said
Title: Tell Me the Good Stuff: User Preferences in Movie Recommendation Explanations
Abstract:
Recommender systems play a vital role in helping users discover content in streaming services, but their effectiveness depends on users understanding why items are recommended. In this study, explanations were based solely on item features rather than personalized data, simulating recommendation scenarios. We compared user perceptions of one-sided (purely positive) and two-sided (positive and negative) feature-based explanations for popular movie recommendations. Through an online study with 129 participants, we examined how explanation style affected perceived trust, transparency, effectiveness, and satisfaction. One-sided explanations consistently received higher ratings across all dimensions. Our findings suggest that in low-stakes entertainment domains such as popular movie recommendations, simpler positive explanations may be more effective. However, the results should be interpreted with caution due to potential confounding factors such as item familiarity and the placement of negative information in explanations. This work provides practical insights for explanation design in recommender interfaces and highlights the importance of context in shaping user preferences.

Authors:George Xi Wang, Jingying Deng, Safinah Ali
Title: Evaluating the Impact of AI-Powered Audiovisual Personalization on Learner Emotion, Focus, and Learning Outcomes
Abstract:
Independent learners often struggle with sustaining focus and emotional regulation in unstructured or distracting settings. Although some rely on ambient aids such as music, ASMR, or visual backgrounds to support concentration, these tools are rarely integrated into cohesive, learner-centered systems. Moreover, existing educational technologies focus primarily on content adaptation and feedback, overlooking the emotional and sensory context in which learning takes place. Large language models have demonstrated powerful multimodal capabilities including the ability to generate and adapt text, audio, and visual content. Educational research has yet to fully explore their potential in creating personalized audiovisual learning environments. To address this gap, we introduce an AI-powered system that uses LLMs to generate personalized multisensory study environments. Users select or generate customized visual themes (e.g., abstract vs. realistic, static vs. animated) and auditory elements (e.g., white noise, ambient ASMR, familiar vs. novel sounds) to create immersive settings aimed at reducing distraction and enhancing emotional stability. Our primary research question investigates how combinations of personalized audiovisual elements affect learner cognitive load and engagement. Using a mixed-methods design that incorporates biometric measures and performance outcomes, this study evaluates the effectiveness of LLM-driven sensory personalization. The findings aim to advance emotionally responsive educational technologies and extend the application of multimodal LLMs into the sensory dimension of self-directed learning.

Authors:Sijia Liu, XiaoKe Zeng, Fengyihan Wu, Shu Ye, Bowen Liu, Sidney Cheung, Richard William Allen, Ray Lc
Title: "Salt is the Soul of Hakka Baked Chicken": Reimagining Traditional Chinese Culinary ICH for Modern Contexts Without Losing Tradition
Abstract:
Intangible Cultural Heritage (ICH) like traditional culinary practices face increasing pressure to adapt to globalization while maintaining their cultural authenticity. Centuries-old traditions in Chinese cuisine are subject to rapid changes for adaptation to contemporary tastes and dietary preferences. The preservation of these cultural practices requires approaches that can enable ICH practitioners to reimagine and recreate ICH for modern contexts. To address this, we created workshops where experienced practitioners of traditional Chinese cuisine co-created recipes using GenAI tools and realized the dishes. We found that GenAI inspired ICH practitioners to innovate recipes based on traditional workflows for broader audiences and adapt to modern dining contexts. However, GenAI-inspired co-creation posed challenges in maintaining the accuracy of original ICH workflows and preserving traditional flavors in the culinary outcomes. This study offers implications for designing human-AI collaborative processes for safeguarding and enhancing culinary ICH.

Authors:Micaela Siraj, Jon Duke, Thomas Plötz
Title: The GenAI Generation: Student Views of Awareness, Preparedness, and Concern
Abstract:
Generative Artificial Intelligence (GenAI) is revolutionizing education and workforce development, profoundly shaping how students learn, engage, and prepare for their future. Outpacing the development of uniform policies and structures, GenAI has heralded a unique era and given rise to the GenAI Generation. We define the GenAI Generation as a cohort of students whose education has been increasingly shaped by the opportunities and challenges GenAI presents during its widespread adoption within society. This study examines students' perceptions of GenAI through a concise survey with optional open-ended questions, focusing on their awareness, preparedness, and concerns. Notably, readiness appears increasingly tied to exposure to GenAI through one's coursework. Students with greater curricular exposure to GenAI tend to feel more prepared, while those without it more often express vulnerability and uncertainty, highlighting a new and growing divide in readiness that goes beyond traditional disciplinary boundaries. Evaluation of more than 250 responses, with over 40% providing detailed qualitative feedback, reveals a core dual sentiment: while most students express enthusiasm for GenAI, an even greater proportion voice a spectrum of concerns about ethics, job displacement, and the adequacy of educational structures given the highly transformative technology. These findings offer critical insights into how students view the potential and pitfalls of GenAI for future career impacts. The challenge ahead involves implementing associated recommendations for educational institutions, moving beyond the baseline of access toward more informed guidance on the use of these tools, while preserving critical thinking, ethical reasoning, and adaptive learning.

Authors:Karishma Hegde, Hemadri Jayalath
Title: Emotions in the Loop: A Survey of Affective Computing for Emotional Support
Abstract:
In a world where technology is increasingly embedded in our everyday experiences, systems that sense and respond to human emotions are elevating digital interaction. At the intersection of artificial intelligence and human-computer interaction, affective computing is emerging with innovative solutions where machines are humanized by enabling them to process and respond to user emotions. This survey paper explores recent research contributions in affective computing applications in the area of emotion recognition, sentiment analysis and personality assignment developed using approaches like large language models (LLMs), multimodal techniques, and personalized AI systems. We analyze the key contributions and innovative methodologies applied by the selected research papers by categorizing them into four domains: AI chatbot applications, multimodal input systems, mental health and therapy applications, and affective computing for safety applications. We then highlight the technological strengths as well as the research gaps and challenges related to these studies. Furthermore, the paper examines the datasets used in each study, highlighting how modality, scale, and diversity impact the development and performance of affective models. Finally, the survey outlines ethical considerations and proposes future directions to develop applications that are more safe, empathetic and practical.

Authors:Yaniv Dover, Shaul Oreg
Title: Tell me who its founders are and I'll tell you what your online community looks like: Online community founders' personality and community attributes
Abstract:
Online communities are an increasingly important stakeholder for firms, and despite the growing body of research on them, much remains to be learned about them and about the factors that determine their attributes and sustainability. Whereas most of the literature focuses on predictors such as community activity, network structure, and platform interface, there is little research about behavioral and psychological aspects of community members and leaders. In the present study we focus on the personality traits of community founders as predictors of community attributes and sustainability. We develop a tool to estimate community members' Big Five personality traits from their social media text and use it to estimate the traits of 35,164 founders in 8,625 Reddit communities. We find support for most of our predictions about the relationships between founder traits and community sustainability and attributes, including the level of engagement within the community, aspects of its social network structure, and whether the founders themselves remain active in it.

Authors:C. Vogler, A. Glasser, R. Kushalnagar, M. Seita, M. Arroyo Chavez, K. Delk, P. DeVries, M. Feanny, B. Thompson, J. Waller
Title: Barriers to Employment: The Deaf Multimedia Authoring Tax
Abstract:
This paper describes the challenges that deaf and hard of hearing people face with creating accessible multimedia content, such as portfolios, instructional videos and video presentations. Unlike content consumption, the process of content creation itself remains highly inaccessible, creating barriers to employment in all stages of recruiting, hiring, and carrying out assigned job duties. Overcoming these barriers incurs a "deaf content creation tax" that translates into requiring significant additional time and resources to produce content equivalent to what a non-disabled person would produce. We highlight this process and associated challenges through real-world examples experienced by the authors, and provide guidance and recommendations for addressing them.

Authors:Nayara de Oliveira Faria, Joseph L. Gabbard
Title: Inattentional Blindness with Augmented Reality HUDS: An On-road Study
Abstract:
As the integration of augmented reality (AR) technology in head-up displays (HUDs) becomes more prevalent in vehicles, it is crucial to understand how to design and evaluate AR interfaces to ensure safety. With new AR displays capable of rendering images with larger field of views and at varying depths, the visual and cognitive separation between graphical and real-world visual stimuli will be increasingly more difficult to quantify as will drivers' ability to efficiently allocate visual attention between the two sets of stimuli. In this study, we present a user study that serves as a crucial first step in gaining insight into inattentional blindness while using AR in surface transportation, where understanding is currently limited. Our primary goal is to investigate how the visual demand of AR tasks influences drivers' ability to detect stimuli, and whether the nature of the stimuli itself plays a role in this effect. To address these questions, we designed an on-road user study aimed at producing a more realistic and ecologically valid understanding of the phenomenon. Our results show that drivers' ability to timely detect stimuli in the environment decreased as the AR task visual demand increased demonstrated by both detection performance and inattentional blindness metrics. Further, inattentional blindness caused by AR displays appears to be more prevalent within drivers' central field of view. We conclude by discussing implications towards a safety-centric evaluation framework for AR HUDs.

Authors:Phanish Puranam, Prothit Sen, Maciej Workiewicz
Title: Can LLMs Help Improve Analogical Reasoning For Strategic Decisions? Experimental Evidence from Humans and GPT-4
Abstract:
This study investigates whether large language models, specifically GPT-4, can match human capabilities in analogical reasoning within strategic decision-making contexts. Using a novel experimental design involving source-to-target matching, we find that GPT-4 achieves high recall by retrieving all plausible analogies but suffers from low precision, frequently applying incorrect analogies based on superficial similarities. In contrast, human participants exhibit high precision but low recall, selecting fewer analogies yet with stronger causal alignment. These findings advance theory by identifying matching, the evaluative phase of analogical reasoning, as a distinct step that requires accurate causal mapping beyond simple retrieval. While current LLMs are proficient in generating candidate analogies, humans maintain a comparative advantage in recognizing deep structural similarities across domains. Error analysis reveals that AI errors arise from surface-level matching, whereas human errors stem from misinterpretations of causal structure. Taken together, the results suggest a productive division of labor in AI-assisted organizational decision-making, where LLMs may serve as broad analogy generators while humans act as critical evaluators, applying the most contextually appropriate analogies to strategic problems.
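The precision/recall asymmetry the abstract describes can be illustrated with a small sketch. The candidate lists below are hypothetical stand-ins, not the study's data; only the pattern (LLM: high recall, low precision; human: high precision, low recall) follows the abstract:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall over sets of candidate analogies."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # correctly matched analogies
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical pattern: the LLM retrieves every plausible analogy,
# the human selects few analogies but with stronger causal alignment.
llm_picks   = ["A", "B", "C", "D", "E", "F"]  # retrieves everything
human_picks = ["A", "B"]                      # selective
relevant    = ["A", "B", "C"]                 # causally aligned analogies

p_llm, r_llm = precision_recall(llm_picks, relevant)    # 0.5, 1.0
p_hum, r_hum = precision_recall(human_picks, relevant)  # 1.0, ~0.67
```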

Authors:Seonghee Lee, Denae Ford, John Tang, Sasa Junuzovic, Asta Roseway, Ed Cutrell, Kori Inkpen
Title: IRL Dittos: Embodied Multimodal AI Agent Interactions in Open Spaces
Abstract:
We introduce the In Real Life (IRL) Ditto, an AI-driven embodied agent designed to represent remote colleagues in shared office spaces, creating opportunities for real-time exchanges even in their absence. IRL Ditto offers a unique hybrid experience by allowing in-person colleagues to encounter a digital version of their remote teammates, initiating greetings, updates, or small talk as they might in person. Our research question examines: How can the IRL Ditto influence interactions and relationships among colleagues in a shared office space? Through a four-day study, we assessed IRL Ditto's ability to strengthen social ties by simulating presence and enabling meaningful interactions across different levels of social familiarity. We find that enhancing social relationships depended deeply on the foundation of the relationship participants had with the source of the IRL Ditto. This study provides insights into the role of embodied agents in enriching workplace dynamics for distributed teams.

Authors:Prothit Sen, Sai Mihir Jakkaraju
Title: Modeling AI-Human Collaboration as a Multi-Agent Adaptation
Abstract:
We develop an agent-based simulation to formalize AI-human collaboration as a function of task structure, advancing a generalizable framework for strategic decision-making in organizations. Distinguishing between heuristic-based human adaptation and rule-based AI search, we model interactions across modular (parallel) and sequenced (interdependent) tasks using an NK model. Our results reveal that in modular tasks, AI often substitutes for humans - delivering higher payoffs unless human expertise is very high, and the AI search space is either narrowly focused or extremely broad. In sequenced tasks, interesting complementarities emerge. When an expert human initiates the search and AI subsequently refines it, aggregate performance is maximized. Conversely, when AI leads, excessive heuristic refinement by the human can reduce payoffs. We also show that even "hallucinatory" AI - lacking memory or structure - can improve outcomes when augmenting low-capability humans by helping escape local optima. These results yield a robust implication: the effectiveness of AI-human collaboration depends less on context or industry, and more on the underlying task structure. By elevating task decomposition as the central unit of analysis, our model provides a transferable lens for strategic decision-making involving humans and an agentic AI across diverse organizational settings.
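To make the NK setup concrete, here is a minimal sketch of an NK fitness landscape with a one-bit-flip local search, the kind of rule-based search an AI agent might perform on a modular task. This is an illustrative toy under standard NK conventions (cyclic neighborhoods, uniform random contributions), not the authors' actual simulation:

```python
import random

def nk_landscape(N, K, seed=0):
    """Random NK landscape: locus i's contribution depends on itself
    and its K cyclic neighbours; contributions are lazily sampled."""
    rng = random.Random(seed)
    tables = [{} for _ in range(N)]  # per-locus contribution tables

    def fitness(genome):
        total = 0.0
        for i in range(N):
            key = tuple(genome[(i + j) % N] for j in range(K + 1))
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / N  # mean contribution, in [0, 1]

    return fitness

def hill_climb(fitness, genome, steps=200, seed=1):
    """Accept any one-bit flip that does not lower fitness."""
    rng = random.Random(seed)
    best = fitness(genome)
    for _ in range(steps):
        i = rng.randrange(len(genome))
        cand = genome[:i] + [1 - genome[i]] + genome[i + 1:]
        f = fitness(cand)
        if f >= best:
            genome, best = cand, f
    return genome, best
```

Higher K makes contributions more interdependent, producing a more rugged landscape in which such local search gets trapped in local optima, which is where the abstract's heuristic human refinement (or even a "hallucinatory" perturbation) can help.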

Authors:Joshua Hatherley, Lauritz Munch, Jens Christian Bjerring
Title: In defence of post-hoc explanations in medical AI
Abstract:
Since the early days of the Explainable AI movement, post-hoc explanations have been praised for their potential to improve user understanding, promote trust, and reduce patient safety risks in black box medical AI systems. Recently, however, critics have argued that the benefits of post-hoc explanations are greatly exaggerated since they merely approximate, rather than replicate, the actual reasoning processes that black box systems take to arrive at their outputs. In this article, we aim to defend the value of post-hoc explanations against this recent critique. We argue that even if post-hoc explanations do not replicate the exact reasoning processes of black box systems, they can still improve users' functional understanding of black box systems, increase the accuracy of clinician-AI teams, and assist clinicians in justifying their AI-informed decisions. While post-hoc explanations are not a "silver bullet" solution to the black box problem in medical AI, we conclude that they remain a useful strategy for addressing the black box problem in medical AI.

Authors:Yuchen Wang, Pengfei Jia, Zhitao Shu, Keyan Liu, Abdul Rashid Mohamed Shariff
Title: Multidimensional precipitation index prediction based on CNN-LSTM hybrid framework
Abstract:
With the intensification of global climate change, accurate prediction of weather indicators is of great significance for disaster prevention and mitigation, agricultural production, and transportation. Precipitation, as one of the key meteorological indicators, plays a crucial role in water resource management, agricultural production, and urban flood control. This study proposes a multidimensional precipitation index prediction model based on a CNN-LSTM hybrid framework, aiming to improve the accuracy of precipitation forecasts. The dataset, sourced from Pune, Maharashtra, India, covers roughly 31 years (1972-2002) of monthly mean precipitation, reflecting the region's long-term fluctuations and seasonal variations. By analyzing these time series data, the CNN-LSTM model effectively captures both local features and long-term dependencies. Experimental results show that the model achieves a root mean square error (RMSE) of 6.752, a significant advantage over traditional time series prediction methods in prediction accuracy and generalization ability. This study thereby offers new directions for precipitation prediction research. However, the model requires substantial computational resources when dealing with large-scale datasets, and its predictive ability for multidimensional precipitation data still needs improvement. Future research could extend the model to support and predict multidimensional precipitation data, promoting the development of more accurate and efficient meteorological prediction technologies.
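The RMSE metric the abstract reports is the standard root mean square error over predicted and observed values. A minimal sketch, using toy monthly values rather than the paper's data:

```python
import math

def rmse(predicted, observed):
    """Root mean square error between predicted and observed series."""
    assert len(predicted) == len(observed)
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
    )

# Toy monthly precipitation values (mm), illustrative only
pred = [12.0, 30.5, 80.0, 110.0]
obs  = [10.0, 33.0, 75.0, 118.0]
print(round(rmse(pred, obs), 3))  # ≈ 4.981
```

The paper's reported RMSE of 6.752 would be this quantity computed over the model's held-out monthly predictions.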

Authors:Claire Li, David Peter Wallis Freeborn
Title: Exploring AI-powered Digital Innovations from A Transnational Governance Perspective: Implications for Market Acceptance and Digital Accountability
Abstract:
This study explores the application of the Technology Acceptance Model (TAM) to AI-powered digital innovations within a transnational governance framework. By integrating Latourian actor-network theory (ANT), this study examines how institutional motivations, regulatory compliance, and ethical and cultural acceptance drive organisations to develop and adopt AI innovations, enhancing their market acceptance and transnational accountability. We extend the TAM framework by incorporating regulatory, ethical, and socio-technical considerations as key social pressures shaping AI adoption. Recognizing that AI is embedded within complex actor-networks, we argue that accountability is co-constructed among organisations, regulators, and societal actors rather than being confined to individual developers or adopters. To address these challenges, we propose two key solutions: (1) internal resource reconfiguration, where organisations restructure their governance and compliance mechanisms to align with global standards; and (2) reshaping organisational boundaries through actor-network management, fostering engagement with external stakeholders, regulatory bodies, and transnational governance institutions. These approaches allow organisations to enhance AI accountability, foster ethical and regulatory alignment, and improve market acceptance on a global scale.

Authors:Linshi Li, Hanlin Cai
Title: Applying LLM-Powered Virtual Humans to Child Interviews in Child-Centered Design
Abstract:
In child-centered design, directly engaging children is crucial for deeply understanding their experiences. However, current research often prioritizes adult perspectives, as interviewing children involves unique challenges such as environmental sensitivities and the need for trust-building. AI-powered virtual humans (VHs) offer a promising approach to facilitate engaging and multimodal interactions with children. This study establishes key design guidelines for LLM-powered virtual humans tailored to child interviews, standardizing multimodal elements including color schemes, voice characteristics, facial features, expressions, head movements, and gestures. Using ChatGPT-based prompt engineering, we developed three distinct Human-AI workflows (LLM-Auto, LLM-Interview, and LLM-Analyze) and conducted a user study involving 15 children aged 6 to 12. The results indicated that the LLM-Analyze workflow outperformed the others by eliciting longer responses, achieving higher user experience ratings, and promoting more effective child engagement.

Authors:Katherine Lin, Juna Kawai-Yue, Adira Sklar, Lucy Hecht, Sarah Sterman, Tiffany Tseng
Title: Crafting a Personal Journaling Practice: Negotiating Ecosystems of Materials, Personal Context, and Community in Analog Journaling
Abstract:
Analog journaling has grown in popularity, with journaling on paper encompassing a range of motivations, styles, and practices including planning, habit-tracking, and reflecting. Journalers develop strong personal preferences around the tools they use, the ideas they capture, and the layout in which they represent their ideas and memories. Understanding how analog journaling practices are individually shaped and crafted over time is critical to supporting the varied benefits associated with journaling, including improved mental health and positive support for identity development. To understand this development, we qualitatively analyzed publicly-shared journaling content from YouTube and Instagram and interviewed 11 journalers. We report on our identification of the journaling ecosystem in which journaling practices are shaped by materials, personal context, and communities, sharing how this ecosystem plays a role in the practices and identities of journalers as they customize their journaling routine to best suit their personal goals. Using these insights, we discuss design opportunities for how future tools can better align with and reflect the rich affordances and practices of journaling on paper.

Authors:Arata Jingu, Easa AliAbbasi, Paul Strohmeier, Jürgen Steimle
Title: Scene2Hap: Combining LLMs and Physical Modeling for Automatically Generating Vibrotactile Signals for Full VR Scenes
Abstract:
Haptic feedback contributes to immersive virtual reality (VR) experiences. Designing such feedback at scale, for all objects within a VR scene and their respective arrangements, remains a time-consuming task. We present Scene2Hap, an LLM-centered system that automatically designs object-level vibrotactile feedback for entire VR scenes based on the objects' semantic attributes and physical context. Scene2Hap employs a multimodal large language model to estimate the semantics and physical context of each object, including its material properties and vibration behavior, from the multimodal information present in the VR scene. This semantic and physical context is then used to create plausible vibrotactile signals by generating or retrieving audio signals and converting them to vibrotactile signals. For the more realistic spatial rendering of haptics in VR, Scene2Hap estimates the propagation and attenuation of vibration signals from their source across objects in the scene, considering the estimated material properties and physical context, such as the distance and contact between virtual objects. Results from two user studies confirm that Scene2Hap successfully estimates the semantics and physical context of VR scenes, and the physical modeling of vibration propagation improves usability, perceived materiality, and spatial awareness.

Authors:Kristen Sussman, Daniel Carter
Title: Detecting Effects of AI-Mediated Communication on Language Complexity and Sentiment
Abstract:
Given the subtle human-like effects of large language models on linguistic patterns, this study examines shifts in language over time to detect the impact of AI-mediated communication (AI-MC) on social media. We compare a replicated dataset of 970,919 tweets from 2020 (pre-ChatGPT) with 20,000 tweets from the same period in 2024, all of which mention Donald Trump during election periods. Using a combination of Flesch-Kincaid readability and polarity scores, we analyze changes in text complexity and sentiment. Our findings reveal a significant increase in mean sentiment polarity (0.12 vs. 0.04) and a shift from predominantly neutral content (54.8% in 2020 to 39.8% in 2024) to more positive expressions (28.6% to 45.9%). These findings suggest not only an increasing presence of AI in social media communication but also its impact on language and emotional expression patterns.
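The readability measure this abstract relies on has a standard closed form. As an illustrative sketch (not the authors' pipeline; the vowel-group syllable heuristic is a simplifying assumption), the Flesch-Kincaid grade level can be computed as:

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: each run of consecutive vowels counts as one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade-level formula:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Short, monosyllabic sentences score near or below grade 0, while long polysyllabic sentences score far higher, which is what makes the measure useful for detecting complexity shifts between the two tweet corpora.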

Authors:Mahya Khazaei, Ali Bahrani, George Tzanetakis
Title: A Real-Time Gesture-Based Control Framework
Abstract:
We introduce a real-time, human-in-the-loop gesture control framework that can dynamically adapt audio and music based on human movement by analyzing live video input. By creating a responsive connection between visual and auditory stimuli, this system enables dancers and performers to not only respond to music but also influence it through their movements. Designed for live performances, interactive installations, and personal use, it offers an immersive experience where users can shape the music in real time. The framework integrates computer vision and machine learning techniques to track and interpret motion, allowing users to manipulate audio elements such as tempo, pitch, effects, and playback sequence. With ongoing training, it achieves user-independent functionality, requiring as few as 50 to 80 samples to label simple gestures. This framework combines gesture training, cue mapping, and audio manipulation to create a dynamic, interactive experience. Gestures are interpreted as input signals, mapped to sound control commands, and used to naturally adjust music elements, showcasing the seamless interplay between human interaction and machine response.

Authors:Gaojian Huang, Yantong Jin, Wei-Hsiang Lo
Title: Beyond Levels of Driving Automation: A Triadic Framework of Human-AI Collaboration in On-Road Mobility
Abstract:
The goal of the current study is to introduce a triadic human-AI collaboration framework for the automated vehicle domain. Previous classifications (e.g., SAE Levels of Automation) focus on defining automation levels based on who controls the vehicle. However, it remains unclear how human users and AI should collaborate in real-time, especially in dynamic driving contexts, where roles can shift frequently. To fill the gap, this study proposes a triadic human-AI collaboration framework with three AI roles (i.e., Advisor, Co-Pilot, and Guardian) that dynamically adapt to human needs. Overall, the study lays a foundation for developing adaptive, role-based human-AI collaboration strategies in automated vehicles.

Authors:Chengzhi Zhang, Brian Magerko
Title: Generative AI Literacy: A Comprehensive Framework for Literacy and Responsible Use
Abstract:
After the release of several AI literacy guidelines, the rapid rise and widespread adoption of generative AI, such as ChatGPT, DALL-E, and DeepSeek, have transformed our lives. Unlike traditional AI algorithms (e.g., convolutional neural networks, semantic networks, classifiers) captured in existing AI literacy frameworks, generative AI exhibits distinct and more nuanced characteristics. However, a lack of robust generative AI literacy is hindering individuals' ability to critically evaluate and use these models effectively and responsibly. To address this gap, we propose a set of guidelines with 12 items for generative AI literacy, organized into four key aspects: (1) Guidelines for Generative AI Tool Selection and Prompting, (2) Guidelines for Understanding Interaction with Generative AI, (3) Guidelines for Understanding Interaction with Generative AI, and (4) Guidelines for High Level Understanding of Generative AI. These guidelines aim to support schools, companies, educators, and organizations in developing frameworks that empower their members, such as students, employees, and stakeholders, to use generative AI in an efficient, ethical, and informed way.

Authors:Saramsh Gautam, Mahmood Jasim
Title: LINC: Supporting Language Independent Communication and Comprehension to Enhance Contribution in Multilingual Collaborative Meetings
Abstract:
Collaborative research often includes contributors with varied perspectives from diverse linguistic backgrounds. However, English as a Second Language (ESL) researchers often struggle to communicate during meetings in English and comprehend discussions, leading to limited contribution. To investigate these challenges, we surveyed 64 ESL researchers who frequently collaborate in multilingual teams and identified four key design goals around participation, comprehension, documentation, and feedback. Guided by these design goals, we developed LINC, a multimodal Language INdependent Collaboration system with two components: a real-time module for multilingual communication during meetings and a post-meeting dashboard for discussion analysis. We evaluated the system through a two-phased study with six triads of multilingual teams. We found that using LINC, participants benefited from communicating in their preferred language, recalled and reviewed actionable insights, and prepared for upcoming meetings effectively. We discuss external factors that impact multilingual meeting participation beyond language preferences and the implications of multimodal systems in facilitating meetings in hybrid multilingual collaborative settings beyond research.

Authors:Jiaying Fu, Jialin Gu, Tianyue Gong, Tiange Zhou
Title: Can Code Outlove Blood? An LLM-based VR Experience to Prompt Reflection on Parental Verbal Abuse
Abstract:
Parental verbal abuse leaves lasting emotional impacts, yet current therapeutic approaches often lack immersive self-reflection opportunities. To address this, we developed a VR experience powered by LLMs to foster reflection on parental verbal abuse. Participants with relevant experiences engage in a dual-phase VR experience: first assuming the role of a verbally abusive parent, interacting with an LLM portraying a child, then observing the LLM reframing abusive dialogue into warm, supportive expressions as a nurturing parent. A qualitative study with 12 participants showed that the experience encourages reflection on their past experiences and fosters supportive emotions. However, these effects vary with participants' personal histories, emphasizing the need for greater personalization in AI-driven emotional support. This study explores the use of LLMs in immersive environments to promote emotional reflection, offering insights into the design of AI-driven emotional support systems.

Authors:Steven Häsler, Philipp Ackermann
Title: Spatial Reasoner: A 3D Inference Pipeline for XR Applications
Abstract:
Modern extended reality (XR) systems provide rich analysis of image data and fusion of sensor input, and demand AR/VR applications that can reason about 3D scenes in a semantic manner. We present a spatial reasoning framework that bridges geometric facts with symbolic predicates and relations to handle key tasks such as determining how 3D objects are arranged among each other ('on', 'behind', 'near', etc.). Its foundation relies on oriented 3D bounding box representations, enhanced by a comprehensive set of spatial predicates, ranging from topology and connectivity to directionality and orientation, expressed in a formalism related to natural language. The derived predicates form a spatial knowledge graph and, in combination with a pipeline-based inference model, enable spatial queries and dynamic rule evaluation. Implementations for client- and server-side processing demonstrate the framework's capability to efficiently translate geometric data into actionable knowledge, ensuring scalable and technology-independent spatial reasoning in complex 3D environments. The Spatial Reasoner framework fosters the creation of spatial ontologies, and seamlessly integrates with, and thereby enriches, machine learning, natural language processing, and rule systems in XR applications.

Authors:Dennis Wüppelman, Enes Yigitbas
Title: SecCityVR: Visualization and Collaborative Exploration of Software Vulnerabilities in Virtual Reality
Abstract:
Security vulnerabilities in software systems represent significant risks as potential entry points for malicious attacks. Traditional dashboards that display the results of static analysis security testing often use 2D or 3D visualizations, which tend to lack the spatial details required to effectively reveal issues such as the propagation of vulnerabilities across the codebase or the appearance of concurrent vulnerabilities. Additionally, most reporting solutions only treat the analysis results as an artifact that can be reviewed or edited asynchronously by developers, limiting real-time, collaborative exploration. To the best of our knowledge, no VR-based approach exists for the visualization and interactive exploration of software security vulnerabilities. Addressing these challenges, the virtual reality (VR) environment SecCityVR was developed as a proof-of-concept implementation that employs the code city metaphor within VR to visualize software security vulnerabilities as colored building floors inside the surrounding virtual city. By integrating the application's call graph, vulnerabilities are contextualized within related software components. SecCityVR supports multi-user collaboration and interactive exploration. It provides explanations and mitigations for detected issues. A user study comparing SecCityVR with the traditional dashboard find-sec-bugs showed the VR approach provided a favorable experience, with higher usability, lower temporal demand, and significantly lower frustration despite having longer task completion times. This paper and its results contribute to the fields of collaborative and secure software engineering, as well as software visualization. It provides a new application of VR code cities to visualize security vulnerabilities, as well as a novel environment for security audits using collaborative and immersive technologies.

Authors:Omid Veisi, Sasan Bahrami, Roman Englert, Claudia Müller
Title: AI Ethics and Social Norms: Exploring ChatGPT's Capabilities From What to How
Abstract:
Using LLMs in healthcare, Computer-Supported Cooperative Work, and Social Computing requires the examination of ethical and social norms to ensure safe incorporation into human life. We conducted a mixed-method study, including an online survey with 111 participants and an interview study with 38 experts, to investigate the AI ethics and social norms of ChatGPT as an everyday tool. This study aims to evaluate whether ChatGPT in an empirical context operates following ethics and social norms, which is critical for understanding actions in industrial and academic research and achieving machine ethics. The findings of this study provide initial insights into six important aspects of AI ethics, including bias, trustworthiness, security, toxicology, social norms, and ethical data. Significant obstacles related to transparency and bias in unsupervised data collection methods are identified as ChatGPT's ethical concerns.

Authors:Chang Xiao, Brenda Yang
Title: Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving
Abstract:
Generative conversational interfaces powered by large language models (LLMs) typically stream output token-by-token at a rate determined by computational budget, often neglecting actual human reading speeds and the cognitive load associated with the content. This mismatch frequently leads to inefficient use of computational resources. For example, in cloud-based services, streaming content faster than users can read appears unnecessary, resulting in wasted computational resources and potential delays for other users, particularly during peak usage periods. To address this issue, we propose an adaptive streaming method that dynamically adjusts the pacing of LLM streaming output in real-time based on inferred cognitive load. Our approach estimates the cognitive load associated with streaming content and strategically slows down the stream during complex or information-rich segments, thereby freeing computational resources for other users. We conducted a statistical analysis and simulation based on a statistical model derived from data collected in a crowdsourced user study across various types of LLM-generated content. Our results show that this adaptive method can effectively reduce computational consumption while largely maintaining streaming speed above users' normal reading speed.
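The core idea of load-aware pacing can be sketched in a few lines. This is a minimal illustration, not the paper's model: the function names and the word-length load heuristic are assumptions standing in for the authors' crowdsourced statistical model.

```python
def estimate_load(chunk: str) -> float:
    """Hypothetical cognitive-load proxy in [0, 1]:
    longer average word length suggests denser content."""
    words = chunk.split()
    if not words:
        return 0.0
    avg_len = sum(len(w) for w in words) / len(words)
    return min(1.0, avg_len / 10.0)

def tokens_per_second(chunk: str, base_rate: float = 20.0,
                      min_rate: float = 4.0) -> float:
    """Throttle the streaming rate as inferred load rises,
    never dropping below a floor that keeps the UI responsive."""
    load = estimate_load(chunk)
    return max(min_rate, base_rate * (1.0 - load))
```

A server could use the returned rate to delay token emission for information-rich segments, reclaiming the saved compute for other concurrent requests.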

Authors:Pratyay Suvarnapathaki, Viral Shah, Saarthak Negi, Nimmi Rangaswamy
Title: 'The Boring and the Tedious': Invisible Labour in India's Gig-Economy
Abstract:
India's gig-based food delivery platforms, such as Swiggy and Zomato, provide crucial income to marginalised communities but also entrench workers in cycles of invisible labour. Through 14 semi-structured interviews, we analyse waiting time and repetitive UI interactions as key burdens that contribute to 'digital discomfort' for gig-based food delivery agents. We find that workers employ creative strategies to navigate algorithmic management, yet remain constrained by platform-side 'gamification' and system opacity. We propose worker-centered GUI automation as a potential intervention to reduce friction while preserving agency. In conclusion, this position paper argues for rethinking HCI approaches in the Global South to prioritise worker autonomy over efficiency-driven design optimisations.

Authors:Michelle L. Ding, Harini Suresh
Title: The Malicious Technical Ecosystem: Exposing Limitations in Technical Governance of AI-Generated Non-Consensual Intimate Images of Adults
Abstract:
In this paper, we adopt a survivor-centered approach to locate and dissect the role of sociotechnical AI governance in preventing AI-Generated Non-Consensual Intimate Images (AIG-NCII) of adults, colloquially known as "deep fake pornography." We identify a "malicious technical ecosystem" or "MTE," comprising open-source face-swapping models and nearly 200 "nudifying" software programs that allow non-technical users to create AIG-NCII within minutes. Then, using the National Institute of Standards and Technology (NIST) AI 100-4 report as a reflection of current synthetic content governance methods, we show how the current landscape of practices fails to effectively regulate the MTE for adult AIG-NCII, as well as flawed assumptions explaining these gaps.

Authors:Vesna Nowack, Dalal Alrajeh, Carolina Gutierrez Muñoz, Katie Thomas, William Hobson, Patrick Benjamin, Catherine Hamilton-Giachritsis, Tim Grant, Juliane A. Kloess, Jessica Woodhams
Title: Towards User-Centred Design of AI-Assisted Decision-Making in Law Enforcement
Abstract:
Artificial Intelligence (AI) has become an important part of our everyday lives, yet user requirements for designing AI-assisted systems in law enforcement remain unclear. To address this gap, we conducted qualitative research on decision-making within a law enforcement agency. Our study aimed to identify limitations of existing practices, explore user requirements and understand the responsibilities that humans expect to undertake in these systems. Participants in our study highlighted the need for a system capable of processing and analysing large volumes of data efficiently to help in crime detection and prevention. Additionally, the system should satisfy requirements for scalability, accuracy, justification, trustworthiness and adaptability to be adopted in this domain. Participants also emphasised the importance of having end users review the input data that might be challenging for AI to interpret, and validate the generated output to ensure the system's accuracy. To keep up with the evolving nature of the law enforcement domain, end users need to help the system adapt to the changes in criminal behaviour and government guidance, and technical experts need to regularly oversee and monitor the system. Furthermore, user-friendly human interaction with the system is essential for its adoption and some of the participants confirmed they would be happy to be in the loop and provide necessary feedback that the system can learn from. Finally, we argue that it is very unlikely that the system will ever achieve full automation due to the dynamic and complex nature of the law enforcement domain.

Authors:Naimul Hoque, Nicole Sultanum
Title: DashGuide: Authoring Interactive Dashboard Tours for Guiding Dashboard Users
Abstract:
Dashboard guidance helps dashboard users better navigate interactive features, understand the underlying data, and assess insights they can potentially extract from dashboards. However, authoring dashboard guidance is a time consuming task, and embedding guidance into dashboards for effective delivery is difficult to realize. In this work, we contribute DashGuide, a framework and system to support the creation of interactive dashboard guidance with minimal authoring input. Given a dashboard and a communication goal, DashGuide captures a sequence of author-performed interactions to generate guidance materials delivered as playable step-by-step overlays, a.k.a., dashboard tours. Authors can further edit and refine individual tour steps while receiving generative assistance. We also contribute findings from a formative assessment with 9 dashboard creators, which helped inform the design of DashGuide; and findings from an evaluation of DashGuide with 12 dashboard creators, suggesting it provides an improved authoring experience that balances efficiency, expressiveness, and creative freedom.

Authors:Crystal Yang, Paul Taele
Title: AI for Accessible Education: Personalized Audio-Based Learning for Blind Students
Abstract:
Blind and visually impaired (BVI) students face significant challenges in traditional educational settings. While screen readers and braille materials offer some accessibility, they often lack interactivity and real-time adaptability to individual learning needs. This paper presents Audemy, an AI-powered audio-based learning platform designed to provide personalized, accessible, and engaging educational experiences for BVI students. Audemy uses adaptive learning techniques to customize content based on student accuracy, pacing preferences, and engagement patterns. The platform has been iteratively developed with input from over 20 educators specializing in accessibility and currently serves over 2,000 BVI students. Educator insights show key considerations for accessible AI, including the importance of engagement, intuitive design, compatibility with existing assistive technologies, and the role of positive reinforcement in maintaining student motivation. Beyond accessibility, this paper explores the ethical implications of AI in education, emphasizing data privacy, security, and transparency. Audemy demonstrates how AI can empower BVI students with personalized and equitable learning opportunities, advancing the broader goal of inclusive education.

Authors:Daniel Kronovet, Seth Frey, Joseph DeSimone
Title: Cybernetic Governance in a Coliving House
Abstract:
We report an 18-month field experiment in distributed digital institutions: a nine-bedroom Los Angeles coliving house that runs without managers, while sustaining 98% occupancy and below-market rents. Drawing on Elinor Ostrom's commons theory, we outline design principles and three digital mechanisms that form the institutional core: 1) A continuous-auction chore scheduler turns regenerative labor into a time-indexed points market; residents meet a 100-point monthly obligation by claiming tasks whose value rises linearly with neglect. 2) A pairwise-preference layer lets participants asynchronously reprioritize tasks, translating meta-governance into low-cognition spot inputs. 3) A symbolic "hearts" ledger tracks norm compliance through automated enforcement, lightweight challenges, and peer-awarded karma. Together, these mechanisms operationalize cybernetic principles--human sensing, machine bookkeeping, real-time feedback--while minimizing dependence on privileged roles. Our exploratory data (567 chore claims, 255 heart events, and 551 group purchases) show that such tooling can sustain reliable commons governance without continuous leadership, offering a transferable design palette for online communities, coliving houses, and other digitally mediated collectives.
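The continuous-auction pricing rule described above (a chore's value rising linearly with neglect) is simple enough to sketch directly. The function name, base points, and per-day rate below are illustrative assumptions, not the house's actual implementation.

```python
from datetime import datetime, timedelta

def chore_value(base_points: float, posted: datetime,
                now: datetime, points_per_day: float = 1.0) -> float:
    """Value of an unclaimed chore rises linearly with neglect:
    the longer it sits unclaimed, the more points claiming it earns."""
    days_neglected = max(0.0, (now - posted).total_seconds() / 86400)
    return base_points + points_per_day * days_neglected
```

Under this rule, residents racing to meet a 100-point monthly obligation are nudged toward exactly the tasks the house has been neglecting, which is the self-balancing property the scheduler relies on.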

Authors:Ayushi Agrawal, Aditya Kondai, Kavita Vemuri
Title: Psychological Effect of AI driven marketing tools for beauty/facial feature enhancement
Abstract:
AI-powered facial assessment tools are reshaping how individuals evaluate appearance and internalize social judgments. This study examines the psychological impact of such tools on self-objectification, self-esteem, and emotional responses, with attention to gender differences. Two samples used distinct versions of a facial analysis tool: one overtly critical (N=75; M=22.9 years), and another more neutral (N=51; M=19.9 years). Participants completed validated self-objectification and self-esteem scales and custom items measuring emotion, digital/physical appearance enhancement (DAE, PAEE), and perceived social emotion (PSE). Results revealed consistent links between high self-objectification, low self-esteem, and increased appearance enhancement behaviors across both versions. Despite softer framing, the newer tool still evoked negative emotional responses (U=1466.5, p=0.013), indicating implicit feedback may reinforce appearance-related insecurities. Gender differences emerged in DAE (p=0.025) and PSE (p<0.001), with females more prone to digital enhancement and less likely to perceive emotional impact in others. These findings reveal how AI tools may unintentionally reinforce and amplify existing social biases and underscore the critical need for responsible AI design and development. Future research will investigate how human ideologies embedded in the training data of such tools shape their evaluative outputs, and how these, in turn, influence user attitudes and decisions.

Authors:Prashant Chandrasekar, Mariel Couvillion, Ayshwarya Saktheeswaran, Jessica Zeitz
Title: LLM impact on BLV programming
Abstract:
Large Language Models (LLMs) are rapidly becoming integral to a wide range of tools, tasks, and problem-solving processes, especially in software development. Originally designed for natural language processing tasks such as text generation, LLMs are increasingly being used to assist both professionals and students in writing code. This growing reliance on LLM-based tools is reshaping programming workflows and task execution. In this study, we explore the impact of these technologies on blind and low-vision (BLV) developers. Our review of existing literature indicates that while LLMs help mitigate some of the challenges faced by BLV programmers, they also introduce new forms of inaccessibility. We conducted an evaluation of five popular LLM-powered integrated development environments (IDEs), assessing their performance across a comprehensive set of programming tasks. Our findings highlight several unsupported scenarios, instances of incorrect model output, and notable limitations in interaction support for specific tasks. Through observing BLV developers as they engaged in coding activities, we uncovered key interaction barriers that go beyond model accuracy or code generation quality. This paper outlines the challenges and corresponding opportunities for improving accessibility in the context of generative AI-assisted programming. Addressing these issues can meaningfully enhance the programming experience for BLV developers. As the generative AI revolution continues to unfold, it must also address the unique burdens faced by this community.

Authors:Xuyang Zhu, Sejoon Chang, Andrew Kuik
Title: Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models
Abstract:
Retrieval-Augmented Generation (RAG) systems offer a powerful approach to enhancing large language model (LLM) outputs by incorporating fact-checked, contextually relevant information. However, fairness and reliability concerns persist, as hallucinations can emerge at both the retrieval and generation stages, affecting users' reasoning and decision-making. Our research explores how tailored warning messages -- whose content depends on the specific context of hallucination -- shape user reasoning and actions in an educational quiz setting. Preliminary findings suggest that while warnings improve accuracy and awareness of high-level hallucinations, they may also introduce cognitive friction, leading to confusion and diminished trust in the system. By examining these interactions, this work contributes to the broader goal of AI-augmented reasoning: developing systems that actively support human reflection, critical thinking, and informed decision-making rather than passive information consumption.

Authors:Orland Hoeber, Md Nazmul Islam, Miriam Boon, Dale Storie, Veronica Ramshaw
Title: Search Timelines: Visualizing Search History to Enable Cross-Session Exploratory Search
Abstract:
Purpose: The timespan over which exploratory searching can occur, as well as the scope and volume of the search activities undertaken, can make it difficult for searchers to remember key details about their search activities. These difficulties are present both in the midst of searching as well as when resuming a search that spans multiple sessions. In this paper, we present a search interface design and prototype implementation to support cross-session exploratory search in a public digital library context. Methods: Search Timelines provides a visualization of current and past search activities via a dynamic timeline of the search activity (queries and saved resources). This timeline is presented at two levels of detail. An Overview Timeline is provided alongside the search results in a typical search engine results page design. A Detailed Timeline is provided in the workspace, where searchers can review the history of their search activities and their saved resources. A controlled laboratory study (n=32) was conducted to compare this approach to a baseline interface modelled after a typical public digital library search/workspace interface. Results: Participants who used Search Timelines reported higher levels of user engagement, usability, and perceived knowledge gain, during an initial search session and when resuming the search after a 7-8 day interval. This came at the expense of the searchers taking more time to complete the search task, which we view as positive evidence of engagement in cross-session exploratory search processes. Conclusion: Search Timelines serves as an example of how lightweight visualization approaches can be used to enhance typical search interface designs to support exploratory search. The results highlight the value of providing persistent representations of past search activities within the search interface.

Authors:Joel Oksanen, Andrés Lucero, Perttu Hämäläinen
Title: LLMCode: Evaluating and Enhancing Researcher-AI Alignment in Qualitative Analysis
Abstract:
The use of large language models (LLMs) in qualitative analysis offers enhanced efficiency but raises questions about their alignment with the contextual nature of research for design (RfD). This research examines the trustworthiness of LLM-driven design insights, using qualitative coding as a case study to explore the interpretive processes central to RfD. We introduce LLMCode, an open-source tool integrating two metrics, namely Intersection over Union (IoU) and Modified Hausdorff Distance, to assess the alignment between human and LLM-generated insights. Across two studies involving 26 designers, we find that while the model performs well with deductive coding, its ability to emulate a designer's deeper interpretive lens over the data is limited, emphasising the importance of human-AI collaboration. Our results highlight a reciprocal dynamic where users refine LLM outputs and adapt their own perspectives based on the model's suggestions. These findings underscore the importance of fostering appropriate reliance on LLMs by designing tools that preserve interpretive depth while facilitating intuitive collaboration between designers and AI.
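The Intersection over Union metric named above has a compact set-based form. This sketch is illustrative only: representing codes as sets of discrete items (e.g., excerpt identifiers or character offsets) is an assumption, and LLMCode's actual implementation may operate over text spans.

```python
def iou(human: set, model: set) -> float:
    """Intersection over Union between the items a human coder
    and an LLM assigned to the same qualitative code.
    Returns 1.0 when both sets are empty (perfect trivial agreement)."""
    if not human and not model:
        return 1.0
    return len(human & model) / len(human | model)
```

An IoU of 1.0 means the human and model highlighted exactly the same material for a code; values near 0 flag codes where the model's interpretive lens diverged from the researcher's.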

Authors:Wenyi Lu, Enock Kasaadah, S M Rakib Ul Karim, Matt Germonprez, Sean Goggins
Title: Open Source Software Lifecycle Classification: Developing Wrangling Techniques for Complex Sociotechnical Systems
Abstract:
Open source software is a rapidly evolving center for distributed work, and understanding the characteristics of this work across its different contexts is vital for informing policy, economics, and the design of enabling software. The steep increase in open source projects and corporate participation have transformed a peripheral, cottage industry component of the global technology ecosystem into a large, infinitely complex "technology parts supplier" wired into every corner of contemporary life. The lack of theory and tools for breaking this complexity down into identifiable project types or strategies for understanding them more systematically is incommensurate with current industry, society, and developer needs. This paper reviews previous attempts to classify open source software and other organizational ecosystems, contrasting open source scientific software ecosystems with those found in corporatized open source software. It then examines the divergent and sometimes conflicting purposes that may exist for classifying open source projects and how these competing interests impede our progress in developing a comprehensive understanding of how open source software projects and companies operate. Finally, we present an empirical, mixed-methods study demonstrating how to classify open-source projects by their lifecycle position. This is the first step forward, advancing our scientific and practical knowledge of open source software through the lens of dynamic and evolving open source genres. It concludes with examples and a proposed path forward.

Authors:Yui Kondo, Kevin Dunnell, Qing Xiao, Jun Zhao, Luc Rocher
Title: Algorithmic Mirror: Designing an Interactive Tool to Promote Self-Reflection for YouTube Recommendations
Abstract:
Big Data analytics and Artificial Intelligence systems derive non-intuitive and often unverifiable inferences about individuals' behaviors, preferences, and private lives. Drawing on diverse, feature-rich datasets of unpredictable value, these systems erode the intuitive connection between our actions and how we are perceived, diminishing control over our digital identities. While Explainable Artificial Intelligence scholars have attempted to explain the inner workings of algorithms, their visualizations frequently overwhelm end-users with complexity. This research introduces 'hypothetical inference', a novel approach that uses language models to simulate how algorithms might interpret users' digital footprints and infer personal characteristics without requiring access to proprietary platform algorithms. Through empirical studies with fourteen adult participants, we identified three key design opportunities to foster critical algorithmic literacy: (1) reassembling scattered digital footprints into a unified map, (2) simulating algorithmic inference through LLM-generated interpretations, and (3) incorporating temporal dimensions to visualize evolving patterns. This research lays the groundwork for tools that can help users recognize the influence of data on platforms and develop greater autonomy in increasingly algorithm-mediated digital environments.

Authors:Xianghe Liu, Jiaqi Xu, Tao Sun
Title: PsyCounAssist: A Full-Cycle AI-Powered Psychological Counseling Assistant System
Abstract:
Psychological counseling is a highly personalized and dynamic process that requires therapists to continuously monitor emotional changes, document session insights, and maintain therapeutic continuity. In this paper, we introduce PsyCounAssist, a comprehensive AI-powered counseling assistant system specifically designed to augment psychological counseling practices. PsyCounAssist integrates multimodal emotion recognition combining speech and photoplethysmography (PPG) signals for accurate real-time affective analysis, automated structured session reporting using large language models (LLMs), and personalized AI-generated follow-up support. Deployed on Android-based tablet devices, the system demonstrates practical applicability and flexibility in real-world counseling scenarios. Experimental evaluation confirms the reliability of PPG-based emotional classification and highlights the system's potential for non-intrusive, privacy-aware emotional support. PsyCounAssist represents a novel approach to ethically and effectively integrating AI into psychological counseling workflows.

Authors:Sneha Nanavati, Nimmi Rangaswamy
Title: Bridging Data Gaps and Building Knowledge Networks in Indian Football Analytics
Abstract:
The global rise of football analytics has rapidly transformed how clubs make strategic decisions. However, in India, the adoption of analytics remains constrained by institutional resistance, infrastructural limitations, and cultural barriers -- challenges that grassroots innovation and low-cost data solutions have the potential to overcome. Despite the growing popularity of the Indian Super League, resource scarcity and fragmented governance continue to hinder the widespread adoption and impact of analytics. This mixed-methods study explores how informal, decentralised analytics communities -- comprising amateur analysts and Twitter-based "data sleuths" -- navigate these constraints through peer mentorship and grassroots innovation. Drawing on extensive digital ethnography, participant observation, and interviews, the study illustrates how these informal networks mitigate data scarcity, limited digital infrastructure, and institutional indifference while fostering skill development and professional growth. Building on these insights, the paper proposes HCI interventions such as decentralised knowledge platforms to facilitate structured, cross-border peer mentorship and low-cost data solutions -- including AI-assisted player tracking and mobile analytics dashboards -- rooted in principles of frugal innovation. These interventions aim to bridge the data divide, support inclusive technical engagement in sport, and enhance analytics-driven decision-making in resource-constrained environments. This paper contributes to HCIxB's focus on cross-border collaboration by highlighting how community-driven technological adaptation in the Global South can foster meaningful participation, skill-building, and long-term sustainability through informal learning networks and scalable, context-sensitive tools.

Authors:Bolun Zhang, Yang Shen, Linzhuo Li, Yu Ji, Di Wu, Tongyu Wu, Lianghao Dai
Title: Tinkering Against Scaling
Abstract:
The ascent of scaling in artificial intelligence research has revolutionized the field over the past decade, yet it presents significant challenges for academic researchers, particularly in computational social science and critical algorithm studies. The dominance of large language models, characterized by their extensive parameters and costly training processes, creates a disparity where only industry-affiliated researchers can access these resources. This imbalance restricts academic researchers from fully understanding their tools, leading to issues like reproducibility in computational social science and a reliance on black-box metaphors in critical studies. To address these challenges, we propose a "tinkering" approach that is inspired by existing works. This method involves engaging with smaller models or components that are manageable for ordinary researchers, fostering hands-on interaction with algorithms. We argue that tinkering is both a way of making and knowing for computational social science and a way of knowing for critical studies, and fundamentally, it is a way of caring that has broader implications for both fields.

Authors:Peisen Xu, Jérémie Garcia, Wei Tsang Ooi, Christophe Jouffrais
Title: SafeSpect: Safety-First Augmented Reality Heads-up Display for Drone Inspections
Abstract:
Current tablet-based interfaces for drone operations often impose a heavy cognitive load on pilots and reduce situational awareness by dividing attention between the video feed and the real world. To address these challenges, we designed a heads-up augmented reality (AR) interface that overlays in-situ information to support drone pilots in safety-critical tasks. Through participatory design workshops with professional pilots, we identified key features and developed an adaptive AR interface that dynamically switches between task and safety views to prevent information overload. We evaluated our prototype by creating a realistic building inspection task and comparing three interfaces: a 2D tablet, a static AR, and our adaptive AR design. A user study with 15 participants showed that the AR interface improved access to safety information, while the adaptive AR interface reduced cognitive load and enhanced situational awareness without compromising task performance. We offer design insights for developing safety-first heads-up AR interfaces.

Authors:Zhenguang Zhong, Zhixuan Wang
Title: Intelligent Depression Prevention via LLM-Based Dialogue Analysis: Overcoming the Limitations of Scale-Dependent Diagnosis through Precise Emotional Pattern Recognition
Abstract:
Existing depression screening predominantly relies on standardized questionnaires (e.g., PHQ-9, BDI), which suffer from high misdiagnosis rates (18-34% in clinical studies) due to their static, symptom-counting nature and susceptibility to patient recall bias. This paper presents an AI-powered depression prevention system that leverages large language models (LLMs) to analyze real-time conversational cues--including subtle emotional expressions (e.g., micro-sentiment shifts, self-referential language patterns)--for more accurate and dynamic mental state assessment. Our system achieves three key innovations: (1) Continuous monitoring through natural dialogue, detecting depression-indicative linguistic features (anhedonia markers, hopelessness semantics) with 89% precision (vs. 72% for PHQ-9); (2) Adaptive risk stratification that updates severity levels based on conversational context, reducing false positives by 41% compared to scale-based thresholds; and (3) Personalized intervention strategies tailored to users' emotional granularity, demonstrating 2.3x higher adherence rates than generic advice. Clinical validation with 450 participants shows the system identifies 92% of at-risk cases missed by traditional scales, while its explainable AI interface bridges the gap between automated analysis and clinician judgment. This work establishes conversational AI as a paradigm shift from episodic scale-dependent diagnosis to continuous, emotionally intelligent mental health monitoring.

Authors:Marcin Furtak, Florian Pätzold, Tim Kietzmann, Silke M. Kärcher, Peter König
Title: Helping Blind People Grasp: Enhancing a Tactile Bracelet with an Automated Hand Navigation System
Abstract:
Grasping constitutes a critical challenge for visually impaired people. To address this problem, we developed a tactile bracelet that assists in grasping by guiding the user's hand to a target object using vibration commands. Here we demonstrate the fully automated system around the bracelet, which can confidently detect and track target and distractor objects and reliably guide the user's hand. We validate our approach in three tasks that resemble complex, everyday use cases. In a grasping task, the participants grasp varying target objects on a table, guided via the automated hand navigation system. In the multiple objects task, participants grasp objects from the same class, demonstrating our system's ability to track one specific object without targeting surrounding distractor objects. Finally, the participants grasp one specific target object by avoiding an obstacle along the way in the depth navigation task, showcasing the potential to utilize our system's depth estimations to navigate even complex scenarios. Additionally, we demonstrate that the system can aid users in the real world by testing it in a less structured environment with a blind participant. Overall, our results demonstrate that the system, by translating the AI-processed visual inputs into a reduced data rate of actionable signals, enables autonomous behavior in everyday environments, thus potentially increasing the quality of life of visually impaired people.

Authors:Yuga Tsukuda, Naoto Nishida, Jun Lu, Yoichi Ochiai
Title: Insect-Computer Hybrid Speaker: Speaker using Chirp of the Cicada Controlled by Electrical Muscle Stimulation
Abstract:
We propose "Insect-Computer Hybrid Speaker", which enables us to make music from combinations of computers and insects. Many studies have proposed methods and interfaces for controlling insects and obtaining feedback from them. However, there has been less research on using insects for interaction with third parties. In this paper, we propose a method in which cicadas are used as speakers, triggered by Electrical Muscle Stimulation (EMS). We explored and investigated the suitable waveform of the chirp to be controlled, the appropriate voltage range, and the maximum pitch at which cicadas can chirp.

Authors:Tina Behzad, Mithilesh Kumar Singh, Anthony J. Ripa, Klaus Mueller
Title: FairPlay: A Collaborative Approach to Mitigate Bias in Datasets for Improved AI Fairness
Abstract:
The issue of fairness in decision-making is a critical one, especially given the variety of stakeholder demands for differing and mutually incompatible versions of fairness. Adopting a strategic interaction of perspectives provides an alternative to enforcing a singular standard of fairness. We present a web-based software application, FairPlay, that enables multiple stakeholders to debias datasets collaboratively. With FairPlay, users can negotiate and arrive at a mutually acceptable outcome without a universally agreed-upon theory of fairness. In the absence of such a tool, reaching a consensus would be highly challenging due to the lack of a systematic negotiation process and the inability to modify and observe changes. We have conducted user studies that demonstrate the success of FairPlay, as users could reach a consensus within about five rounds of gameplay, illustrating the application's potential for enhancing fairness in AI systems.

Authors:Carmine Attanasio, Alireza Mortezapour
Title: Quality of explanation of xAI from the perspective of Italian end-users: Italian version of the System Causability Scale (SCS)
Abstract:
Background and aim: Considering the scope of the application of artificial intelligence beyond the field of computer science, one of the concerns of researchers is to provide quality explanations about the functioning of algorithms based on artificial intelligence and the data extracted from it. The purpose of the present study is to validate the Italian version of the System Causability Scale (I-SCS) to measure the quality of explanations provided in an xAI system. Method: For this purpose, the English version, initially provided in 2020 in coordination with the main developer, was utilized. The forward-backward translation method was applied to ensure accuracy. Finally, the nine-step validation procedure was completed by calculating the content validity index/ratio and conducting cognitive interviews with representative end users. Results: The original version of the questionnaire consisted of 10 questions. However, based on the obtained indexes (CVR below 0.49), one question (Question 8) was entirely removed. After completing the aforementioned steps, the Italian version contained 9 questions. The representative sample of Italian end users fully comprehended the meaning and content of the questions in the Italian version. Conclusion: The Italian version obtained in this study can be used in future research studies as well as in the field by xAI developers. This tool can be used to measure the quality of explanations provided for an xAI system in Italian culture.

Authors:Sridevi Polavaram, Xin Zhou, Meenu Ravi, Mohammad Zarei, Anmol Srivastava
Title: Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes
Abstract:
Vision systems are increasingly deployed in critical domains such as surveillance, law enforcement, and transportation. However, their vulnerabilities to rare or unforeseen scenarios pose significant safety risks. To address these challenges, we introduce Context-Awareness and Interpretability of Rare Occurrences (CAIRO), an ontology-based human-assistive discovery framework for failure cases (or CP - Critical Phenomena) detection and formalization. CAIRO by design incentivizes human-in-the-loop for testing and evaluation of criticality that arises from misdetections, adversarial attacks, and hallucinations in AI black-box models. Our robust analysis of object detection model(s) failures in automated driving systems (ADS) showcases scalable and interpretable ways of formalizing the observed gaps between camera perception and real-world contexts, resulting in test cases stored as explicit knowledge graphs (in OWL/XML format) amenable for sharing, downstream analysis, logical reasoning, and accountability.

Authors:Dinithi Dissanayake, Suranga Nanayakkara
Title: Navigating the State of Cognitive Flow: Context-Aware AI Interventions for Effective Reasoning Support
Abstract:
Flow theory describes an optimal cognitive state where individuals experience deep focus and intrinsic motivation when a task's difficulty aligns with their skill level. In AI-augmented reasoning, interventions that disrupt the state of cognitive flow can hinder rather than enhance decision-making. This paper proposes a context-aware cognitive augmentation framework that adapts interventions based on three key contextual factors: type, timing, and scale. By leveraging multimodal behavioral cues (e.g., gaze behavior, typing hesitation, interaction speed), AI can dynamically adjust cognitive support to maintain or restore flow. We introduce the concept of cognitive flow, an extension of flow theory in AI-augmented reasoning, where interventions are personalized, adaptive, and minimally intrusive. By shifting from static interventions to context-aware augmentation, our approach ensures that AI systems support deep engagement in complex decision-making and reasoning without disrupting cognitive immersion.

Authors:Chengbo Zheng, Tim Miller, Alina Bialkowski, H Peter Soyer, Monika Janda
Title: Supporting Data-Frame Dynamics in AI-assisted Decision Making
Abstract:
High stakes decision-making often requires a continuous interplay between evolving evidence and shifting hypotheses, a dynamic that is not well supported by current AI decision support systems. In this paper, we introduce a mixed-initiative framework for AI assisted decision making that is grounded in the data-frame theory of sensemaking and the evaluative AI paradigm. Our approach enables both humans and AI to collaboratively construct, validate, and adapt hypotheses. We demonstrate our framework with an AI-assisted skin cancer diagnosis prototype that leverages a concept bottleneck model to facilitate interpretable interactions and dynamic updates to diagnostic hypotheses.

Authors:Clarissa Sabrina Arlinghaus, Ashita Ashok, Ashim Mandal, Karsten Berns, Günter W. Maier
Title: Beyond Attention: Investigating the Threshold Where Objective Robot Exclusion Becomes Subjective
Abstract:
As robots become increasingly involved in decision-making processes (e.g., personnel selection), concerns about fairness and social inclusion arise. This study examines social exclusion in robot-led group interviews by robot Ameca, exploring the relationship between objective exclusion (robot's attention allocation), subjective exclusion (perceived exclusion), mood change, and need fulfillment. In a controlled lab study (N = 35), higher objective exclusion significantly predicted subjective exclusion. In turn, subjective exclusion negatively impacted mood and need fulfillment but only mediated the relationship between objective exclusion and need fulfillment. A piecewise regression analysis identified a critical threshold at which objective exclusion begins to be perceived as subjective exclusion. Additionally, the standing position was the primary predictor of exclusion, whereas demographic factors (e.g., gender, height) had no significant effect. These findings underscore the need to consider both objective and subjective exclusion in human-robot interactions and have implications for fairness in robot-assisted hiring processes.
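The critical threshold the piecewise regression identifies can be estimated with a segmented linear fit. The sketch below grid-searches a single breakpoint by least squares; the data are synthetic and the variable names are illustrative, not the study's actual measures:

```python
import numpy as np

def fit_breakpoint(x, y, candidates):
    """Grid-search a single breakpoint c for a continuous piecewise-linear fit
    y ~ b0 + b1*x + b2*max(x - c, 0), choosing the c with the lowest SSE."""
    best = None
    for c in candidates:
        # Design matrix: intercept, slope, and hinge term that activates past c.
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((X @ beta - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, c, beta)
    return best[1], best[2]

# Synthetic data: flat response below x = 3, rising slope of 2 above it,
# mimicking "objective exclusion begins to be perceived past a threshold".
x = np.linspace(0, 6, 61)
y = np.where(x < 3, 1.0, 1.0 + 2.0 * (x - 3))
c, beta = fit_breakpoint(x, y, candidates=np.linspace(0.5, 5.5, 51))
print(round(c, 2))
```

The hinge parameterization keeps the fitted line continuous at the breakpoint, which matches the interpretation of a threshold where one regime transitions into another rather than a discontinuous jump.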

Authors:Heejae Bae, Nayeong Kim, Sehee Lee, Tak Yeon Lee
Title: Bridging Bond Beyond Life: Designing VR Memorial Space with Stakeholder Collaboration via Research through Design
Abstract:
The integration of digital technologies into memorialization practices offers opportunities to transcend physical and temporal limitations. However, designing personalized memorial spaces that address the diverse needs of the dying and the bereaved remains underexplored. Using a Research through Design (RtD) approach, we conducted a three-phase study: participatory design, VR memorial space development, and user testing. This study highlights three key aspects: 1) the value of VR memorial spaces as bonding mediums, 2) the role of a design process that engages users through co-design, development, and user testing in addressing the needs of the dying and the bereaved, and 3) design elements that enhance the VR memorial experience. This research lays the foundation for personalized VR memorialization practices, providing insights into how technology can enrich remembrance and relational experiences.

Authors:Ryan Najami, Rami Ghannam
Title: Enhancing Tennis Training with Real-Time Swing Data Visualisation in Immersive Virtual Reality
Abstract:
Recent advances in immersive technology have opened new possibilities in sports training, especially for activities requiring precise motor skills, such as tennis. In this paper, we present a virtual reality (VR) tennis training system integrating real-time performance feedback through a wearable sensor device. Ten participants wore the sensor on their dominant hand to capture motion data, including swing speed and swing power, while engaging in a VR tennis environment. Initially, participants performed baseline tests without access to performance metrics. In subsequent tests, participants executed similar routines with their swing data displayed in real-time via a VR overlay. Qualitative and quantitative results indicated that real-time visual feedback led to improved performance behaviors and enhanced situational awareness. Some participants exhibited increased swing consistency and strategic decision-making, though improvements in accuracy varied individually. Additionally, subjective feedback highlighted that the immersive experience, combined with instantaneous performance metrics, enhanced player engagement and motivation. These findings illustrate the effectiveness of VR-based data visualisation in sports training, suggesting broader applicability in performance enhancement.

Authors:Seung Gyu Jeong, Sung Woo Nam, Seong Kwan Jung, Seong-Eun Kim
Title: iMedic: Towards Smartphone-based Self-Auscultation Tool for AI-Powered Pediatric Respiratory Assessment
Abstract:
Respiratory auscultation is crucial for early detection of pediatric pneumonia, a condition that can quickly worsen without timely intervention. In areas with limited physician access, effective auscultation is challenging. We present a smartphone-based system that leverages built-in microphones and advanced deep learning algorithms to detect abnormal respiratory sounds indicative of pneumonia risk. Our end-to-end deep learning framework employs domain generalization to integrate a large electronic stethoscope dataset with a smaller smartphone-derived dataset, enabling robust feature learning for accurate respiratory assessments without expensive equipment. The accompanying mobile application guides caregivers in collecting high-quality lung sound samples and provides immediate feedback on potential pneumonia risks. User studies show strong classification performance and high acceptance, demonstrating the system's ability to facilitate proactive interventions and reduce preventable childhood pneumonia deaths. By seamlessly integrating into ubiquitous smartphones, this approach offers a promising avenue for more equitable and comprehensive remote pediatric care.

Authors:Yi Wen, Meng Xia
Title: Promoting Real-Time Reflection in Synchronous Communication with Generative AI
Abstract:
Real-time reflection plays a vital role in synchronous communication. It enables users to adjust their communication strategies dynamically, thereby improving the effectiveness of their communication. Generative AI holds significant potential to enhance real-time reflection due to its ability to comprehensively understand the current context and generate personalized and nuanced content. However, it is challenging to design interaction and information presentation so that they support, rather than disrupt, the real-time workflow. In this position paper, we present a review of existing research on systems designed for reflection in different synchronous communication scenarios. Based on that, we discuss implications for designing human-AI interaction that supports reflection in real time.

Authors:Antonin Brun, Gale Lucas, Burçin Becerik-Gerber
Title: Under Pressure: Contextualizing Workplace Stress Towards User-Centered Interventions
Abstract:
Stress is a pervasive challenge that significantly impacts worker health and well-being. Workplace stress is driven by various factors, ranging from organizational changes to poor workplace design. Although individual stress management strategies have been shown to be effective, current interventions often overlook personal and contextual factors shaping stress experiences. In this study, we conducted semi-structured interviews with eight office workers to gain a deeper understanding of their personal experiences with workplace stress. Our analysis reveals key stress triggers, coping mechanisms, and reflections on past stressful events. We highlight the multifaceted and individualized nature of workplace stress, emphasizing the importance of intervention timing, modality, and recognizing that stress is not solely a negative experience but can also have positive effects. Our findings provide actionable insights for the design of user-centered stress management solutions more attuned to the needs of office workers.

Authors:Yenkai Huang, Ning Zheng
Title: LACE: Controlled Image Prompting and Iterative Refinement with GenAI for Professional Visual Art Creators
Abstract:
We present LACE, a hybrid Human-AI co-creative system integrated into Adobe Photoshop supporting turn-taking and parallel interaction modes for iterative image generation. Through a study with 21 participants across representational, abstract, and design tasks, we found turn-taking preferred in early stages for idea generation, and parallel modes suited for detailed refinement. While this shorter workshop paper provides key insights and highlights, the comprehensive findings and detailed analysis are presented in a longer version available separately on arXiv.

Authors:Yao Shi, Rongkeng Liang, Yong Xu
Title: EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework
Abstract:
Large language models (LLMs) increasingly serve as educational tools, yet evaluating their teaching capabilities remains challenging due to the resource-intensive, context-dependent, and methodologically complex nature of teacher-student interactions. We introduce EducationQ, a multi-agent dialogue framework that efficiently assesses teaching capabilities through simulated dynamic educational scenarios, featuring specialized agents for teaching, learning, and evaluation. Testing 14 LLMs across major AI Organizations (OpenAI, Meta, Google, Anthropic, and others) on 1,498 questions spanning 13 disciplines and 10 difficulty levels reveals that teaching effectiveness does not correlate linearly with model scale or general reasoning capabilities - with some smaller open-source models outperforming larger commercial counterparts in teaching contexts. This finding highlights a critical gap in current evaluations that prioritize knowledge recall over interactive pedagogy. Our mixed-methods evaluation, combining quantitative metrics with qualitative analysis and expert case studies, identifies distinct pedagogical strengths employed by top-performing models (e.g., sophisticated questioning strategies, adaptive feedback mechanisms). Human expert evaluations show 78% agreement with our automated qualitative analysis of effective teaching behaviors, validating our methodology. EducationQ demonstrates that LLMs-as-teachers require specialized optimization beyond simple scaling, suggesting next-generation educational AI prioritize targeted enhancement of specific pedagogical effectiveness.

Authors:YenKai Huang, Zheng Ning, Ming Cheng
Title: LACE: Exploring Turn-Taking and Parallel Interaction Modes in Human-AI Co-Creation for Iterative Image Generation
Abstract:
This paper introduces LACE, a co-creative system enabling professional artists to leverage generative AI through controlled prompting and iterative refinement within Photoshop. Addressing challenges in precision, iterative coherence, and workflow compatibility, LACE allows flexible control via layer-based editing and dual-mode collaboration (turn-taking and parallel). A pilot study (N=21) demonstrates significant improvements in user satisfaction, ownership, usability, and artistic perception compared to standard AI workflows. We offer comprehensive findings, system details, nuanced user feedback, and implications for integrating generative AI in professional art practices.

Authors:Deepak Ghimire, Sunghwan Jeong, Sunhong Yoon, Sanghyun Park, Juhwan Choi
Title: Real-Time Sleepiness Detection for Driver State Monitoring System
Abstract:
A driver face monitoring system can detect driver fatigue, which is a significant factor in many accidents, using computer vision techniques. In this paper, we present a real-time technique for driver eye state detection. First, the face is detected, and the eyes are located within the face region for tracking. A normalized cross-correlation-based online dynamic template matching technique, combined with Kalman filter tracking, is proposed to track the detected eye positions in subsequent image frames. A support vector machine with histogram of oriented gradients (HOG) features is used to classify the state of the eyes as open or closed. If the eyes remain closed for a specified period, the driver is considered to be asleep, and an alarm is triggered.
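The normalized cross-correlation at the core of the template-matching step can be sketched in a few lines of NumPy. The exhaustive window search below is a simplified stand-in for the paper's online dynamic template matching (which also updates the template and uses a Kalman filter to constrain the search region):

```python
import numpy as np

def ncc(patch, template):
    """Zero-mean normalized cross-correlation between two equal-size patches."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    return float((p * t).sum() / denom) if denom else 0.0

def match_template(frame, template):
    """Slide the template over the frame and return the top-left position
    of the window with the highest NCC score, plus that score."""
    th, tw = template.shape
    fh, fw = frame.shape
    best_pos, best_score = (0, 0), -1.0
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            score = ncc(frame[r:r + th, c:c + tw], template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score

# Toy frame with an "eye" template cut out at row 4, col 6.
rng = np.random.default_rng(1)
frame = rng.normal(size=(20, 20))
template = frame[4:9, 6:11].copy()
pos, score = match_template(frame, template)
print(pos, round(score, 3))
```

Because the score is zero-mean and normalized, it is invariant to uniform brightness and contrast changes between frames, which is what makes NCC a reasonable choice for tracking eyes under varying in-cab lighting.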

Authors:Janet G. Johnson, Macarena Peralta, Mansanjam Kaur, Ruijie Sophia Huang, Sheng Zhao, Ruijia Guan, Shwetha Rajaram, Michael Nebeling
Title: Exploring Collaborative GenAI Agents in Synchronous Group Settings: Eliciting Team Perceptions and Design Considerations for the Future of Work
Abstract:
While generative artificial intelligence (GenAI) is finding increased adoption in workplaces, current tools are primarily designed for individual use. Prior work established the potential for these tools to enhance personal creativity and productivity towards shared goals; however, we don't know yet how to best take into account the nuances of group work and team dynamics when deploying GenAI in work settings. In this paper, we investigate the potential of collaborative GenAI agents to augment teamwork in synchronous group settings through an exploratory study that engaged 25 professionals across 6 teams in speculative design workshops and individual follow-up interviews. Our workshops included a mixed reality provotype to simulate embodied collaborative GenAI agents capable of actively participating in group discussions. Our findings suggest that, if designed well, collaborative GenAI agents offer valuable opportunities to enhance team problem-solving by challenging groupthink, bridging communication gaps, and reducing social friction. However, teams' willingness to integrate GenAI agents depended on its perceived fit across a number of individual, team, and organizational factors. We outline the key design tensions around agent representation, social prominence, and engagement and highlight the opportunities spatial and immersive technologies could offer to modulate GenAI influence on team outcomes and strike a balance between augmentation and agency.

Authors:Katelyn Xiaoying Mei, Nic Weber
Title: Designing AI Systems that Augment Human Performed vs. Demonstrated Critical Thinking
Abstract:
The recent rapid advancement of LLM-based AI systems has accelerated our search for and production of information. While the advantages brought by these systems seemingly improve the performance or efficiency of human activities, they do not necessarily enhance human capabilities. Recent research has started to examine the impact of generative AI on individuals' cognitive abilities, especially critical thinking. Based on definitions of critical thinking across psychology and education, this position paper proposes a distinction between demonstrated and performed critical thinking in the era of generative AI and discusses the implications of this distinction for research and development of AI systems that aim to augment human critical thinking.

Authors:Viet Hung Pham, Malte Wagenfeld, Regina Bernhaupt
Title: Virtual Reality for Urban Walkability Assessment
Abstract:
Traditional urban planning methodologies often fail to capture the complexity of contemporary urbanization and environmental sustainability challenges. This study investigates the integration of Generative Design, Virtual Reality (VR), and Digital Twins (DT) to enhance walkability in urban planning. VR provides distinct benefits over conventional approaches, including 2D maps, static renderings, and physical models, by allowing stakeholders to engage with urban designs more intuitively, identify walkability challenges, and suggest iterative improvements. Preliminary findings from structured interviews with Eindhoven residents provide critical insights into pedestrian preferences and walkability considerations. The next phase of the study involves the development of VR-DT integrated prototypes to simulate urban environments, assess walkability, and explore the role of Generative Design in generating adaptive urban planning solutions. The objective is to develop a decision-support tool that enables urban planners to incorporate diverse stakeholder perspectives, optimize pedestrian-oriented urban design, and advance regenerative development principles. By leveraging these emerging technologies, this research contributes to the evolution of data-driven, participatory urban planning frameworks aimed at fostering sustainable and walkable cities.

Authors:Sergei Volodin, Hala Khodr, Pierre Dillenbourg, Wafa Johal
Title: Going Down the Abstraction Stream with Augmented Reality and Tangible Robots: the Case of Vector Instruction
Abstract:
Despite being used in many engineering and scientific areas such as physics and mathematics and often taught in high school, graphical vector addition turns out to be a topic prone to misconceptions even at university-level physics classes. To improve the learning experience and the resulting understanding of vectors, we propose to investigate how concreteness fading, implemented with the use of augmented reality and tangible robots, could help learners to build a strong representation of vector addition. We design a gamified learning environment consisting of three concreteness fading stages and conduct an experiment with 30 participants. Our results show a positive learning gain. We extensively analyze the behavior of the participants to understand their usage of the technological tools -- augmented reality and tangible robots -- during the learning scenario. Finally, we discuss how the combination of these tools shows real advantages in implementing the concreteness fading paradigm. Our work provides empirical insights into how users utilize concrete visualizations conveyed by a haptic-enabled robot and augmented reality in a learning scenario.

Authors:Navreet Kaur, Manuel Gonzales, Cristian Garcia Alcaraz, Jiaqi Gong, Kristen J. Wells, Laura E. Barnes
Title: A computational framework for longitudinal medication adherence prediction in breast cancer survivors: A social cognitive theory based approach
Abstract:
Non-adherence to medications is a critical concern since nearly half of patients with chronic illnesses do not follow their prescribed medication regimens, leading to increased mortality, costs, and preventable human distress. Amongst stage 0-3 breast cancer survivors, adherence to long-term adjuvant endocrine therapy (i.e., Tamoxifen and aromatase inhibitors) is associated with a significant increase in recurrence-free survival. This work aims to develop multi-scale models of medication adherence to understand the significance of different factors influencing adherence across varying time frames. We introduce a computational framework guided by Social Cognitive Theory for multi-scale (daily and weekly) modeling of longitudinal medication adherence. Our models employ both dynamic medication-taking patterns in the recent past (dynamic factors) as well as less frequently changing factors (static factors) for adherence prediction. Additionally, we assess the significance of various factors in influencing adherence behavior across different time scales. Our models outperform traditional machine learning counterparts in both daily and weekly tasks in terms of both accuracy and specificity. Daily models achieved an accuracy of 87.25%, and weekly models, an accuracy of 76.04%. Notably, dynamic past medication-taking patterns prove most valuable for predicting daily adherence, while a combination of dynamic and static factors is significant for macro-level weekly adherence patterns.
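A minimal sketch of the feature construction the abstract implies: a short window of recent medication-taking events (dynamic factors) concatenated with slowly-changing covariates (static factors) into one prediction input. The function name, window length, and zero-padding of short histories are assumptions for illustration, not the paper's specification:

```python
import numpy as np

def build_features(history, static, window=7):
    """Concatenate the last `window` days of medication-taking events
    (1 = taken, 0 = missed; dynamic factors) with slowly-changing
    covariates (static factors) into a single feature vector."""
    recent = np.asarray(history[-window:], dtype=float)
    if recent.size < window:                      # pad short histories
        recent = np.concatenate([np.zeros(window - recent.size), recent])
    return np.concatenate([recent, np.asarray(static, dtype=float)])
```

A daily model would consume such vectors one day at a time; a weekly model would aggregate them over the week before prediction.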

Authors:Saif Bashar, Samia Nasir Nira, Shabbir Mahmood, Md. Humaun Kabir, Sujit Roy, Iffat Farhana
Title: Recognition of Frequencies of Short-Time SSVEP Signals Utilizing an SSCCA-Based Spatio-Spectral Feature Fusion Framework
Abstract:
A brain-computer interface (BCI) facilitates direct communication between the brain and external equipment through EEG, which is preferred for its superior temporal resolution. Among EEG techniques, the steady-state visual evoked potential (SSVEP) is favored due to its robust signal-to-noise ratio, minimal training demands, and elevated information transmission rate. Frequency detection in SSVEP-based brain-computer interfaces commonly employs canonical correlation analysis (CCA). SSCCA (spatio-spectral canonical correlation analysis) augments CCA by refining spatial filtering. This paper presents a multistage feature fusion methodology for short-duration SSVEP frequency identification, employing SSCCA with template signals derived via leave-one-out cross-validation (LOOCV). A filterbank generates bandpass filters for stimulus frequencies and their harmonics, whereas SSCCA calculates correlation coefficients between subbands and templates. Two phases of non-linear weighting amalgamate these coefficients to discern the target stimulus. This multistage methodology surpasses traditional techniques, attaining an accuracy of 94.5%.
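The non-linear weighting across filterbank subbands can be illustrated with a standard FBCCA-style fusion. The weight form w(n) = n^-a + b and the constants below are common illustrative choices from the filterbank-CCA literature, not the values or the exact two-phase scheme used in this paper:

```python
import numpy as np

def fuse_subband_scores(rho, a=1.25, b=0.25):
    """Weighted fusion of per-subband correlation coefficients.

    rho: array of shape (n_subbands, n_frequencies), one CCA/SSCCA
    correlation per subband and candidate stimulus frequency.
    Weights w(n) = n**-a + b emphasize lower subbands, where SSVEP
    power concentrates; a and b are illustrative constants.
    """
    n = np.arange(1, rho.shape[0] + 1, dtype=float)
    w = n ** (-a) + b
    return (w[:, None] * rho ** 2).sum(axis=0)

def detect_frequency(rho, freqs):
    """Pick the candidate stimulus frequency with the largest fused score."""
    return freqs[int(np.argmax(fuse_subband_scores(rho)))]
```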

Authors:Alexander Klippel, Bart Knuiman, Jiayan Zhao, Jan Oliver Wallgrün, Jascha Grübel
Title: AnywhereXR: On-the-fly 3D Environments as a Basis for Open Source Immersive Digital Twin Applications
Abstract:
Visualization has long been fundamental to human communication and decision-making. Today, we stand at the threshold of integrating veridical, high-fidelity visualizations into immersive digital environments, alongside digital twinning techniques. This convergence heralds powerful tools for communication, co-design, and participatory decision-making. Our paper delves into the development of lightweight open-source immersive digital twin visualizations, capitalizing on the evolution of immersive technologies, the wealth of spatial data available, and advancements in digital twinning. Coined AnywhereXR, this approach ultimately seeks to democratize access to spatial information at a global scale. Utilizing the Netherlands as our starting point, we envision expanding this methodology worldwide, leveraging open data and software to address pressing societal challenges across diverse domains.

Authors:Michael MacInnis, Olga Baysal, Michele Lanza
Title: Terminal Lucidity: Envisioning the Future of the Terminal
Abstract:
The Unix terminal, or just simply, the terminal, can be found being applied in almost every facet of computing. It is available across all major platforms and often integrated into other applications. Due to its ubiquity, even marginal improvements to the terminal have the potential to make massive improvements to productivity on a global scale. We believe that evolutionary improvements to the terminal, in its current incarnation as windowed terminal emulator, are possible and that developing a thorough understanding of issues that current terminal users face is fundamental to knowing how the terminal should evolve. In order to develop that understanding we have mined Unix and Linux Stack Exchange using a fully-reproducible method which was able to extract and categorize 91.0% of 1,489 terminal-related questions (from the full set of nearly 240,000 questions) without manual intervention. We present an analysis, to our knowledge the first of its kind, of windowed terminal-related questions posted over a 15-year period and viewed, in aggregate, approximately 40 million times. As expected, given its longevity, we find the terminal's many features being applied across a wide variety of use cases. We find evidence that the terminal, as windowed terminal emulator, has neither fully adapted to its now current graphical environment nor completely untangled itself from features more suited to incarnations in previous environments. We also find evidence of areas where we believe the terminal could be extended along with other areas where it could be simplified. Surprisingly, while many current efforts to improve the terminal include improving the terminal's social and collaborative aspects, we find little evidence of this as a prominent pain point.

Authors:Myke C. Cohen, David A. Grimm, Reuth Mirsky, Xiaoyun Yin
Title: Birds of a Different Feather Flock Together: Exploring Opportunities and Challenges in Animal-Human-Machine Teaming
Abstract:
Animal-Human-Machine (AHM) teams are a type of hybrid intelligence system wherein interactions between a human, AI-enabled machine, and animal members can result in unique capabilities greater than the sum of their parts. This paper calls for a systematic approach to studying the design of AHM team structures to optimize performance and overcome limitations in various applied settings. We consider the challenges and opportunities in investigating the synergistic potential of AHM team members by introducing a set of dimensions of AHM team functioning to effectively utilize each member's strengths while compensating for individual weaknesses. Using three representative examples of such teams -- security screening, search-and-rescue, and guide dogs -- the paper illustrates how AHM teams can tackle complex tasks. We conclude with open research directions that this multidimensional approach presents for studying hybrid human-AI systems beyond AHM teams.

Authors:Juan David Salazar Rodriguez, Sam Conrad Joyce, Julfendi
Title: Using customized GPT to develop prompting proficiency in architectural AI-generated images
Abstract:
This research investigates the use of customized GPT models to enhance prompting proficiency among architecture students when generating AI-driven images. Prompt engineering is increasingly essential in architectural education due to the widespread adoption of generative AI tools. This study utilized a mixed-methods experimental design involving architecture students divided into three distinct groups: a control group receiving no structured support, a second group provided with structured prompting guides, and a third group supported by both structured guides and interactive AI personas. Students engaged in reverse engineering tasks, first guessing provided image prompts and then generating their own prompts, aiming to boost critical thinking and prompting skills. Variables examined included time spent prompting, word count, prompt similarity, and concreteness. Quantitative analysis involved correlation assessments between these variables and a one-way ANOVA to evaluate differences across groups. While several correlations showed meaningful relationships, not all were statistically significant. ANOVA results indicated statistically significant improvements in word count, similarity, and concreteness, especially in the group supported by AI personas and structured prompting guides. Qualitative feedback complemented these findings, revealing enhanced confidence and critical thinking skills in students. These results suggest tailored GPT interactions substantially improve students' ability to communicate architectural concepts clearly and effectively.

Authors:Sukanth Kalivarathan, Muhmmad Abrar Raja Mohamed, Aswathy Ravikumar, S Harini
Title: Intelligence of Things: A Spatial Context-Aware Control System for Smart Devices
Abstract:
This paper introduces Intelligence of Things (INOT), a novel spatial context-aware control system that enhances smart home automation through intuitive spatial reasoning. Current smart home systems largely rely on device-specific identifiers, limiting user interaction to explicit naming conventions rather than natural spatial references. INOT addresses this limitation through a modular architecture that integrates Vision Language Models with IoT control systems to enable natural language commands with spatial context (e.g., "turn on the light near the window"). The system comprises key components including an Onboarding Inference Engine, Zero-Shot Device Detection, Spatial Topology Inference, and Intent-Based Command Synthesis. A comprehensive user study with 15 participants demonstrated INOT's significant advantages over conventional systems like Google Home Assistant, with users reporting reduced cognitive workload (NASA-TLX scores decreased by an average of 13.17 points), higher ease-of-use ratings, and stronger preference (14 out of 15 participants). By eliminating the need to memorize device identifiers and enabling context-aware spatial commands, INOT represents a significant advancement in creating more intuitive and accessible smart home control systems.

Authors:Theofanis Tasoulas, Alexandros Gazis, Aggeliki Tsohou
Title: Comprehensive Classification of Web Tracking Systems: Technological Insights and Analysis
Abstract:
Web tracking (WT) systems are advanced technologies used to monitor and analyze online user behavior. Initially focused on HTML and static webpages, these systems have evolved with the proliferation of IoT, edge computing, and Big Data, encompassing a broad array of interconnected devices with APIs, interfaces and computing nodes for interaction. WT systems are pivotal in technological innovation and business development, although trends like GDPR complicate data extraction and mandate transparency. Specifically, this study examines WT systems purely from a technological perspective, excluding organizational and privacy implications. A novel classification scheme based on technological architecture and principles is proposed, compared to two preexisting frameworks. The scheme categorizes WT systems into six classes, emphasizing technological mechanisms such as HTTP protocols, APIs, and user identification techniques. Additionally, a survey of over 1,000 internet users, conducted via Google Forms, explores user awareness of WT systems. Findings indicate that knowledge of WT technologies is largely unrelated to demographic factors such as age or gender but is strongly influenced by a user's background in computer science. Most users demonstrate only a basic understanding of WT tools, and this awareness does not correlate with heightened concerns about data misuse. As such, the research highlights gaps in user education about WT technologies and underscores the need for a deeper examination of their technical underpinnings. This study provides a foundation for further exploration of WT systems from multiple perspectives, contributing to advancements in classification, implementation, and user awareness.

Authors:Johan van der Meer, Pamela Hoyte, Luisa Roeder, Peter Bruza
Title: Modeling the quantum-like dynamics of human reliability ratings in Human-AI interactions by interaction dependent Hamiltonians
Abstract:
As our information environments become ever more powered by artificial intelligence (AI), the phenomenon of trust in a human's interactions with this intelligence is becoming increasingly pertinent. For example, in the not too distant future, there will be teams of humans and intelligent robots involved in dealing with the repercussions of high-risk disaster situations such as hurricanes, earthquakes, or nuclear accidents. Even in such conditions of high uncertainty, humans and intelligent machines will need to engage in shared decision making, and trust is fundamental to the effectiveness of these interactions. A key challenge in modeling the dynamics of this trust is to provide a means to incorporate sensitivity to fluctuations in human trust judgments. In this article, we explore the ability of Quantum Random Walk models to model the dynamics of trust in human-AI interactions, and to integrate a sensitivity to fluctuations in participant trust judgments based on the nature of the interaction with the AI. We found that using empirical parameters to inform the use of different Hamiltonians can provide a promising means to model the evolution of trust in Human-AI interactions.
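The core idea, evolving a trust state under an interaction-dependent Hamiltonian, can be sketched for a two-state system. The Hamiltonian form and the coupling parameter k below are illustrative assumptions for a minimal quantum-walk-style model, not the empirically parameterized Hamiltonians fitted in the paper:

```python
import numpy as np

def evolve(state, H, t):
    """Evolve |psi> under U = exp(-i H t), computed via the
    eigendecomposition of the Hermitian Hamiltonian H."""
    evals, evecs = np.linalg.eigh(H)
    U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
    return U @ state

def trust_probability(k, t, psi0=None):
    """Probability of a 'trust' judgment after time t.

    Basis states: |trust>, |distrust>. The off-diagonal coupling k
    stands in for the nature of the human-AI interaction (illustrative)."""
    H = np.array([[0.0, k], [k, 0.0]])
    psi0 = np.array([1.0, 0.0], dtype=complex) if psi0 is None else psi0
    psi = evolve(psi0, H, t)
    return float(abs(psi[0]) ** 2)
```

Different interaction types would map to different Hamiltonians, producing different oscillation patterns in the trust probability over time.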

Authors:Vyshnav Kumar P, Vinayak CM, Thomson Gigi, Sulabh Bashyal, Janaki Kandasamy
Title: Modular Pet Feeding Device
Abstract:
This paper introduces a modular pet feeding device that combines automated feeding, health monitoring, and behavioral insights for modern pet care. Unlike traditional feeders, it features a wide-angle camera and microphone for food and water level assessment, pet approach detection, and sound monitoring. The device also includes an AI-enabled neckband to track heart rate, enabling early detection of unusual behaviors or health concerns. The AI system analyzes feeding history, behavior, and health data to provide personalized care suggestions, optimizing feeding times, portions, and dietary recommendations to improve pet well-being.

Authors:Soubhik Barari, Jarret Angbazo, Natalie Wang, Leah M. Christian, Elizabeth Dean, Zoe Slowinski, Brandon Sepulvado
Title: AI-Assisted Conversational Interviewing: Effects on Data Quality and User Experience
Abstract:
Standardized surveys scale efficiently but sacrifice depth, while conversational interviews improve response quality at the cost of scalability and consistency. This study bridges the gap between these methods by introducing a framework for AI-assisted conversational interviewing. To evaluate this framework, we conducted a web survey experiment where 1,800 participants were randomly assigned to AI 'chatbots' which use large language models (LLMs) to dynamically probe respondents for elaboration and interactively code open-ended responses to fixed questions developed by human researchers. We assessed the AI chatbot's performance in terms of coding accuracy, response quality, and respondent experience. Our findings reveal that AI chatbots perform moderately well in live coding even without survey-specific fine-tuning, despite slightly inflated false positive errors due to respondent acquiescence bias. Open-ended responses were more detailed and informative, but this came at a slight cost to respondent experience. Our findings highlight the feasibility of using LLM-powered chatbots to enrich open-ended data collection in web surveys.

Authors:Snezna B Schmidt, Stephen Isbel, Blooma John, Ram Subramanian, Nathan M DCunha
Title: Examining Technology Perspectives of Older Adults with Mild Cognitive Impairment: A Scoping Review
Abstract:
Mild cognitive impairment (MCI) may affect up to 20% of people over 65 years old. Global incidence of MCI is increasing, and technology is being explored for early intervention. Theories of technology adoption predict that useful and easy-to-use solutions will have higher rates of adoption; however, these models do not specifically consider older people with cognitive impairments, or the unique human-computer interaction challenges posed by MCI. We collated opinions from older people with MCI about technology solutions proposed for them, found in 83 articles published between Jan 2014 and May 2024 across nine databases. Inductive, thematic analysis of feedback identified five themes: (i) purpose and need, (ii) solution design and ease of use, (iii) self-impression, (iv) lifestyle, and (v) interaction modality. Solutions are perceived as useful, even though gaps in functional support exist; however, they are not perceived as entirely easy to use, due to issues related to usability and user experience. Devices which are light, portable, common, and have large screens are preferred, as is multimodal interaction, in particular speech, visual/text, and touch. This review recommends future work to (i) improve usability and user experience, (ii) enhance personalisation, (iii) better understand interaction preferences and effectiveness, (iv) enable options for multimodal interaction, and (v) more seamlessly integrate solutions into users' lifestyles.

Authors:Julie A. Vera, Sourojit Ghosh
Title: "They've Over-Emphasized That One Search": Controlling Unwanted Content on TikTok's For You Page
Abstract:
Modern algorithmic recommendation systems seek to engage users through behavioral content-interest matching. While many platforms recommend content based on engagement metrics, others like TikTok deliver interest-based content, resulting in recommendations perceived to be hyper-personalized compared to other platforms. TikTok's robust recommendation engine has led some users to suspect that the algorithm knows users "better than they know themselves," but this is not always true. In this paper, we explore TikTok users' perceptions of recommended content on their For You Page (FYP), specifically calling attention to unwanted recommendations. Through qualitative interviews of 14 current and former TikTok users, we find themes of frustration with recommended content, attempts to rid themselves of unwanted content, and various degrees of success in eschewing such content. We discuss implications in the larger context of folk theorization and contribute concrete tactical and behavioral examples of algorithmic persistence.

Authors:Qiang Zou, Shuo Liu
Title: Semantic Direct Modeling
Abstract:
Current direct modeling systems limit users to low-level interactions with vertices, edges, and faces, forcing designers to manage detailed geometric elements rather than focusing on high-level design intent. This paper introduces semantic direct modeling (SDM), a novel approach that lifts direct modeling from low-level geometric modifications to high-level semantic interactions. This is achieved by utilizing a large language model (LLM) fine-tuned with CAD-specific prompts, which can guide the LLM to reason through design intent and accurately interpret CAD commands, thereby allowing designers to express their intent using natural language. Additionally, SDM maps design intent to the corresponding geometric features in the CAD model through a new conditional, context-sensitive feature recognition method, which uses generative AI to dynamically assign feature labels based on design intent. Together, they enable a seamless flow from high-level design intent to low-level geometric modifications, bypassing tedious software interactions. The effectiveness of SDM has been validated through real mechanical design cases.

Authors:Stefano De Paoli, Alex Fawzi
Title: TALLMesh: a simple application for performing Thematic Analysis with Large Language Models
Abstract:
Thematic analysis (TA) is a widely used qualitative research method for identifying and interpreting patterns within textual data, such as qualitative interviews. Recent research has shown that it is possible to satisfactorily perform TA using Large Language Models (LLMs). This paper presents a novel application using LLMs to assist researchers in conducting TA. The application enables users to upload textual data and generate initial codes and themes. All of this is possible through a simple graphical user interface (GUI) based on the Streamlit framework, working with Python scripts for the analysis and using the Application Programming Interfaces (APIs) of LLMs. Having a GUI is particularly important for researchers in fields where coding skills may not be prevalent, such as social sciences or humanities. With the app, users can iteratively refine codes and themes through a human-in-the-loop process, without the need for programming or scripting. The paper describes the application's key features, highlighting its potential for qualitative research while preserving methodological rigor. The paper discusses the design and interface of the app and outlines future directions for this work.

Authors:Chen Shani, Elizabeth C. Stade
Title: Measuring Mental Health Variables in Computational Research: Toward Validated, Dimensional, and Transdiagnostic Approaches
Abstract:
Computational mental health research develops models to predict and understand psychological phenomena, but often relies on inappropriate measures of psychopathology constructs, undermining validity. We identify three key issues: (1) reliance on unvalidated measures (e.g., self-declared diagnosis) over validated ones (e.g., diagnosis by clinician); (2) treating mental health constructs as categorical rather than dimensional; and (3) focusing on disorder-specific constructs instead of transdiagnostic ones. We outline the benefits of using validated, dimensional, and transdiagnostic measures and offer practical recommendations for practitioners. Using valid measures that reflect the nature and structure of psychopathology is essential for computational mental health research.

Authors:Zeel Pansara, Gabriele Navyte, Tatiana Freitas-Mendes, Camila Bottger, Edoardo Franco, Luca Citi, Erik S. Jacobi, Giulia L. Poerio, Helge Gillmeister, Caterina Cinel, Vito De Feo
Title: Quantifying Emotional Arousal through Pupillary Response: A Novel Approach for Isolating the Luminosity Effect and Predicting Affective States
Abstract:
Researchers have long recognized pupil response as a potential objective indicator of emotional arousal; however, confounding factors, particularly luminosity of stimuli and the ambient environment, have limited its usefulness in detecting emotions. This study presents a new approach to isolate and remove the effect of luminosity on pupil dilation, obtaining the component of pupil dilation due only to emotional arousal. Our model predicts the pupil size due to luminosity only as a function of the screen luminosity and adapts to individual differences in pupil response to light, as well as to different types and configurations of monitors, by using a calibration procedure. The predicted pupil size has an average correlation with the measured pupil size of 0.76, an R2 of 0.58, and a normalized root mean square error (NRMSE) of 0.14. Here, we demonstrate that our model can be used straightforwardly to estimate emotional arousal. We showed 32 video clips with different content and emotional intensity to 47 participants, who, after each video, reported their level of emotional arousal. We then calculated the pupil size due only to luminosity and subtracted it from the total recorded pupil size, obtaining the component due only to emotional arousal. From the latter, we predicted the arousal of each participant for each video. We obtained an average correlation between predicted and self-reported arousal of 0.65, an R2 of 0.43, and an NRMSE of 0.27. In contrast, using the measured pupil size without subtracting the component due to luminosity, we obtained dramatically worse results: an average correlation between the predicted and self-reported arousal of 0.26, an R2 of 0.09, and an NRMSE of 0.42. Our results highlight that separating the emotional and luminosity components from pupillary responses is critical to accurately and precisely predicting arousal.
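A minimal sketch of the calibrate-then-subtract idea: fit a per-participant mapping from screen luminance to pupil size during calibration, then remove its prediction from the measured signal to isolate the arousal-driven residual. The polynomial form and degree are assumptions for illustration; the paper's calibration model may differ:

```python
import numpy as np

def fit_luminosity_model(lum, pupil, deg=2):
    """Calibrate a per-participant polynomial mapping from screen
    luminance to pupil size (degree is an illustrative choice)."""
    return np.polyfit(lum, pupil, deg)

def arousal_component(coeffs, lum, pupil):
    """Subtract the luminance-driven prediction from the measured
    pupil size; the residual is attributed to emotional arousal."""
    return pupil - np.polyval(coeffs, lum)
```

During calibration, `lum` and `pupil` would come from neutral stimuli of varying brightness; the residual is then computed on the emotional stimuli.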

Authors:Yun Wan, Yoram M Kalman
Title: Using Generative AI Personas Increases Collective Diversity in Human Ideation
Abstract:
This study challenges the widely-reported tradeoff between generative AI's (GenAI) contribution to creative outcomes and decreased diversity of these outcomes. We modified the design of such a study, by Doshi and Hauser (2024), in which participants wrote short stories either aided or unaided by GenAI plot ideas [1]. In the modified study, plot ideas were generated through ten unique GenAI "personas" with diverse traits (e.g. cultural backgrounds, thinking styles, genre preferences), creating a pool of 300 story plots. While plot ideas from any individual persona showed high similarity (average cosine similarity of 0.92), ideas across different personas exhibited substantial variation (average similarity of 0.20). When human participants wrote stories based on these diverse plot ideas, their collective outputs maintained the same level of diversity as stories written without GenAI assistance, effectively eliminating the diversity reduction observed in [1]. Traditional text analytics further revealed that GenAI-assisted stories featured greater diversity in descriptive and emotional language compared to purely human-generated stories without GenAI assistance. Our findings demonstrate that introducing diversity at the AI input stage through distinct personas can preserve and potentially enhance the collective diversity of human creative outputs when collaborating with GenAI.
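The similarity analysis reported above (within-persona vs. across-persona cosine similarity of plot ideas) can be reproduced with a few lines of NumPy, given embedding vectors for each plot. The embeddings themselves would come from a text-embedding model, which is outside this sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of embeddings;
    applied within one persona's plots or across personas."""
    sims = [cosine_similarity(vectors[i], vectors[j])
            for i in range(len(vectors))
            for j in range(i + 1, len(vectors))]
    return float(np.mean(sims))
```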

Authors:Matheus Rodrigues Felizardo, Nuno Miguel Feixa Rodrigues, António Coelho, Sónia Silva Sousa, Adriana Sampaio, Eva Ferreira de Oliveira
Title: Mapping Executive Function Tasks for Children: A Scoping Review for Designing a Research-Oriented Platform
Abstract:
Background: Executive functions (EFs) are cognitive processes essential for controlling impulses, staying focused, thinking before acting, and managing information. Childhood is a critical period for EF development, but there is a lack of standardized tools that combine EF tasks with physical activity in a gamified approach. Objectives: This scoping review maps EF tasks for children, identifies common strategies, and explores methods for measuring outcomes, providing a foundation for a research-oriented platform to assess EF development. Design: A systematic search was conducted in SCOPUS, ScienceDirect, and ERIC databases with the query "executive function task" AND (children OR child OR childhood). Inclusion criteria were studies published between 2019 and 2024 in English, with participants aged 5 to 9 years. Data extracted included task details, scoring mechanisms, and stop conditions. Studies lacking clear methodological descriptions were excluded. Results: A total of 2044 articles were identified, with 113 duplicates removed. After selection, 23 studies met the inclusion criteria. The identified tasks are listed in Table 2. Key tasks, strategies, and measurement methodologies were highlighted. Conclusions: Integrating EF tasks into a structured platform offers a promising approach to standardize assessments, fill research gaps, and provide a reliable tool for studying EF development in children. Keywords: Executive Functions, Inhibition, Working Memory, Cognitive Flexibility, Task Design, Standardization

Authors:Snigdha Tiwari, Sahil Sharma, Arvind Bagga, Aditi Sinha, Deepak Sharma
Title: Utsarjan: A smartphone App for providing kidney care and real-time assistance to children with nephrotic syndrome
Abstract:
Background Telemedicine has the potential to provide secure and cost-effective healthcare at the touch of a button. Nephrotic syndrome is a chronic childhood illness involving frequent relapses and demands long, complex treatment. Hence, developing a remote means of doctor-patient interaction will ensure the provision of quality healthcare to patients. Methods The Utsarjan mobile App framework was built with Flutter, which enables cross-platform development (Android, iOS, Windows) with speed, smoothness, and open-source benefits. The frontend uses Dart for user interaction, while the backend employs Node.js, Express, and NGINX for APIs, load balancing, and high performance. MongoDB provides a flexible database, Bcrypt secures passwords, PM2 handles deployment, uptime, and logs, while Firebase Cloud Messaging powers free push notifications. Results Utsarjan (meaning 'excretion') is a multi-functional smartphone application for giving nephrotic care and real-time assistance to all patients (especially those in rural regions and/or without access to specialists). It helps patients and doctors by ensuring opportune visits, recording each clinical test/parameter, and improving medication adherence. It gives a graphical visualization of relapses, medicine dosage, and different anthropometric parameters (urine protein, BP, height, and weight). This is the first nephrotic care App that enables prompt access to doctor's advice. Conclusions Utsarjan is a mobile App to provide kidney care and real-time assistance to children with nephrotic syndrome. It gives a graphical overview of changes in a patient's health over the long course of treatment. This will assist doctors in appropriately modifying the treatment regimen and, consequently, should help prevent relapses and complications.

Authors:Phillip Driscoll, Priyanka Kumar
Title: DoYouTrustAI: A Tool to Teach Students About AI Misinformation and Prompt Engineering
Abstract:
AI, and especially Large Language Models (LLMs) like ChatGPT, has rapidly developed and gained widespread adoption in the past five years, shifting user preference away from traditional search engines. However, the generative nature of LLMs raises concerns about presenting misinformation as fact. To address this, we developed a web-based application that helps K-12 students enhance critical thinking by identifying misleading information in LLM responses about major historical figures. In this paper, we describe the implementation and design details of the DoYouTrustAI tool, which can be used to provide an interactive lesson that teaches students about the dangers of misinformation and how believable generative AI can make it seem. The DoYouTrustAI tool uses prompt engineering to present the user with AI-generated summaries of the life of a historical figure. These summaries can be either accurate accounts of that person's life or intentionally misleading alterations of their history. The user is tasked with determining the validity of the statement without external resources. Our research questions for this work were: (RQ1) How can we design a tool that teaches students about the dangers of misleading information and how misinformation can present itself in LLM responses? (RQ2) Can we present prompt engineering as a topic that is easily understandable for students? Our findings highlight the need to correct misleading information before users retain it. Our tool lets users select familiar individuals for testing to reduce random guessing and presents misinformation alongside known facts to maintain believability. It also provides pre-configured prompt instructions to show how different prompts affect AI responses. Together, these features create a controlled environment where users learn the importance of verifying AI responses and understanding prompt engineering.

Authors:Austin Deng-Yao Yang, Shih-Jen Tsai, Hsin-Jung Tsai
Title: Impact of Environmental Colors on Human Aggressiveness: Insights from a Minecraft-Based Behavioral Study
Abstract:
This study explores the influence of environmental colors on human behavior, specifically focusing on aggressiveness and passiveness. Color is widely regarded as an influential environmental factor shaping human behavior, yet existing studies present conflicting evidence regarding its impact on aggressiveness and passiveness. This study employed Minecraft as a controlled digital platform to investigate whether exposure to different colors influences both the frequency and nature of participant interactions (aggressive versus non-aggressive), and whether prolonged exposure amplifies these effects. Anonymous online participants were exposed to various colors before interacting with non-player characters simulating human-like encounters. Three key outcomes were measured: (1) total interactions per color, (2) ratios of aggressive to non-aggressive interactions per color, and (3) the effect of varying exposure durations on aggressiveness. While no significant overall differences in interaction frequency were observed among the colors, post-hoc analyses revealed that Red and Black elicited significantly more interactions compared to Green. Additionally, Red, Yellow, and Black were associated with higher ratios of aggressive behavior relative to Green or White. Prolonged exposure to Red also appeared to intensify aggressive responses. These findings underscore the potential role of environmental color in shaping online social behaviors and highlight the importance of environmental settings in areas ranging from online communication platforms to digital marketing strategies.

Authors:Anca-Simona Horvath, Alina Elena Voinea, Radu Arieşan
Title: Bio-crafting Architecture: Experiences of growing mycelium in minimal surface molds
Abstract:
This study documents a three-week workshop with architecture students, where we designed and 3D printed various minimal surfaces using wood-based filaments, and used them as molds in which to grow mycelium. We detail the design process and the growth of the mycelium in different shapes, together with participants' experiences of working with a living material. After exhibiting the results of the work in a public-facing exhibition, we conducted interviews with members of the general public about their perceptions of interacting with a material such as mycelium in design. Our findings show that 3D-printed minimal surfaces with wood-based filaments can function as structural cores for mycelium-based composites and that mycelium binds to the filament. Participants in the workshop exhibited stronger feelings for living materials compared to non-living ones, displaying both biophilia and, to a lesser extent, biophobia when interacting with the mycelium. Members of the general public discussed pragmatic aspects including mold, fragility, and production costs, and speculated on the future of bio-technology and its impact on everyday life. While all are positive about the impact of bio-technologies on the future, they have diverging opinions on how much ethical considerations should influence research directions.

Authors:German Neubaum, Irene-Angelica Chounta, Eva Gredel, David Wiesche
Title: A Pandemic for the Good of Digital Literacy? An Empirical Investigation of Newly Improved Digital Skills during COVID-19 Lockdowns
Abstract:
This research explores whether the rapid digital transformation due to COVID-19 managed to close or exacerbate the digital divide concerning users' digital skills. We conducted a pre-registered survey with N = 1143 German Internet users. Our findings suggest the latter: younger, male, and higher educated users were more likely to improve their digital skills than older, female, and less educated ones. According to their accounts, the pandemic helped Internet users improve their skills in communicating with others by using video conference software and reflecting critically upon information they found online. These improved digital skills exacerbated not only positive (e.g., feeling informed and safe) but also negative (e.g., feeling lonely) effects of digital media use during the pandemic. We discuss this research's theoretical and practical implications regarding the impact of challenges, such as technological disruption and health crises, on humans' digital skills, capabilities, and future potential, focusing on the second-level digital divide.

Authors:Janet Rafner, Ryan Q. Guloy, Eden W. Wen, Catherine M. Chiodo, Jacob Sherson
Title: From Interaction to Collaboration: How Hybrid Intelligence Enhances Chatbot Feedback
Abstract:
Generative AI (GenAI) chatbots are becoming increasingly integrated into virtual assistant technologies, yet their success hinges on the ability to gather meaningful user feedback to improve interaction quality, system outcomes, and overall user acceptance. Successful chatbot interactions can enable organizations to build long-term relationships with their customers and users, supporting customer loyalty and furthering the organization's goals. This study explores the impact of two distinct narratives and feedback collection mechanisms on user engagement and feedback behavior: a standard AI-focused interaction versus a hybrid intelligence (HI) framed interaction. Initial findings indicate that while small-scale survey measures revealed no significant differences in user willingness to leave feedback, use the system, or trust the system, participants exposed to the HI narrative provided significantly more detailed feedback. These initial findings offer insights into designing effective feedback systems for GenAI virtual assistants, balancing user effort with system improvement potential.

Authors:Sébastien Riou, Didier Schwab, François Bérard
Title: Crossing-Based Interaction Using a Gaze-Tracking System
Abstract:
Human-computer interactions based on gaze tracking have spread during the last few years. Video games, applications in health, trading, market research, and many other fields have started to use this new technology, which seems invisible to the user. However, the dominant form of gaze-tracking interaction uses dwell time for command activation, which introduces strong constraints: dwell-time activation requires users to look steadily at an element for a predefined amount of time in order to select it. While dwell time alleviates part of the Midas touch problem (the fact that an element fixated by the user may be activated even when activation was not intended), it does not completely remove it: users should not gaze too long at an item, or they may trigger an unintended activation. In addition, dwell time slows down interaction by requiring a pause each time an activation is needed. In this project, we study an alternative selection method based on crossing interactions, a well-studied method in conventional HCI. This interaction allows users' gaze to rest in areas without crossing triggers, and it removes the need to pause during the interaction. We found that crossing interaction performed comparably to dwell-time interaction for novice users, and even better for users with prior experience of gaze interaction.

Authors:Rajanala Purushotham, Rapolu Rahul
Title: SkillTrade: A Website for Learning New Skills
Abstract:
SkillTrade is a site for skill swapping, learning, and career growth. It links people with matching skills, supports virtual collaboration through Google Meet/Zoom, and lets startups hire talent easily. Users can create profiles, connect with others, share skills, and respond to job ads from startups. Startup users can post jobs and browse profiles to hire candidates. Learn-only users get categorized learning materials, while developers manage the platform and upload resources. The platform is free for individual users, supported by donations, and charges startups a small fee only when they successfully hire. Built with Tailwind CSS, it offers an intuitive, responsive design that fosters collaboration and career opportunities.

Authors:Takuya Sera, Yusuke Hamano
Title: ChatNekoHacker: Real-Time Fan Engagement with Conversational Agents
Abstract:
ChatNekoHacker is a real-time conversational agent system that strengthens fan engagement for musicians. It integrates Amazon Bedrock Agents for autonomous dialogue, Unity for immersive 3D livestream sets, and VOICEVOX for high quality Japanese text-to-speech, enabling two virtual personas to represent the music duo Neko Hacker. In a one-hour YouTube Live with 30 participants, we evaluated the impact of the system. Regression analysis showed that agent interaction significantly elevated fan interest, with perceived fun as the dominant predictor. The participants also expressed a stronger intention to listen to the duo's music and attend future concerts. These findings highlight entertaining, interactive broadcasts as pivotal to cultivating fandom. Our work offers actionable insights for the deployment of conversational agents in entertainment while pointing to next steps: broader response diversity, lower latency, and tighter fact-checking to curb potential misinformation.

Authors:I. Aytutuldu, O. Yol, Y. S. Akgul
Title: Integrating LLMs for Grading and Appeal Resolution in Computer Science Education
Abstract:
This study explores the integration of Large Language Models (LLMs) into the grading and appeal resolution process in computer science education. We introduce AI-PAT, an AI-powered assessment tool that leverages LLMs to evaluate computer science exams, generate feedback, and address student appeals. AI-PAT was used to assess over 850 exam submissions and handle 185 appeal cases. Our multi-model comparison (ChatGPT, Gemini) reveals strong correlations between model outputs, though significant variability persists depending on configuration and prompt design. Human graders, while internally consistent, showed notable inter-rater disagreement, further highlighting subjectivity in manual evaluation. The appeal process led to grade changes in 74% of cases, indicating the need for continued refinement of AI evaluation strategies. While students appreciated the speed and detail of AI feedback, survey responses revealed trust and fairness concerns. We conclude that AI-PAT offers scalable benefits for formative assessment and feedback, but must be accompanied by transparent grading rubrics, human oversight, and appeal mechanisms to ensure equitable outcomes.

Authors:Rui Shang, Bingjie Huang
Title: Design Priorities in Digital Gateways: A Comparative Study of Authentication and Usability in Academic Library Alliances
Abstract:
Purpose: This study examines the design and functionality of university library login pages across academic alliances (IVY Plus, BTAA, JULAC, JVU) to identify how these interfaces align with institutional priorities and user needs. It explores consensus features, design variations, and emerging trends in authentication, usability, and security. Methodology: A multi-method approach was employed: screenshots and HTML files from 46 institutions were analyzed through categorization, statistical analysis, and comparative evaluation. Features were grouped into authentication mechanisms, usability, security/compliance, and library-specific elements. Findings: Core functionalities (e.g., ID/password, privacy policies) were consistent across alliances. Divergences emerged in feature emphasis: mature alliances (e.g., BTAA) prioritized resource accessibility with streamlined interfaces, while emerging consortia (e.g., JVU) emphasized cybersecurity (IP restrictions, third-party integrations). Usability features, particularly multilingual support, drove cross-alliance differences. The results highlighted regional and institutional influences, with older alliances favoring simplicity and newer ones adopting security-centric designs. Originality/Value: This is the first systematic comparison of login page designs across academic alliances, offering insights into how regional, technological, and institutional factors shape digital resource access. Findings inform best practices for balancing security, usability, and accessibility in library interfaces. Keywords: Academic library consortia, Login page design, User authentication, User experience, Security compliance.

Authors:Marios Constantinides, Himanshu Verma, Shadan Sadeghian, Abdallah El Ali
Title: The Future of Work is Blended, Not Hybrid
Abstract:
The way we work is no longer hybrid -- it is blended with AI co-workers, automated decisions, and virtual presence reshaping human roles, agency, and expertise. We now work through AI, with our outputs shaped by invisible algorithms. AI's infiltration into knowledge, creative, and service work is not just about automation, but concerns redistribution of agency, creativity, and control. How do we deal with physical and distributed AI-mediated workspaces? What happens when algorithms co-author reports, and draft our creative work? In this provocation, we argue that hybrid work is obsolete. Blended work is the future, not just in physical and virtual spaces but in how human effort and AI output become inseparable. We argue this shift demands urgent attention to AI-mediated work practices, work-life boundaries, physical-digital interactions, and AI transparency and accountability. The question is not whether we accept it, but whether we actively shape it before it shapes us.

Authors:Yusi Sun, Haoyan Guan, Leith Kin Yep Chan, Yong Hong Kuo
Title: Object-Driven Narrative in AR: A Scenario-Metaphor Framework with VLM Integration
Abstract:
Most adaptive AR storytelling systems define environmental semantics using simple object labels and spatial coordinates, limiting narratives to rigid, pre-defined logic. This oversimplification overlooks the contextual significance of object relationships; for example, a wedding ring on a nightstand might suggest marital conflict, yet is treated as just "two objects" in space. To address this, we explored integrating Vision Language Models (VLMs) into AR pipelines. However, several challenges emerged: First, stories generated with simple prompt guidance lacked narrative depth and made little use of the space. Second, spatial semantics were underutilized, failing to support meaningful storytelling. Third, pre-generated scripts struggled to align with AR Foundation's object naming and coordinate systems. We propose a scene-driven AR storytelling framework that reimagines environments as active narrative agents, built on three innovations: 1. State-aware object semantics: We decompose object meaning into physical, functional, and metaphorical layers, allowing VLMs to distinguish subtle narrative cues between similar objects. 2. Structured narrative interface: A bidirectional JSON layer maps VLM-generated metaphors to AR anchors, maintaining spatial and semantic coherence. 3. STAM evaluation framework: A three-part experimental design evaluates narrative quality, highlighting both strengths and limitations of VLM-AR integration. Our findings show that the system can generate stories from the environment itself, not just place them on top of it. In user studies, 70% of participants reported seeing real-world objects differently when narratives were grounded in environmental symbolism. By merging VLMs' generative creativity with AR's spatial precision, this framework introduces a novel object-driven storytelling paradigm, transforming passive spaces into active narrative landscapes.
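The structured narrative interface described above (a JSON layer mapping VLM-generated metaphors to AR anchors) could take a shape like the following sketch. All field names here are illustrative assumptions, not the paper's actual schema:

```python
import json

# Hypothetical narrative layer: each entry ties an AR Foundation anchor to
# the three semantic layers the paper describes (physical, functional,
# metaphorical). The anchor ids and labels below are invented examples.
narrative_layer = {
    "objects": [
        {
            "anchor_id": "anchor_03",
            "label": "wedding_ring",
            "semantics": {
                "physical": "small gold band",
                "functional": "worn on a finger",
                "metaphorical": "marital conflict when left on a nightstand",
            },
        }
    ]
}

def metaphor_for(layer, anchor_id):
    """Look up the metaphorical layer of an object by its AR anchor id."""
    for obj in layer["objects"]:
        if obj["anchor_id"] == anchor_id:
            return obj["semantics"]["metaphorical"]
    return None

# Serialize and reload to show the layer survives the JSON round trip,
# which is what lets the VLM side and the AR side exchange it.
roundtrip = json.loads(json.dumps(narrative_layer))
```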

Authors:Xi Zheng, Zhuoyang Li, Xinning Gui, Yuhan Luo
Title: Customizing Emotional Support: How Do Individuals Construct and Interact With LLM-Powered Chatbots
Abstract:
Personalized support is essential to fulfill individuals' emotional needs and sustain their mental well-being. Large language models (LLMs), with great customization flexibility, hold promises to enable individuals to create their own emotional support agents. In this work, we developed ChatLab, where users could construct LLM-powered chatbots with additional interaction features including voices and avatars. Using a Research through Design approach, we conducted a week-long field study followed by interviews and design activities (N = 22), which uncovered how participants created diverse chatbot personas for emotional reliance, confronting stressors, connecting to intellectual discourse, reflecting mirrored selves, etc. We found that participants actively enriched the personas they constructed, shaping the dynamics between themselves and the chatbot to foster open and honest conversations. They also suggested other customizable features, such as integrating online activities and adjustable memory settings. Based on these findings, we discuss opportunities for enhancing personalized emotional support through emerging AI technologies.

Authors:S. Shen, Z. Lin, W. Liu, C. Xin, W. Dai, S. Chen, X. Wen, X. Lan
Title: DashChat: Interactive Authoring of Industrial Dashboard Design Prototypes through Conversation with LLM-Powered Agents
Abstract:
Industrial dashboards, commonly deployed by organizations such as enterprises and governments, are increasingly crucial in data communication and decision-making support across various domains. Designing an industrial dashboard prototype is particularly challenging due to its visual complexity, which can include data visualization, layout configuration, embellishments, and animations. Additionally, in real-world industrial settings, designers often encounter numerous constraints. For instance, when companies negotiate collaborations with clients and determine design plans, they typically need to demo design prototypes and iterate on them quickly using mock data. Such a task is very common and crucial during the ideation stage, as it not only helps save development costs but also avoids data-related issues such as lengthy data handover periods. However, existing dashboard authoring tools are mostly not tailored to such prototyping needs. Motivated by these gaps, we propose DashChat, an interactive system that leverages large language models (LLMs) to generate industrial dashboard design prototypes from natural language. We collaborated closely with designers from industry and derived the requirements based on their practical experience. First, by analyzing 114 high-quality industrial dashboards, we summarized their common design patterns and injected the identified patterns into LLMs as references. Next, we built a multi-agent pipeline powered by LLMs to understand textual requirements from users and generate practical, aesthetic prototypes. In addition, functionally distinct, parallel-operating agents are created to enable efficient generation. Then, we developed a user-friendly interface that supports text-based interaction for generating and modifying prototypes. Two user studies demonstrated that our system is both effective and efficient in supporting design prototyping.
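The idea of functionally distinct, parallel-operating agents can be sketched with stand-in functions. The agent names and outputs below are hypothetical placeholders for LLM-backed agents, not DashChat's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def layout_agent(requirement):
    # In the real system this would prompt an LLM seeded with design patterns.
    return {"layout": f"grid layout for: {requirement}"}

def chart_agent(requirement):
    return {"charts": ["bar", "line"]}

def style_agent(requirement):
    return {"theme": "dark", "embellishments": True}

def generate_prototype(requirement):
    """Run functionally distinct agents in parallel and merge their outputs
    into one prototype specification."""
    agents = [layout_agent, chart_agent, style_agent]
    spec = {}
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        for partial in pool.map(lambda a: a(requirement), agents):
            spec.update(partial)
    return spec
```

Running the agents concurrently rather than sequentially is what makes generation efficient when each agent call is a slow LLM request.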

Authors:Simon W. S. Fischer, Hanna Schraffenberger, Serge Thill, Pim Haselager
Title: A Taxonomy of Questions for Critical Reflection in Machine-Assisted Decision-Making
Abstract:
Decision-makers run the risk of relying too much on machine recommendations, which is associated with lower cognitive engagement. Reflection has been shown to increase cognitive engagement and improve critical thinking and therefore decision-making. Questions are a means to stimulate reflection, but there is a research gap regarding the systematic creation and use of relevant questions for machine-assisted decision-making. We therefore present a taxonomy of questions aimed at promoting reflection and cognitive engagement in order to stimulate a deliberate decision-making process. Our taxonomy builds on the Socratic questioning method and a question bank for explainable AI. As a starting point, we focus on clinical decision-making. Brief discussions with two medical and three educational researchers provide feedback on the relevance and expected benefits of our taxonomy. Our work contributes to research on mitigating overreliance in human-AI interactions and aims to support effective human oversight as required by the European AI Act.

Authors:Shravan Chaudhari, Trilokya Akula, Yoon Kim, Tom Blake
Title: Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis
Abstract:
In this paper, we advance the study of AI-augmented reasoning in the context of Human-Computer Interaction (HCI), psychology, and cognitive science, focusing on the critical task of visual perception. Specifically, we investigate the applicability of Multimodal Large Language Models (MLLMs) in this domain. To this end, we leverage established principles and explanations from psychology and cognitive science related to complexity in human visual perception. We use them as guiding principles for the MLLMs to compare and interpret visual content. Our study aims to benchmark MLLMs across various explainability principles relevant to visual perception. Unlike recent approaches that primarily employ advanced deep learning models to predict complexity metrics from visual content, our work does not seek to develop merely a new predictive model. Instead, we propose a novel annotation-free analytical framework to assess the utility of MLLMs as cognitive assistants for HCI tasks, using visual perception as a case study. The primary goal is to pave the way for principled study in quantifying and evaluating the interpretability of MLLMs for applications in improving human reasoning capability and uncovering biases in existing perception datasets annotated by humans.

Authors:Chance Castañeda, Jessica Mindel, Will Page, Hayden Stec, Manqing Yu, Kenneth Holstein
Title: Supporting AI-Augmented Meta-Decision Making with InDecision
Abstract:
From school admissions to hiring and investment decisions, the first step behind many high-stakes decision-making processes is "deciding how to decide." Formulating effective criteria to guide decision-making requires an iterative process of exploration, reflection, and discovery. Yet, this process remains under-supported in practice. In this short paper, we outline an opportunity space for AI-driven tools that augment human meta-decision making. We draw upon prior literature to propose a set of design goals for future AI tools aimed at supporting human meta-decision making. We then illustrate these ideas through InDecision, a mixed-initiative tool designed to support the iterative development of decision criteria. Based on initial findings from designing and piloting InDecision with users, we discuss future directions for AI-augmented meta-decision making.

Authors:Andrés Isaza-Giraldo, Paulo Bala, Lucas Pereira
Title: Meta-Evaluating Local LLMs: Rethinking Performance Metrics for Serious Games
Abstract:
The evaluation of open-ended responses in serious games presents a unique challenge, as correctness is often subjective. Large Language Models (LLMs) are increasingly being explored as evaluators in such contexts, yet their accuracy and consistency remain uncertain, particularly for smaller models intended for local execution. This study investigates the reliability of five small-scale LLMs when assessing player responses in \textit{En-join}, a game that simulates decision-making within energy communities. By leveraging traditional binary classification metrics (including accuracy, true positive rate, and true negative rate), we systematically compare these models across different evaluation scenarios. Our results highlight the strengths and limitations of each model, revealing trade-offs between sensitivity, specificity, and overall performance. We demonstrate that while some models excel at identifying correct responses, others struggle with false positives or inconsistent evaluations. The findings highlight the need for context-aware evaluation frameworks and careful model selection when deploying LLMs as evaluators. This work contributes to the broader discourse on the trustworthiness of AI-driven assessment tools, offering insights into how different LLM architectures handle subjective evaluation tasks.
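The binary classification metrics named above (accuracy, true positive rate, true negative rate) can be computed directly from gold and predicted 0/1 labels. A minimal sketch; the function name and inputs are our own, not from the paper:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, true positive rate, and true negative rate for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    return accuracy, tpr, tnr
```

The sensitivity/specificity trade-off the abstract describes is exactly the tension between `tpr` and `tnr`: a model that flags everything as correct maximizes the first at the expense of the second.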

Authors:Elisabeth Mayer, Thomas Odaker, Dieter Kranzlmüller
Title: No Fuss, Just Function -- A Proposal for Non-Intrusive Full Body Tracking in XR for Meaningful Spatial Interactions
Abstract:
Extended Reality (XR) is a rapidly growing field with a wide range of hardware, from head-mounted displays to installations. Users have the possibility to access the entire Mixed Reality (MR) continuum. The goal of the human-computer interaction (HCI) community is to enable natural and intuitive interactions, yet XR interactions generally rely on handheld controllers. One natural interaction method is full body tracking (FBT), where users can use their body to interact with the experience. Classically, FBT systems require markers or trackers on the users to capture motion. Recently, there have been approaches based on Human Pose Estimation (HPE), which highlight the potential of low-cost, non-intrusive FBT for XR. Because it requires no handheld devices, HPE may also improve accessibility for people who struggle with traditional input methods. This paper proposes the concept of non-intrusive FBT for XR for all. The goal is to spark a discussion on the advantages for users of a non-intrusive FBT system for accessibility and user experience.

Authors:Felicitas Macgilchrist, Juliane Jarke
Title: Who Said Only Military Officers Can Deal with Uncertainty? On the Importance of Uncertainty in EdTech Data Visualisations
Abstract:
AI-powered predictive systems have high margins of error. However, data visualisations of algorithmic systems in education and other social fields tend to visualise certainty, thus invisibilising the underlying approximations and uncertainties of the algorithmic systems and the social settings in which these systems operate. This paper draws on a critical speculative approach to first analyse data visualisations from predictive analytics platforms for education. It demonstrates that visualisations of uncertainty in education are rare. Second, the paper explores uncertainty visualisations in other fields (defence, climate change and healthcare). The paper concludes by reflecting on the role of data visualisations and un/certainty in shaping educational futures. It also identifies practical implications for the design of data visualisations in education.

Authors:Mark Edison Jim, Jan Benjamin Yap, Gian Chill Laolao, Andrei Zachary Lim, Jordan Aiko Deja
Title: Speak with Confidence: Designing an Augmented Reality Training Tool for Public Speaking
Abstract:
Public speaking anxiety affects many individuals, yet opportunities for real-world practice remain limited. This study explores how augmented reality (AR) can provide an accessible training environment for public speaking. Drawing from literature on public speaking, VR-based training, self-efficacy, and behavioral feedback mechanisms, we designed SpeakAR, an AR-based tool that simulates audience interaction through virtual models. SpeakAR was evaluated with five participants of varying anxiety levels, each completing six speaking tasks. Results indicate that AR exposure can enhance confidence, with participants finding the system useful for practice. Feedback highlighted the importance of dynamic facial expressions and idle animations in virtual models to improve realism and engagement. Our findings contribute to the design of AR-based training tools for public speaking, offering insights into how immersive environments can support skill development and anxiety reduction.

Authors:Tomasz M. Rutkowski, Stanisław Narębski, Mihoko Otake-Matsuura, Tomasz Komendziński
Title: Early Detection of Cognitive Impairment in Elderly using a Passive FPVS-EEG BCI and Machine Learning -- Extended Version
Abstract:
Early dementia diagnosis requires biomarkers sensitive to both structural and functional brain changes. While structural neuroimaging biomarkers have progressed significantly, objective functional biomarkers of early cognitive decline remain a critical unmet need. Current cognitive assessments often rely on behavioral responses, making them susceptible to factors like effort, practice effects, and educational background, thereby hindering early and accurate detection. This work introduces a novel approach, leveraging a lightweight convolutional neural network (CNN) to infer cognitive impairment levels directly from electroencephalography (EEG) data. Critically, this method employs a passive fast periodic visual stimulation (FPVS) paradigm, eliminating the need for explicit behavioral responses or task comprehension from the participant. This passive approach provides an objective measure of working memory function, independent of confounding factors inherent in active cognitive tasks, and offers a promising new avenue for early and unbiased detection of cognitive decline.

Authors:E. C. Overes, F. M. Santoro
Title: The Effectiveness of Business Process Visualisations: a Systematic Literature Review
Abstract:
Business Process Visualisations (BPVs) have become indispensable tools for organisations seeking to enhance their operational efficiency, decision-making capabilities, and overall performance. The burgeoning interest in process modeling and tool development, coupled with the rise of the data visualisation field, underscores the significant role of visual tools in leveraging human cognition. Unlike traditional models, data visualisation approaches graphics from a novel angle, emphasising the potency of visual representations. This review aims to integrate the domains of BPV and data visualisation to assess their combined influence on organisational effectiveness comprehensively. Through a meticulous analysis of existing literature, this study amalgamates insights on BPVs' impact from a data visualisation standpoint, advocating for a design philosophy that prioritises user engagement to bolster organisational outcomes. Additionally, our systematic review has unveiled promising avenues for future research, identifying underexplored variables that influence the efficacy of BPVs, thereby charting a path for forthcoming scholarly inquiries.

Authors:Hirofumi Shibata, Ayako Yogo, Naoto Nishida, Yu Shimada, Toma Ishii
Title: Ichiyo: Fragile and Transient Interaction in Neighborhood
Abstract:
As the Internet has developed, social networking and other communication tools have transformed people's relationships into something fast, visible, and geographically expansive. However, these tools have not expanded opportunities to become acquainted with neighbors outside one's social network; rather, they have diminished occasions for interacting with unfamiliar neighbors by prioritizing communication with existing friends. We therefore created the medium Ichiyo to increase opportunities to think of neighbors walking along the same street or living in the same neighborhood, and to expand the imagination of those who pass by and those who used to be there, enabling users to engage in indirect interaction. We used commercially available laser cutters to engrave QR codes on leaves naturally found in our living space, avoiding environmental intrusion. The QR codes lead to a communal space on the web where users can freely leave messages; by engraving them, physical leaves are virtually extended with digital information. To gather feedback on Ichiyo, we let a total of several thousand people experience this new way of communication as part of ''iii Exhibition 2022'', an art exhibition at the University of Tokyo. More than 1,000 leaves engraved with QR codes were prepared and scattered at the exhibition site and along the road from the nearest station to the venue.

Authors:Sina Elahimanesh, Mohammadali Mohammadkhani, Shohreh Kasaei
Title: Emotion Alignment: Discovering the Gap Between Social Media and Real-World Sentiments in Persian Tweets and Images
Abstract:
In contemporary society, widespread social media usage is evident in people's daily lives. Nevertheless, disparities can arise between emotional expression in the real world and on online platforms. To explore this phenomenon, we comprehensively analyzed the Persian community on X. An innovative pipeline was designed to measure the similarity between real-world emotions and those expressed on social media. Recent tweets and images of participants were gathered and analyzed using Transformer-based text and image sentiment analysis modules, and each participant's friends provided insights into their real-world emotions. A distance criterion was used to compare real-world feelings with virtual expressions. Our study encompassed N=105 participants, 393 friends who contributed their perspectives, over 8,300 collected tweets, and 2,000 media images. Results indicated a 28.67% similarity between images and real-world emotions, while tweets exhibited a 75.88% alignment with real-world feelings. Additionally, statistical tests confirmed the significance of the observed disparities in sentiment proportions.
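The exact distance criterion is not specified in the abstract; one plausible sketch compares emotion probability distributions via total-variation distance (the function name and the 1 - L1/2 definition are assumptions, not the authors' method):

```python
def emotion_similarity(dist_a, dist_b):
    """Similarity in [0, 1] between two emotion probability distributions,
    defined here as 1 minus half the L1 (total-variation) distance."""
    labels = set(dist_a) | set(dist_b)
    l1 = sum(abs(dist_a.get(k, 0.0) - dist_b.get(k, 0.0)) for k in labels)
    return 1.0 - l1 / 2.0
```

Under this definition, identical distributions score 1.0 and fully disjoint ones score 0.0, making percentages like the reported 28.67% and 75.88% directly interpretable as alignment scores.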

Authors:Éva Székely, Jūra Miniota, Míša Hejná
Title: Will AI shape the way we speak? The emerging sociolinguistic influence of synthetic voices
Abstract:
The growing prevalence of conversational voice interfaces, powered by developments in both speech and language technologies, raises important questions about their influence on human communication. While written communication can signal identity through lexical and stylistic choices, voice-based interactions inherently amplify socioindexical elements - such as accent, intonation, and speech style - which more prominently convey social identity and group affiliation. There is evidence that even passive media such as television is likely to influence the audience's linguistic patterns. Unlike passive media, conversational AI is interactive, creating a more immersive and reciprocal dynamic that holds a greater potential to impact how individuals speak in everyday interactions. Such heightened influence can be expected to arise from phenomena such as acoustic-prosodic entrainment and linguistic accommodation, which occur naturally during interaction and enable users to adapt their speech patterns in response to the system. While this phenomenon is still emerging, its potential societal impact could provide organisations, movements, and brands with a subtle yet powerful avenue for shaping and controlling public perception and social identity. We argue that the socioindexical influence of AI-generated speech warrants attention and should become a focus of interdisciplinary research, leveraging new and existing methodologies and technologies to better understand its implications.

Authors:Sunyi Liu, Mengzhe Geng, Rebecca Hart
Title: Exploring Generative AI Techniques in Government: A Case Study
Abstract:
The swift progress of Generative Artificial Intelligence (GenAI), notably Large Language Models (LLMs), is reshaping the digital landscape. Recognizing this transformative potential, the National Research Council of Canada (NRC) launched a pilot initiative to explore the integration of GenAI techniques into its daily operations for performance excellence, where 22 projects were launched in May 2024. Within these projects, this paper presents the development of the intelligent agent Pubbie as a case study, targeting the automation of performance measurement, data management and insight reporting at the NRC. Cutting-edge techniques are explored, including LLM orchestration and semantic embedding via RoBERTa, while strategic fine-tuning and few-shot learning approaches are incorporated to infuse domain knowledge at an affordable cost. The user-friendly interface of Pubbie allows general government users to input queries in natural language and easily upload or download files with a simple button click, greatly reducing manual efforts and accessibility barriers.

Authors:Diogo Sousa, Guilherme Barbosa, Catarina Rocha, Dulce Oliveira
Title: Performance of Large Language Models in Supporting Medical Diagnosis and Treatment
Abstract:
The integration of Large Language Models (LLMs) into healthcare holds significant potential to enhance diagnostic accuracy and support medical treatment planning. These AI-driven systems can analyze vast datasets, assisting clinicians in identifying diseases, recommending treatments, and predicting patient outcomes. This study evaluates the performance of a range of contemporary LLMs, including both open-source and closed-source models, on the 2024 Portuguese National Exam for medical specialty access (PNA), a standardized medical knowledge assessment. Our results highlight considerable variation in accuracy and cost-effectiveness, with several models demonstrating performance exceeding human benchmarks for medical students on this specific task. We identify leading models based on a combined score of accuracy and cost, discuss the implications of reasoning methodologies like Chain-of-Thought, and underscore the potential for LLMs to function as valuable complementary tools aiding medical professionals in complex clinical decision-making.

Authors:Parth Maradia, Ayushi Agarwal, Srija Bhupathiraju, Kavita Vemuri
Title: Framing Perception: Exploring Camera Induced Objectification in Cinema
Abstract:
This study investigates how cinematographic techniques influence viewer perception and contribute to the objectification of women, utilizing eye-tracking data from 91 participants. Participants watched a sexualized music video (SV) known for objectifying portrayals and a non-sexualized music video (TV). Using dynamic Areas of Interest (AOIs) covering the head, torso, and lower body, gaze metrics such as fixation duration, visit count, and scan paths were recorded to assess visual attention patterns. Participants were grouped according to their average fixations on sexualized AOIs. Statistical analyses revealed significant differences in gaze behavior between the videos and among the groups, with increased attention to sexualized AOIs in SV. Additionally, data-driven group differences in fixations identified specific segments with heightened objectification, which were further analyzed using scan path visualization techniques. These findings provide strong empirical evidence of camera-driven gaze objectification, demonstrating how cinematic framing implicitly shapes objectifying gaze patterns and highlighting the critical need for mindful media representation.

Authors:Aryan Shrivastava, Paula Akemi Aoyagui
Title: DICE: A Framework for Dimensional and Contextual Evaluation of Language Models
Abstract:
Language models (LMs) are increasingly being integrated into a wide range of applications, yet the modern evaluation paradigm does not sufficiently reflect how they are actually being used. Current evaluations rely on benchmarks that often lack direct applicability to the real-world contexts in which LMs are being deployed. To address this gap, we propose Dimensional and Contextual Evaluation (DICE), an approach that evaluates LMs on granular, context-dependent dimensions. In this position paper, we begin by examining the insufficiency of existing LM benchmarks, highlighting their limited applicability to real-world use cases. Next, we propose a set of granular evaluation parameters that capture dimensions of LM behavior that are more meaningful to stakeholders across a variety of application domains. Specifically, we introduce the concept of context-agnostic parameters - such as robustness, coherence, and epistemic honesty - and context-specific parameters that must be tailored to the specific contextual constraints and demands of stakeholders choosing to deploy LMs into a particular setting. We then discuss potential approaches to operationalize this evaluation framework, finishing with the opportunities and challenges DICE presents to the LM evaluation landscape. Ultimately, this work serves as a practical and approachable starting point for context-specific and stakeholder-relevant evaluation of LMs.

Authors:Jannis Strecker, Luka Bekavac, Kenan Bektaş, Simon Mayer
Title: Change Your Perspective, Widen Your Worldview! Societally Beneficial Perceptual Filter Bubbles in Personalized Reality
Abstract:
Extended Reality (XR) technologies enable the personalized mediation of an individual's perceivable reality across modalities, thereby creating a Personalized Reality (PR). While this may lead to individually beneficial effects in the form of more efficient, more fun, and safer experiences, it may also lead to perceptual filter bubbles since individuals are exposed predominantly or exclusively to content that is congruent with their existing beliefs and opinions. This undermining of a shared basis for interaction and discussion through constrained perceptual worldviews may impact society through increased polarization and other well-documented negative effects of filter bubbles. In this paper, we argue that this issue can be mitigated by increasing individuals' awareness of their current perspective and providing avenues for development, including through support for engineered serendipity and fostering of self-actualization that already show promise for traditional recommender systems. We discuss how these methods may be transferred to XR to yield valuable tools to give people transparency and agency over their perceptual worldviews in a responsible manner.

Authors:I. Rodriguez, A. Puig
Title: Leveraging Metaphors in a VR Serious Game for Computational Thinking
Abstract:
This paper presents Cooking Code, a VR-based serious game designed to introduce programming concepts to students (ages 12-16) through an immersive, scenario-driven experience. Set in a futuristic world where humans and machines coexist, players take on the role of a fast-food chef who must assemble food orders based on pseudocode instructions. By interpreting and executing these instructions correctly, players develop problem-solving skills, computational thinking, and a foundational understanding of programming logic. The game leverages the kitchen metaphor to teach computational thinking, using affordances for an immersive VR experience.

Authors:Martin Kocur, Niels Henze
Title: Investigating Environments' and Avatars' Effects on Thermal Perception in Virtual Reality to Reduce Energy Consumption
Abstract:
Understanding thermal regulation and subjective perception of temperature is crucial for improving thermal comfort and human energy consumption in times of global warming. Previous work shows that an environment's color temperature affects the experienced temperature. As virtual reality (VR) enables visual immersion, recent work suggests that a VR scene's color temperature also affects experienced temperature. In addition, virtual avatars representing thermal cues influence users' thermal perception and even the body temperature. As immersive technology becomes increasingly prevalent in daily life, leveraging thermal cues to enhance thermal comfort - without relying on actual thermal energy - presents a promising opportunity. Understanding these effects is crucial for optimizing virtual experiences and promoting sustainable energy practices. Therefore, we propose three controlled experiments to learn more about thermal effects caused by virtual worlds and avatars.

Authors:Anneliese Kelterer, Barbara Schuppler
Title: Turn-taking annotation for quantitative and qualitative analyses of conversation
Abstract:
This paper has two goals. First, we present the turn-taking annotation layers created for 95 minutes of conversational speech from the Graz Corpus of Read and Spontaneous Speech (GRASS), available to the scientific community. Second, we describe the annotation system and the annotation process in more detail, so that other researchers may use it for their own conversational data. The annotation system was developed with interdisciplinary application in mind: it is based on sequential criteria according to Conversation Analysis; it is suitable for subsequent phonetic analysis, so time-aligned annotations were made in Praat; and it is suitable for automatic classification, which required continuous annotation of speech and a label inventory that is not too large and yields high inter-rater agreement. Turn-taking was annotated on two layers, Inter-Pausal Units (IPU) and points of potential completion (PCOMP; similar to transition relevance places). We provide a detailed description of the annotation process and of the segmentation and labelling criteria. A detailed analysis of inter-rater agreement and common confusions shows that agreement for IPU annotation is near-perfect, that agreement for PCOMP annotations is substantial, and that disagreements are often either partial or can be explained by a different analysis of a sequence that also has merit. The annotation system can be applied to a variety of conversational data for linguistic studies and technological applications, and we hope that the annotations, as well as the annotation system, will contribute to stronger cross-fertilization between these disciplines.

Authors:Naoto Nishida, Hinako Nozaki, Buntarou Shizuki
Title: Laugh at Your Own Pace: Basic Performance Evaluation of Language Learning Assistance by Adjustment of Video Playback Speeds Based on Laughter Detection
Abstract:
Among various methods to learn a second language (L2), such as listening and shadowing, Extensive Viewing involves learning L2 by watching many videos. However, it is difficult for many L2 learners to smoothly and effortlessly comprehend video content made for native speakers at the original speed. Therefore, we developed a language learning assistance system that automatically adjusts the playback speed according to the learner's comprehension. Our system judges that learners understand the content if they laugh at the punchlines of comedy dramas, and vice versa. Experimental results show that this system supports learners with relatively low L2 ability (TOEIC scores under 700 in our experimental condition) in understanding video content. Our system can widen learners' possible options of native speakers' videos as Extensive Viewing material.

Authors:Xia Chen, Xinyue Chen, Weixian Hu, Haojia Zheng, YuJun Qian, Zhenhui Peng
Title: Redesign of Online Design Communities: Facilitating Personalized Visual Design Learning with Structured Comments
Abstract:
Online Design Communities (ODCs) offer various artworks with members' comments for beginners to learn visual design. However, as identified by our Formative Study (N = 10), current ODCs lack features customized for personal learning purposes, e.g., searching artworks and digesting useful comments to learn design principles about buttons. In this paper, we present DesignLearner, a redesigned interface of ODCs to facilitate personalized visual design learning with comments structured based on UI components (e.g., button, text) and visual elements (e.g., color, contrast). In DesignLearner, learners can specify the UI components and visual elements that they wish to learn to filter artworks and associated comments. They can interactively read comments on an artwork, take notes, and get suggestions for the next artworks to explore. Our between-subjects study (N = 24) indicates that compared to a traditional ODC interface, DesignLearner can improve the user learning outcome and is deemed significantly more useful. We conclude with design considerations for customizing the interface of online communities to satisfy users' learning needs.

Authors:Lei Mao, Jong Ho Lee, Yasmeen Faroqi Shah, Stephanie Valencia
Title: Design Probes for AI-Driven AAC: Addressing Complex Communication Needs in Aphasia
Abstract:
AI offers key advantages such as instant generation, multi-modal support, and personalized adaptability - potential that can address the highly heterogeneous communication barriers faced by people with aphasia (PWAs). We designed AI-enhanced communication tools and used them as design probes to explore how AI's real-time processing and generation capabilities - across text, image, and audio - can align with PWAs' needs in real-time communication and preparation for future conversations respectively. Through a two-phase "Research through Design" approach, eleven PWAs contributed design insights and evaluated four AI-enhanced prototypes. These prototypes aimed to improve communication grounding and conversational agency through visual verification, grammar construction support, error correction, and reduced language processing load. Despite some challenges, such as occasional mismatches with user intent, findings demonstrate how AI's specific capabilities can be advantageous in addressing PWAs' complex needs. Our work contributes design insights for future Augmentative and Alternative Communication (AAC) systems.

Authors:Zhang Qing, Rekimoto Jun
Title: Look and Talk: Seamless AI Assistant Interaction with Gaze-Triggered Activation
Abstract:
Engaging with AI assistants to gather essential information in a timely manner is becoming increasingly common. Traditional activation methods, like wake words such as Hey Siri, Ok Google, and Hey Alexa, are constrained by technical challenges such as false activations, recognition errors, and discomfort in public settings. Similarly, activating AI systems via physical buttons imposes strict interactive limitations as it demands particular physical actions, which hinders fluid and spontaneous communication with AI. Our approach employs eye-tracking technology within AR glasses to discern a user's intention to engage with the AI assistant. By sustaining eye contact on a virtual AI avatar for a specific time, users can initiate an interaction silently and without using their hands. Preliminary user feedback suggests that this technique is relatively intuitive, natural, and less obtrusive, highlighting its potential for integrating AI assistants fluidly into everyday interactions.

Authors:Gaurav Jain, Leah Findlater, Cole Gleason
Title: SceneScout: Towards AI Agent-driven Access to Street View Imagery for Blind Users
Abstract:
People who are blind or have low vision (BLV) may hesitate to travel independently in unfamiliar environments due to uncertainty about the physical landscape. While most tools focus on in-situ navigation, those exploring pre-travel assistance typically provide only landmarks and turn-by-turn instructions, lacking detailed visual context. Street view imagery, which contains rich visual information and has the potential to reveal numerous environmental details, remains inaccessible to BLV people. In this work, we introduce SceneScout, a multimodal large language model (MLLM)-driven AI agent that enables accessible interactions with street view imagery. SceneScout supports two modes: (1) Route Preview, enabling users to familiarize themselves with visual details along a route, and (2) Virtual Exploration, enabling free movement within street view imagery. Our user study (N=10) demonstrates that SceneScout helps BLV users uncover visual information otherwise unavailable through existing means. A technical evaluation shows that most descriptions are accurate (72%) and describe stable visual elements (95%) even in older imagery, though occasional subtle and plausible errors make them difficult to verify without sight. We discuss future opportunities and challenges of using street view imagery to enhance navigation experiences.

Authors:Florian Bemmann, Doruntina Murtezaj
Title: Rethinking News and Media System Design Towards Positive Societal Implications
Abstract:
Since the turn of the century, the speed, availability, and sheer volume of online informational content have made it increasingly difficult for humans to keep an overview of real-world situations, form a personal opinion, and sometimes even decide on the truth. Personal opinion-making and public discourse have thereby become harder - two essential building blocks that keep a democratic society alive. HCI thus needs to rethink news, information, and social media systems to mitigate such negative effects. Rather than polarising through emotional and extremely framed messages, informational content online should make people consider other opinions and discuss constructively. Instead, through polarisation and filter bubble effects, people lose openness and tolerance for the existence of opposing opinions. In this workshop, we will discuss how we can redesign our information technology for a better societal impact. We will present key takeaways from the social sciences and discuss how we can implement them using recent HCI findings and digital technologies.

Authors:Shih Ying-Lei, Dongxu Tang, Weiming Hu, Sang Ho Yoon, Yitian Shao
Title: VibWalk: Mapping Lower-limb Haptic Experiences of Everyday Walking
Abstract:
Walking is among the most common human activities where the feet can gather rich tactile information from the ground. The dynamic contact between the feet and the ground generates vibration signals that can be sensed by the foot skin. While existing research focuses on foot pressure sensing and lower-limb interactions, methods of decoding tactile information from foot vibrations remain underexplored. Here, we propose a foot-equipped wearable system capable of recording wideband vibration signals during walking activities. By enabling location-based recording, our system generates maps of haptic data that encode information on ground materials, lower-limb activities, and road conditions. Its efficacy was demonstrated through studies involving 31 users walking over 18 different ground textures, achieving an overall identification accuracy exceeding 95% (cross-user accuracy of 87%). Our system allows pedestrians to map haptic information through their daily walking activities, which has potential applications in creating digitalized walking experiences and monitoring road conditions.
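The abstract does not describe the classifier; as an illustrative sketch of texture identification from vibration signals, one could extract simple energy and zero-crossing features and classify by nearest centroid (all names and features here are assumptions, and far simpler than the wideband pipeline the paper likely uses):

```python
def features(sig):
    """Two toy features of a vibration trace: RMS energy and zero-crossing rate."""
    rms = (sum(x * x for x in sig) / len(sig)) ** 0.5
    zcr = sum(1 for a, b in zip(sig, sig[1:]) if a * b < 0) / (len(sig) - 1)
    return (rms, zcr)

def nearest_centroid(train, sample):
    """train: {texture_label: [signals]}; classify `sample` by Euclidean
    distance to per-label centroids in the toy feature space."""
    f = features(sample)
    centroids = {}
    for label, sigs in train.items():
        feats = [features(s) for s in sigs]
        centroids[label] = tuple(sum(v) / len(v) for v in zip(*feats))
    return min(centroids,
               key=lambda l: sum((a - b) ** 2 for a, b in zip(f, centroids[l])))
```

Rougher textures tend to produce higher-frequency vibration content, which the zero-crossing rate crudely captures; a real system would use richer spectral features and a learned classifier.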

Authors:Luna Xingyu Li, Ray-yuan Chung, Feng Chen, Wenyu Zeng, Yein Jeon, Oleg Zaslavsky
Title: Learning from Elders: Making an LLM-powered Chatbot for Retirement Communities more Accessible through User-centered Design
Abstract:
Low technology and eHealth literacy among older adults in retirement communities hinder engagement with digital tools. To address this, we designed an LLM-powered chatbot prototype using a human-centered approach for a local retirement community. Through interviews and persona development, we prioritized accessibility and dual functionality: simplifying internal information retrieval and improving technology and eHealth literacy. A pilot trial with residents demonstrated high satisfaction and ease of use, but also identified areas for further improvement. Based on the feedback, we refined the chatbot using GPT-3.5 Turbo and Streamlit. The chatbot employs tailored prompt engineering to deliver concise responses. Accessible features like adjustable font size, interface theme and personalized follow-up responses were implemented. Future steps include enabling voice-to-text function and longitudinal intervention studies. Together, our results highlight the potential of LLM-driven chatbots to empower older adults through accessible, personalized interactions, bridging literacy gaps in retirement communities.

Authors:Eugene Wu, Xiang Yu Tuang, Antonio Li, Vareesh Bainwala
Title: A Formalism and Library for Database Visualization
Abstract:
Existing data visualization formalisms are restricted to single-table inputs, which makes existing visualization grammars like Vega-Lite or ggplot2 tedious to use, gives them overly complex APIs, and renders them unsound when visualizing multi-table data. This paper presents the first visualization formalism to support databases as input -- in other words, *database visualization*. A visualization specification is defined as a mapping from database constraints (e.g., schemas, types, foreign keys) to visual representations of those constraints, and we state that a visualization is *faithful* if it visually preserves the underlying database constraints. This formalism explains how visualization designs are the result of implicit data modeling decisions. We further develop a JavaScript library called dvl and use a series of case studies to show its expressiveness over specialized visualization systems and existing grammar-based languages.
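The dvl API itself is not shown in the abstract; the constraint-to-visual mapping and the faithfulness criterion can be sketched in Python (the spec shape and function names are illustrative, not dvl's actual interface):

```python
def visualize(schema):
    """Map database constraints to visual encodings: each table becomes a
    mark class, each foreign key becomes a visual link between marks.
    Returns a declarative spec as a plain dict (shape is an assumption)."""
    spec = {"marks": {}, "links": []}
    for table, info in schema.items():
        spec["marks"][table] = {"channels": list(info["columns"])}
        for col, (ref_table, ref_col) in info.get("fks", {}).items():
            spec["links"].append({"from": (table, col), "to": (ref_table, ref_col)})
    return spec

def is_faithful(spec, schema):
    """Toy faithfulness check: every foreign key must survive as a visual link."""
    fks = {(t, c, rt, rc) for t, info in schema.items()
           for c, (rt, rc) in info.get("fks", {}).items()}
    links = {(l["from"][0], l["from"][1], l["to"][0], l["to"][1])
             for l in spec["links"]}
    return fks <= links
```

The point of the formalism is exactly this separation: the spec is a function of the constraints, so dropping a foreign key from the rendering is detectable as a loss of faithfulness rather than a silent design choice.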

Authors:Natalia Sikora, Robert L. Manschke, Alethea M. Tang, Peter Dunstan, Dean A. Harris, Su Yang
Title: ColonScopeX: Leveraging Explainable Expert Systems with Multimodal Data for Improved Early Diagnosis of Colorectal Cancer
Abstract:
Colorectal cancer (CRC) ranks as the second leading cause of cancer-related deaths and the third most prevalent malignant tumour worldwide. Early detection of CRC remains problematic due to its non-specific and often embarrassing symptoms, which patients frequently overlook or hesitate to report to clinicians. Crucially, the stage at which CRC is diagnosed significantly impacts survivability, with a survival rate of 80-95% for Stage I and a stark decline to 10% for Stage IV. Unfortunately, in the UK, only 14.4% of cases are diagnosed at the earliest stage (Stage I). In this study, we propose ColonScopeX, a machine learning framework utilizing explainable AI (XAI) methodologies to enhance the early detection of CRC and pre-cancerous lesions. Our approach employs a multimodal model that integrates signals from blood sample measurements, processed using the Savitzky-Golay algorithm for fingerprint smoothing, alongside comprehensive patient metadata, including medication history, comorbidities, age, weight, and BMI. By leveraging XAI techniques, we aim to render the model's decision-making process transparent and interpretable, thereby fostering greater trust and understanding in its predictions. The proposed framework could be utilised as a triage or screening tool for the general population. This research highlights the potential of combining diverse patient data sources and explainable machine learning to tackle critical challenges in medical diagnostics.

Authors:Pepita Barnard, Maria J Galvez Trigo, Dominic Price, Sue Cobb, Gisela Reyes-Cruz, Gustavo Berumen, David Branson, Mojtaba A. Khanesar, Mercedes Torres Torres, Michel Valstar
Title: Human strategies for correcting `human-robot' errors during a laundry sorting task
Abstract:
Mental models and expectations underlying human-human interaction (HHI) inform human-robot interaction (HRI) with domestic robots. To ease collaborative home tasks by improving domestic robot speech and behaviours for human-robot communication, we designed a study to understand how people communicate when failure occurs. To identify patterns of natural communication, particularly in response to robotic failures, participants instructed Laundrobot to move laundry into baskets using natural language and gestures. Laundrobot either worked error-free or operated in one of two error modes. Participants were not advised that Laundrobot would be a human actor, nor were they given information about the error modes. Video analysis of 42 participants found speech patterns including laughter, verbal expressions, and filler words such as "oh" and "ok", as well as sequences of body movements including touching one's own face, increased pointing with a static finger, and expressions of surprise. Common strategies deployed when errors occurred included correcting and teaching, taking responsibility, and displays of frustration. The strength of reaction to errors diminished with exposure, possibly indicating acceptance or resignation. Some participants used strategies similar to those used to communicate with other technologies, such as smart assistants. An anthropomorphic robot may not be ideally suited to this kind of task: Laundrobot's appearance, morphology, voice, capabilities, and recovery strategies may have impacted how it was perceived. Some participants indicated that Laundrobot's actual skills were not aligned with their expectations, which made it difficult to know what to expect and how much Laundrobot understood. Expertise, personality, and cultural differences may affect responses; however, these were not assessed.

Authors:Shiyi Ding, Ying Chen
Title: RAG-VR: Leveraging Retrieval-Augmented Generation for 3D Question Answering in VR Environments
Abstract:
Recent advances in large language models (LLMs) provide new opportunities for context understanding in virtual reality (VR). However, VR contexts are often highly localized and personalized, limiting the effectiveness of general-purpose LLMs. To address this challenge, we present RAG-VR, the first 3D question-answering system for VR that incorporates retrieval-augmented generation (RAG), which augments an LLM with external knowledge retrieved from a localized knowledge database to improve the answer quality. RAG-VR includes a pipeline for extracting comprehensive knowledge about virtual environments and user conditions for accurate answer generation. To ensure efficient retrieval, RAG-VR offloads the retrieval process to a nearby edge server and uses only essential information during retrieval. Moreover, we train the retriever to effectively distinguish among relevant, irrelevant, and hard-to-differentiate information in relation to questions. RAG-VR improves answer accuracy by 17.9%-41.8% and reduces end-to-end latency by 34.5%-47.3% compared with two baseline systems.
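The retriever's implementation is not given in the abstract; the core retrieval step of any RAG pipeline can be sketched as cosine-similarity ranking over precomputed passage embeddings (the names and toy 2-D vectors are illustrative; RAG-VR additionally offloads this step to an edge server and trains the retriever):

```python
def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv + 1e-12)

def retrieve(query_vec, kb, k=2):
    """kb: list of (passage, embedding) pairs describing the virtual scene.
    Return the top-k passages by cosine similarity to the query embedding."""
    ranked = sorted(kb, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]
```

The retrieved passages are then prepended to the LLM prompt, grounding answers in the localized VR knowledge base rather than the model's general training data.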

Authors:Soumita Mukherjee, Varun Darshana Parekh, Nikhil Tayal
Title: Certified to Drive: A Policy Proposal for Mandatory Training on Semi-Automated Vehicles
Abstract:
Although the Boeing 737 Max incidents resulted from a mix of design shortcomings, regulatory oversights, and systemic issues, they also highlight a critical gap in pilot training on managing automated systems during abnormal conditions. This example demonstrates the urgent need for focused, concise training on human-automation interaction - a need that is equally critical for operators of Level 2 ADAS-equipped vehicles, as discussed in detail later in this article. The lack of structured education for semi-automated vehicle operators mirrors similar risks in other industries, where formal training is critical for safe operation. Two policy recommendations are proposed. First, governments should create concise official resources in an accessible format to educate drivers on system capabilities and limitations. Second, mandatory training and certification programs should be introduced, combining theoretical and hands-on components to prepare drivers for real-world scenarios. These measures will improve driver understanding, reduce misuse, and foster public trust in semi-automated vehicle technologies. By addressing the knowledge gap, policymakers can ensure a safer, more responsible transition to automation, maximizing its benefits while minimizing risks to public safety.

Authors:Silvio Picinini, Sheila Castilho
Title: Context-Aware Monolingual Human Evaluation of Machine Translation
Abstract:
This paper explores the potential of context-aware monolingual human evaluation for assessing machine translation (MT) when no source is given for reference. To this end, we compare monolingual with bilingual evaluations (with source text), under two scenarios: the evaluation of a single MT system, and the comparative evaluation of pairwise MT systems. Four professional translators performed both monolingual and bilingual evaluations by assigning ratings and annotating errors, and providing feedback on their experience. Our findings suggest that context-aware monolingual human evaluation achieves outcomes comparable to bilingual evaluation, pointing to its feasibility and potential as an efficient approach to assessing MT.

Authors:Elizabeth Childs, Samir Ghosh, Sebastian Cmentowski, Andrea Cuadra, Rabindra Ratan
Title: Proceedings of the Purposeful XR Workshop for CHI 2025
Abstract:
This volume represents the proceedings of Workshop 27 on Purposeful XR: Affordances, Challenges, and Speculations for an Ethical Future, held in conjunction with the CHI Conference on Human Factors in Computing Systems on April 26th, 2025 in Yokohama, Japan.

Authors:Wenge Xu, Craig Anderton, Kurtis Weir, Arthur Theil
Title: Conducting VR User Studies with People with Vision/Hearing Impairments: Challenges and Mitigation Strategies
Abstract:
There is a lack of virtual reality (VR) user studies that have been conducted involving people with vision/hearing impairments. This is due to the difficulty of recruiting participants and the accessibility barriers of VR devices. Based on the authors' experience conducting VR user studies with participants with vision/hearing impairments, this position paper identifies 5 key challenges (1. Recruitment, 2. Language Familiarity, 3. Technology Limitations and Barriers, 4. Access to Audio Cue, and 5. Travelling to the Experiment Location) and proposes strategic approaches to mitigate these challenges. In addition, we present three key considerations regarding understanding participants' lived experiences that could help make user studies more accessible.

Authors:Serina Chang, Ashton Anderson, Jake M. Hofman
Title: ChatBench: From Static Benchmarks to Human-AI Evaluation
Abstract:
With the rapid adoption of LLM-based chatbots, there is a pressing need to evaluate what humans and LLMs can achieve together. However, standard benchmarks, such as MMLU, measure LLM capabilities in isolation (i.e., "AI-alone"). Here, we design and conduct a user study to convert MMLU questions into user-AI conversations, by seeding the user with the question and having them carry out a conversation with the LLM to answer their question. We release ChatBench, a new dataset with AI-alone, user-alone, and user-AI data for 396 questions and two LLMs, including 144K answers and 7,336 user-AI conversations. We find that AI-alone accuracy fails to predict user-AI accuracy, with significant differences across multiple subjects (math, physics, and moral reasoning), and we analyze the user-AI conversations to provide insight into how they diverge from AI-alone benchmarks. Finally, we show that fine-tuning a user simulator on a subset of ChatBench improves its ability to estimate user-AI accuracies, increasing correlation on held-out questions by more than 20 points, creating possibilities for scaling interactive evaluation.

Authors:Haiwen Li, Sinan Aral
Title: Human Trust in AI Search: A Large-Scale Experiment
Abstract:
Large Language Models (LLMs) increasingly power generative search engines which, in turn, drive human information seeking and decision making at scale. The extent to which humans trust generative artificial intelligence (GenAI) can therefore influence what we buy, how we vote and our health. Unfortunately, no work establishes the causal effect of generative search designs on human trust. Here we execute ~12,000 search queries across seven countries, generating ~80,000 real-time GenAI and traditional search results, to understand the extent of current global exposure to GenAI search. We then use a preregistered, randomized experiment on a large study sample representative of the U.S. population to show that while participants trust GenAI search less than traditional search on average, reference links and citations significantly increase trust in GenAI, even when those links and citations are incorrect or hallucinated. Uncertainty highlighting, which reveals GenAI's confidence in its own conclusions, makes us less willing to trust and share generative information whether that confidence is high or low. Positive social feedback increases trust in GenAI while negative feedback reduces trust. These results imply that GenAI designs can increase trust in inaccurate and hallucinated information and reduce trust when GenAI's certainty is made explicit. Trust in GenAI varies by topic and with users' demographics, education, industry employment and GenAI experience, revealing which sub-populations are most vulnerable to GenAI misrepresentations. Trust, in turn, predicts behavior, as those who trust GenAI more click more and spend less time evaluating GenAI search results. These findings suggest directions for GenAI design to safely and productively address the AI "trust gap."

Authors:Iman Soltani, Johnaton Schofield, Mehran Madani, Daniel Kish, Parisa Emami-Naeini
Title: User-Centered Insights into Assistive Navigation Technologies for Individuals with Visual Impairment
Abstract:
Navigational challenges significantly impact the independence and mobility of Individuals with Visual Impairment (IVI). While numerous assistive technologies exist, their adoption remains limited due to usability challenges, financial constraints, and a lack of alignment with user needs. This study employs a mixed-methods approach, combining structured surveys and virtual workshops with 19 IVI to investigate their experiences, needs, and preferences regarding assistive technologies for navigation and daily living. The survey results provide insights into participants' technological competence, preferences for assistive devices, and willingness to adopt new solutions. In parallel, workshop discussions offer qualitative perspectives on key navigation challenges, including difficulties in detecting overhead obstacles, navigating environments with complex layouts, and the limitations of existing technologies. Findings highlight the need for assistive devices that integrate both navigational guidance and high-level spatial awareness, allowing users to build mental maps of their surroundings. Additionally, multimodal feedback, combining audio, haptic, and tactile cues, emerges as a crucial feature to accommodate diverse user preferences and environmental conditions. The study also underscores financial and training barriers that limit access to advanced assistive technologies. Based on these insights, we recommend the development of customizable, user-friendly, and, most importantly, affordable navigation aids that align with the daily needs of IVI. The findings from this study provide guidance for technology developers, researchers, and policymakers working toward more inclusive and effective assistive solutions.

Authors:Francisco J. Rodríguez Lera, Raquel Fernández Hernández, Sonia Lopez González, Miguel Angel González-Santamarta, Francisco Jesús Rodríguez Sedano, Camino Fernandez Llamas
Title: Accessible and Pedagogically-Grounded Explainability for Human-Robot Interaction: A Framework Based on UDL and Symbolic Interfaces
Abstract:
This paper presents a novel framework for accessible and pedagogically-grounded robot explainability, designed to support human-robot interaction (HRI) with users who have diverse cognitive, communicative, or learning needs. We combine principles from Universal Design for Learning (UDL) and Universal Design (UD) with symbolic communication strategies to facilitate the alignment of mental models between humans and robots. Our approach employs Asterics Grid and ARASAAC pictograms as a multimodal, interpretable front-end, integrated with a lightweight HTTP-to-ROS 2 bridge that enables real-time interaction and explanation triggering. We emphasize that explainability is not a one-way function but a bidirectional process, where human understanding and robot transparency must co-evolve. We further argue that in educational or assistive contexts, the role of a human mediator (e.g., a teacher) may be essential to support shared understanding. We validate our framework with examples of multimodal explanation boards and discuss how it can be extended to different scenarios in education, assistive robotics, and inclusive AI.

Authors:Evgeny Kagan, Brett Hathaway, Maqbool Dada
Title: Deploying Chatbots in Customer Service: Adoption Hurdles and Simple Remedies
Abstract:
Despite recent advances in Artificial Intelligence, the use of chatbot technology in customer service continues to face adoption hurdles. This paper explores reasons for these adoption hurdles and tests several service design levers to increase chatbot uptake. We use incentivized online experiments to study chatbot uptake in a variety of scenarios. The results of these experiments are threefold. First, people respond positively to improvements in chatbot performance; however, the chatbot channel is utilized less frequently than expected-time minimization would predict. A key driver of this underutilization is the reluctance to engage with a gatekeeper process, i.e., a process with an imperfect initial service stage and possible transfer to a second, expert service stage -- a behavior we term "gatekeeper aversion". We show that gatekeeper aversion can be further amplified by a secondary hurdle, algorithm aversion. Second, chatbot uptake can be increased by providing customers with average waiting times in the chatbot channel, as well as by being more transparent about chatbot capabilities and limitations. Third, methodologically, we show that chatbot adoption can depend on experimental implementation. In particular, chatbot adoption decreases further as (i) stakes are increased, (ii) the human/algorithmic nature of the server is manipulated with more realism. Our results suggest that firms should continue to prioritize investments in chatbot technology. However, less expensive, process-related interventions can also be effective. These may include being more transparent about the types of queries that are (or are not) suitable for chatbots, emphasizing chatbot reliability and quick resolution times, as well as providing faster live agent access to customers who experienced chatbot failure.
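The "expected-time minimization" benchmark the abstract mentions reduces to a simple calculation: trying the chatbot first pays the bot's service time always, and the human's waiting and service time only on transfer. The parameter values below are hypothetical, not taken from the paper's experiments.

```python
def expected_time_chatbot_first(t_bot, p_resolve, t_wait_human, t_human):
    """Expected completion time when trying the chatbot first: the bot always
    takes t_bot; with probability (1 - p_resolve) the query is transferred to
    a human, adding waiting and service time."""
    return t_bot + (1 - p_resolve) * (t_wait_human + t_human)

def expected_time_human_only(t_wait_human, t_human):
    """Expected completion time when going straight to a live agent."""
    return t_wait_human + t_human

# With a (hypothetical) 70% chatbot resolution rate, the gatekeeper channel is
# faster in expectation -- yet the experiments find people still under-use it.
bot_channel = expected_time_chatbot_first(t_bot=2.0, p_resolve=0.7,
                                          t_wait_human=10.0, t_human=5.0)
human_channel = expected_time_human_only(t_wait_human=10.0, t_human=5.0)
```

Gatekeeper aversion is precisely the gap between this arithmetic and observed channel choice.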

Authors:Kwame Porter Robinson, Ron Eglash, Lionel Robert, Audrey Bennett, Mark Guzdial, Michael Nayebare
Title: Computing for Community-Based Economies: A Sociotechnical Ecosystem for Democratic, Egalitarian and Sustainable Futures
Abstract:
Automation and industrial mass production, particularly in sectors with low wages, have harmful consequences that contribute to widening wealth disparities, excessive pollution, and worsened working conditions. Coupled with a mass consumption society, there is a risk of detrimental social outcomes and threats to democracy, such as misinformation and political polarization. But AI, robotics and other emerging technologies could also provide a transition to community-based economies, in which more democratic, egalitarian, and sustainable value circulations can be established. Based on both a review of case studies, and our own experiments in Detroit, we derive three core principles for the use of computing in community-based economies. The prefigurative principle requires that the development process itself incorporates equity goals, rather than viewing equity as something to be achieved in the future. The generative principle requires the prevention of value extraction, and its replacement by circulations in which value is returned back to the aspects of labor, nature, and society by which it is generated. And third, the solidarity principle requires that deployments at all scales and across all domains support both individual freedoms and opportunities for mutual aid. Thus we propose the use of computational technologies to develop a specifically generative form of community-based economy: one that is egalitarian regarding race, class and gender; sustainable both environmentally and socially; and democratic in the deep sense of putting people in control of their own lives and livelihoods.

Authors:Saad Hassan, Matyas Bohacek, Chaelin Kim, Denise Crochet
Title: Towards an AI-Driven Video-Based American Sign Language Dictionary: Exploring Design and Usage Experience with Learners
Abstract:
Searching for unfamiliar American Sign Language (ASL) signs is challenging for learners because, unlike spoken languages, they cannot type a text-based query to look up an unfamiliar sign. Advances in isolated sign recognition have enabled the creation of video-based dictionaries, allowing users to submit a video and receive a list of the closest matching signs. Previous HCI research using Wizard-of-Oz prototypes has explored interface designs for ASL dictionaries. Building on these studies, we incorporate their design recommendations and leverage state-of-the-art sign-recognition technology to develop an automated video-based dictionary. We also present findings from an observational study with twelve novice ASL learners who used this dictionary during video-comprehension and question-answering tasks. Our results address human-AI interaction challenges not covered in previous WoZ research, including recording and resubmitting signs, unpredictable outputs, system latency, and privacy concerns. These insights offer guidance for designing and deploying video-based ASL dictionary systems.

Authors:Bowen Lou, Tian Lu, T. S. Raghu, Yingjie Zhang
Title: Unraveling Human-AI Teaming: A Review and Outlook
Abstract:
Artificial Intelligence (AI) is advancing at an unprecedented pace, with clear potential to enhance decision-making and productivity. Yet, the collaborative decision-making process between humans and AI remains underdeveloped, often falling short of its transformative possibilities. This paper explores the evolution of AI agents from passive tools to active collaborators in human-AI teams, emphasizing their ability to learn, adapt, and operate autonomously in complex environments. This paradigm shift challenges traditional team dynamics, requiring new interaction protocols, delegation strategies, and responsibility distribution frameworks. Drawing on Team Situation Awareness (SA) theory, we identify two critical gaps in current human-AI teaming research: the difficulty of aligning AI agents with human values and objectives, and the underutilization of AI's capabilities as genuine team members. Addressing these gaps, we propose a structured research outlook centered on four key aspects of human-AI teaming: formulation, coordination, maintenance, and training. Our framework highlights the importance of shared mental models, trust-building, conflict resolution, and skill adaptation for effective teaming. Furthermore, we discuss the unique challenges posed by varying team compositions, goals, and complexities. This paper provides a foundational agenda for future research and practical design of sustainable, high-performing human-AI teams.

Authors:Sinyu Lai, Wanhui Li, Kaoru Amano, Jun Rekimoto
Title: Conditions for Inter-brain Synchronization in Remote Communication: Investigating the Role of Transmission Delay
Abstract:
Inter-brain synchronization (IBS), the alignment of neural activities between individuals, is a fundamental mechanism underlying effective social interactions and communication. Prior research has demonstrated that IBS can occur during collaborative tasks and is deeply connected to communication effectiveness. Building on these findings, recent investigations reveal that IBS happens during remote interactions, implying that brain activities between individuals can synchronize despite latency and physical separation. However, the conditions under which this synchronization occurs or is disrupted in remote settings, especially the effect of latency, are not fully understood. This study investigates how varying transmission latency affects IBS, in order to identify thresholds where synchronization is disrupted. Using electroencephalography measurements quantified through Phase Locking Value -- a metric that captures synchronization between brainwave phases -- we first confirm synchronization under face-to-face conditions and then observe changes in IBS across remote communication scenarios. Our findings reveal that IBS can occur during remote collaboration, but is critically dependent on transmission delays, with delays exceeding 450 ms significantly disrupting synchronization. These findings suggest that IBS may serve as a key indicator of communication quality in remote interactions, offering insights for improving remote communication systems and collaboration.
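The Phase Locking Value the abstract relies on has a compact definition: PLV = |(1/N) Σ_t exp(i(φ_a(t) − φ_b(t)))|. A minimal sketch on synthetic phase series (real EEG pipelines first extract instantaneous phase, e.g. via a Hilbert transform, which is omitted here):

```python
import cmath
import math
import random

def phase_locking_value(phases_a, phases_b):
    """PLV = |(1/N) * sum_t exp(i * (phi_a(t) - phi_b(t)))|, in [0, 1].
    1.0 means a perfectly constant phase difference between the two series;
    values near 0 mean no consistent phase relation."""
    n = len(phases_a)
    s = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(s) / n

t = [i * 0.01 for i in range(1000)]
phases = [2 * math.pi * x for x in t]

# A constant lag (e.g. a fixed transmission delay) still gives PLV = 1 ...
locked = phase_locking_value(phases, [p + 0.5 for p in phases])

# ... while unrelated phases give a PLV near 0.
rng = random.Random(0)
unlocked = phase_locking_value(phases, [rng.uniform(0, 2 * math.pi) for _ in t])
```

Note that a *constant* delay does not lower PLV by itself; the disruption beyond ~450 ms reported in the study reflects behavioral desynchronization, not the metric.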

Authors:Lianghan Dong, Anamaria Crisan
Title: Probing the Visualization Literacy of Vision Language Models: the Good, the Bad, and the Ugly
Abstract:
Vision Language Models (VLMs) demonstrate promising chart comprehension capabilities. Yet, prior explorations of their visualization literacy have been limited to assessing their response correctness and fail to explore their internal reasoning. To address this gap, we adapted attention-guided class activation maps (AG-CAM) for VLMs to visualize the influence and importance of input features (image and text) on model responses. Using this approach, we conducted an examination of four open-source (ChartGemma, Janus 1B and 7B, and LLaVA) and two closed-source (GPT-4o, Gemini) models, comparing their performance and, for the open-source models, their AG-CAM results. Overall, we found that ChartGemma, a 3B parameter VLM fine-tuned for chart question-answering (QA), outperformed other open-source models and exhibited performance on par with significantly larger closed-source VLMs. We also found that VLMs exhibit spatial reasoning by accurately localizing key chart features, and semantic reasoning by associating visual elements with corresponding data values and query tokens. Our approach is the first to demonstrate the use of AG-CAM on early fusion VLM architectures, which are widely used, and for chart QA. We also show preliminary evidence that these results can align with human reasoning. Our promising open-source VLM results pave the way for transparent and reproducible research in AI visualization literacy.
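The core of gradient-guided attribution methods in this family is weighting an attention map by the gradient of the answer score with respect to the input features, then keeping positive contributions. The toy 2x2 maps below are invented numbers for illustration only; real AG-CAM operates on a VLM's attention tensors and backpropagated gradients.

```python
def attention_guided_cam(attention, gradient):
    """Element-wise product of an attention map and the corresponding gradient
    map, passed through ReLU so only features that *increase* the answer score
    survive in the importance map."""
    return [[max(a * g, 0.0) for a, g in zip(att_row, grad_row)]
            for att_row, grad_row in zip(attention, gradient)]

attention = [[0.8, 0.1],    # how strongly the model attends to each patch
             [0.1, 0.0]]
gradient = [[0.5, -2.0],    # sensitivity of the answer score to each patch
            [1.0, 3.0]]
cam = attention_guided_cam(attention, gradient)
```

Patches with high attention but negative gradient (top right) are zeroed out: the model looked there, but doing so argued *against* the produced answer.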

Authors:Kerstin Mayerhofer, Rob Capra, David Elsweiler
Title: Blending Queries and Conversations: Understanding Tactics, Trust, Verification, and System Choice in Web Search and Chat Interactions
Abstract:
This paper presents a user study (N=22) where participants used an interface combining Web Search and a Generative AI-Chat feature to solve health-related information tasks. We study how people behaved with the interface, why they behaved in certain ways, and what the outcomes of these behaviours were. A think-aloud protocol captured their thought processes during searches. Our findings suggest that GenAI is neither a search panacea nor a major regression compared to standard Web Search interfaces. Qualitative and quantitative analyses identified 78 tactics across five categories and provided insight into how and why different interface features were used. We find evidence that pre-task confidence and trust both influenced which interface feature was used. In both systems, but particularly when using the chat feature, trust was often misplaced in favour of ease-of-use and seemingly perfect answers, leading to increased confidence post-search despite having incorrect results. We discuss what our findings mean in the context of our defined research questions and outline several open questions for future research.

Authors:Xingyu Lan, Jiaxi An, Yisu Guo, Chiyou Tong, Xintong Cai, Jun Zhang
Title: Imagining the Far East: Exploring Perceived Biases in AI-Generated Images of East Asian Women
Abstract:
Image-generating AI, which allows users to create images from text, is increasingly used to produce visual content. Despite its advancements, cultural biases in AI-generated images have raised significant concerns. While much research has focused on issues within Western contexts, our study examines the perceived biases regarding the portrayal of East Asian women. In this exploratory study, we invited East Asian users to audit three popular models (DALL-E, Midjourney, Stable Diffusion) and identified 18 specific perceived biases, categorized into four patterns: Westernization, overuse or misuse of cultural symbols, sexualization & feminization, and racial stereotypes. This work highlights the potential challenges posed by AI models in portraying Eastern individuals.

Authors:Mohammad Namvarpour, Harrison Pauwels, Afsaneh Razi
Title: AI-induced sexual harassment: Investigating Contextual Characteristics and User Reactions of Sexual Harassment by a Companion Chatbot
Abstract:
Advancements in artificial intelligence (AI) have led to the increase of conversational agents like Replika, designed to provide social interaction and emotional support. However, reports of these AI systems engaging in inappropriate sexual behaviors with users have raised significant concerns. In this study, we conducted a thematic analysis of user reviews from the Google Play Store to investigate instances of sexual harassment by the Replika chatbot. From a dataset of 35,105 negative reviews, we identified 800 relevant cases for analysis. Our findings revealed that users frequently experience unsolicited sexual advances, persistent inappropriate behavior, and failures of the chatbot to respect user boundaries. Users expressed feelings of discomfort, violation of privacy, and disappointment, particularly when seeking a platonic or therapeutic AI companion. This study highlights the potential harms associated with AI companions and underscores the need for developers to implement effective safeguards and ethical guidelines to prevent such incidents. By shedding light on user experiences of AI-induced harassment, we contribute to the understanding of AI-related risks and emphasize the importance of corporate responsibility in developing safer and more ethical AI systems.

Authors:Kathrin Schnizer, Sven Mayer
Title: User-Centered AI for Data Exploration: Rethinking GenAI's Role in Visualization
Abstract:
Recent advances in GenAI have enabled automation in data visualization, allowing users to generate visual representations using natural language. However, existing systems primarily focus on automation, overlooking users' varying expertise levels and analytical needs. In this position paper, we advocate for a shift toward adaptive GenAI-driven visualization tools that tailor interactions, reasoning, and visualizations to individual users. We first review existing automation-focused approaches and highlight their limitations. We then introduce methods for assessing user expertise, as well as key open challenges and research questions that must be addressed to allow for an adaptive approach. Finally, we present our vision for a user-centered system that leverages GenAI not only for automation but as an intelligent collaborator in visual data exploration. Our perspective contributes to the broader discussion on designing GenAI-based systems that enhance human cognition by dynamically adapting to the user, ultimately advancing toward systems that promote augmented cognition.

Authors:Kesav Kaza, Jerome Le Ny, Aditya Mahajan
Title: Task load dependent decision referrals for joint binary classification in human-automation teams
Abstract:
We consider the problem of optimal decision referrals in human-automation teams performing binary classification tasks. The automation, which includes a pre-trained classifier, observes data for a batch of independent tasks, analyzes them, and may refer a subset of tasks to a human operator for fresh and final analysis. Our key modeling assumption is that human performance degrades with task load. We model the problem of choosing which tasks to refer as a stochastic optimization problem and show that, for a given task load, it is optimal to myopically refer tasks that yield the largest reduction in expected cost, conditional on the observed data. This provides a ranking scheme and a policy to determine the optimal set of tasks for referral. We evaluate this policy against a baseline through an experimental study with human participants. Using a radar screen simulator, participants made binary target classification decisions under time constraints. They were guided by a decision rule provided to them, but were still prone to errors under time pressure. An initial experiment estimated human performance model parameters, while a second experiment compared two referral policies. Results show statistically significant gains for the proposed optimal referral policy over a blind policy that determines referrals using the automation and human-performance models but not based on the observed data.
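The myopic ranking rule above can be sketched directly: given each task's expected cost under automation-only versus human review (both conditional on the observed data, at the given task load), refer the tasks with the largest positive expected cost reduction, up to a budget. The cost numbers below are illustrative, not from the paper.

```python
def select_referrals(auto_cost, human_cost, budget):
    """auto_cost[i] / human_cost[i]: expected cost of task i if decided by the
    automation alone vs. referred to the human at the current task load.
    Returns indices of at most `budget` tasks, ranked by expected cost
    reduction; tasks whose referral would not help are never selected."""
    reductions = [(auto_cost[i] - human_cost[i], i)
                  for i in range(len(auto_cost))]
    ranked = sorted(reductions, reverse=True)
    return [i for r, i in ranked[:budget] if r > 0]

auto = [0.40, 0.10, 0.55, 0.05]    # automation's expected cost per task
human = [0.20, 0.15, 0.25, 0.04]   # human's expected cost per task
referrals = select_referrals(auto, human, budget=2)
```

Task 1 is never referred even with slack budget, since the human would do worse there; the budget models the load-dependent degradation of human performance.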

Authors:Mauricio Flores-Vargas, Enda Bates, Rachel McDonnell
Title: Real-Time Auralization for First-Person Vocal Interaction in Immersive Virtual Environments
Abstract:
Multimodal research and applications are becoming more commonplace as Virtual Reality (VR) technology integrates different sensory feedback, enabling the recreation of real spaces in an audio-visual context. Within VR experiences, numerous applications rely on the user's voice as a key element of interaction, including music performances and public speaking applications. Self-perception of our voice plays a crucial role in vocal production. When singing or speaking, our voice interacts with the acoustic properties of the environment, shaping the adjustment of vocal parameters in response to the perceived characteristics of the space. This technical report presents a real-time auralization pipeline that leverages three-dimensional Spatial Impulse Responses (SIRs) for multimodal research applications in VR requiring first-person vocal interaction. It describes the impulse response creation and rendering workflow, the audio-visual integration, and addresses latency and computational considerations. The system enables users to explore acoustic spaces from various positions and orientations within a predefined area, supporting three and five Degrees of Freedom (3DoF and 5DoF) in audio-visual multimodal perception for both research and creative applications in VR.
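At its core, auralization convolves the dry voice signal with a measured impulse response of the space. The direct-form sketch below is offline and single-channel for clarity; the reported pipeline uses three-dimensional SIRs and low-latency (e.g. partitioned) convolution to run in real time.

```python
def convolve(dry, impulse_response):
    """Direct-form convolution: wet[n] = sum_k dry[k] * ir[n - k].
    The output length is len(dry) + len(ir) - 1."""
    n_out = len(dry) + len(impulse_response) - 1
    wet = [0.0] * n_out
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            wet[i + j] += x * h
    return wet

# Toy impulse response: direct sound followed by two decaying reflections.
ir = [1.0, 0.5, 0.25]
# Convolving a unit impulse reproduces the room response itself.
wet = convolve([1.0, 0.0, 0.0], ir)
```

Swapping in SIRs measured at different positions is what lets the user move through the virtual acoustic space.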

Authors:Inas Ghazouani Ghailani, Yoshi Malaise, Beat Signer
Title: JsStories: Improving Social Inclusion in Computer Science Education Through Interactive Stories
Abstract:
A main challenge faced by non-profit organisations providing computer science education to under-represented groups is the high drop-out rate. This issue arises from various factors affecting both students and teachers, such as the one-size-fits-all approach of many lessons. Enhancing social inclusion in the learning process could help reduce these drop-out rates. We present JsStories, a tool designed to help students learn JavaScript through interactive stories. The development of JsStories has been informed by existing literature on storytelling for inclusion and insights gained from a visit to HackYourFuture Belgium (HYFBE), a non-profit organisation that teaches web development to refugees and migrants. To lower barriers to entry and maximise the feeling of connection to the story, we incorporated narratives from HYFBE alumni. Further, we adhered to educational best practices by applying the PRIMM principles and offering level-appropriate content based on knowledge graphs. JsStories has been demonstrated, evaluated and communicated to the different stakeholders through interviews and a survey, enabling us to identify future directions for story-based learning solutions.
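Level-appropriate content selection over a knowledge graph reduces to a prerequisite check: a lesson is offered only when everything it depends on is complete. The graph and lesson names below are invented for illustration; JsStories' actual graph is not described in the abstract.

```python
# Hypothetical prerequisite graph: lesson -> list of required prior lessons.
PREREQS = {
    "variables": [],
    "conditionals": ["variables"],
    "loops": ["conditionals"],
    "functions": ["variables"],
}

def unlocked(completed):
    """Lessons not yet done whose prerequisites are all in the learner's
    completed set -- the level-appropriate choices to offer next."""
    done = set(completed)
    return sorted(lesson for lesson, reqs in PREREQS.items()
                  if lesson not in done and all(r in done for r in reqs))

next_lessons = unlocked({"variables"})
```

A learner who has only finished "variables" is offered "conditionals" and "functions", but not "loops", avoiding the one-size-fits-all sequencing the abstract criticises.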

Authors:Kovan Mzwri, Márta Turcsányi-Szabo
Title: Bridging LMS and Generative AI: Dynamic Course Content Integration (DCCI) for Connecting LLMs to Course Content -- The Ask ME Assistant
Abstract:
The integration of Large Language Models (LLMs) with Learning Management Systems (LMSs) has the potential to enhance task automation and accessibility in education. However, hallucination, where LLMs generate inaccurate or misleading information, remains a significant challenge. This study introduces the Dynamic Course Content Integration (DCCI) mechanism, which dynamically retrieves and integrates course content and curriculum from Canvas LMS into the LLM-powered assistant, Ask ME. By employing prompt engineering to structure retrieved content within the LLM's context window, DCCI ensures accuracy, relevance, and contextual alignment, mitigating hallucination. To evaluate DCCI's effectiveness, Ask ME's usability, and broader student perceptions of AI in education, a mixed-methods approach was employed, incorporating user satisfaction ratings and a structured survey. Results from a pilot study indicate high user satisfaction (4.614/5), with students recognizing Ask ME's ability to provide timely and contextually relevant responses for both administrative and course-related inquiries. Additionally, a majority of students agreed that Ask ME's integration with course content in Canvas LMS reduced platform-switching, improving usability, engagement, and comprehension. AI's role in reducing classroom hesitation and fostering self-directed learning and intellectual curiosity was also highlighted. Despite these benefits and positive perception of AI tools, concerns emerged regarding over-reliance on AI, accuracy limitations, and ethical issues such as plagiarism and reduced student-teacher interaction. These findings emphasize the need for strategic AI implementation, ethical safeguards, and a pedagogical framework that prioritizes human-AI collaboration over substitution.
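The prompt-engineering step at the heart of DCCI, packing retrieved course material into the context window ahead of the student's question, can be sketched as below. Function names, the selection heuristic, and the course pages are all hypothetical; the real mechanism retrieves content from Canvas LMS.

```python
def build_grounded_prompt(question, course_pages, max_chars=1000):
    """Select course pages that share terms with the question and pack them,
    up to a character budget, ahead of the question so the LLM answers from
    curriculum content rather than from parametric memory."""
    terms = set(question.lower().split())
    relevant = [p for p in course_pages if terms & set(p.lower().split())]
    context, used = [], 0
    for page in relevant:
        if used + len(page) > max_chars:   # respect the context-window budget
            break
        context.append(page)
        used += len(page)
    return ("Answer using only the course material below.\n\n"
            + "\n---\n".join(context)
            + f"\n\nStudent question: {question}")

pages = [
    "Week 3 covers recursion and base cases.",
    "The final exam is on June 12 in Room 4.",
]
prompt = build_grounded_prompt("When is the final exam?", pages)
```

Constraining the answer to the packed context is what mitigates hallucination: the model is instructed to ground administrative facts, like the exam date, in retrieved course pages.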

Authors:Euijun Jung, Youngki Lee
Title: Virtualizing a Collaboration Task as an Interactable Environment and Installing it on Real World
Abstract:
This paper proposes a novel approach to scaling distributed collaboration in mixed reality by virtualizing collaborative tasks as independent, installable environments. By mapping group activities into dedicated virtual spaces that adapt to each user's real-world context, the proposed method supports consistent MR interactions, dynamic group engagement, and seamless task transitions. Preliminary studies in individual ideation demonstrate enhanced immersion and productivity, paving the way for future multi-user collaborative systems.

Authors:Christina Halmich, Lucas Höschler, Christoph Schranz, Christian Borgelt
Title: Data Augmentation of Time-Series Data in Human Movement Biomechanics: A Scoping Review
Abstract:
The integration of machine learning and deep learning has transformed data analytics in biomechanics, enabled by extensive wearable sensor data. However, the field faces challenges such as limited large-scale datasets and high data acquisition costs, which hinder the development of robust algorithms. Data augmentation techniques show promise in addressing these issues, but their application to biomechanical time-series data requires comprehensive evaluation. This scoping review investigates data augmentation methods for time-series data in the biomechanics domain. It analyzes current approaches for augmenting and generating time-series datasets, evaluates their effectiveness, and offers recommendations for applying these techniques in biomechanics. Four databases, PubMed, IEEE Xplore, Scopus, and Web of Science, were searched for studies published between 2013 and 2024. Following PRISMA-ScR guidelines, a two-stage screening identified 21 relevant publications. Results show that there is no universally preferred method for augmenting biomechanical time-series data; instead, methods vary based on study objectives. A major issue identified is the absence of soft tissue artifacts in synthetic data, leading to discrepancies referred to as the synthetic gap. Moreover, many studies lack proper evaluation of augmentation methods, making it difficult to assess their effects on model performance and data quality. This review highlights the critical role of data augmentation in addressing limited dataset availability and improving model generalization in biomechanics. Tailoring augmentation strategies to the characteristics of biomechanical data is essential for advancing predictive modeling. A better understanding of how different augmentation methods impact data quality and downstream tasks will be key to developing more effective and realistic techniques.
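One of the simplest augmentation methods such reviews cover is jittering: adding small Gaussian noise to each sample of the time series. The parameters below are illustrative; whether jittering is appropriate depends on the study objective, and, like other synthetic transforms, it does not reproduce soft tissue artifacts (the "synthetic gap" noted above).

```python
import random

def jitter(signal, sigma=0.03, seed=0):
    """Return a noisy copy of `signal`: per-sample Gaussian noise with
    standard deviation `sigma`. Length and overall shape are preserved."""
    rng = random.Random(seed)   # seeded for reproducible augmentation
    return [x + rng.gauss(0.0, sigma) for x in signal]

# Toy gait-like signal; a real pipeline would jitter windows of sensor data.
original = [0.0, 0.5, 1.0, 0.5, 0.0]
augmented = jitter(original)
```

Varying the seed yields many plausible variants of one recording, which is exactly how augmentation compensates for small biomechanical datasets.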

Authors:Abdul Mannan Mohammed, Azhar Ali Mohammad, Jason A. Ortiz, Carsten Neumann, Grace Bochenek, Dirk Reiners, Carolina Cruz-Neira
Title: A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations
Abstract:
Recent developments in Artificial Intelligence (AI) and Machine Learning (ML) are creating new opportunities for Human-Autonomy Teaming (HAT) in tasks, missions, and continuous coordinated activities. A major challenge is enabling humans to maintain awareness and control over autonomous assets, while also building trust and supporting shared contextual understanding. To address this, we present a real-time Human Digital Twin (HDT) architecture that integrates Large Language Models (LLMs) for knowledge reporting, answering, and recommendation, embodied in a visual interface. The system applies a metacognitive approach to enable personalized, context-aware responses aligned with the human teammate's expectations. The HDT acts as a visually and behaviorally realistic team member, integrated throughout the mission lifecycle, from training to deployment to after-action review. Our architecture includes speech recognition, context processing, AI-driven dialogue, emotion modeling, lip-syncing, and multimodal feedback. We describe the system design, performance metrics, and future development directions for more adaptive and realistic HAT systems.

Authors:Huiyong Li, Boxuan Ma
Title: Design of AI-Powered Tool for Self-Regulation Support in Programming Education
Abstract:
Large Language Model (LLM) tools have demonstrated their potential to deliver high-quality assistance by providing instant, personalized feedback that is crucial for effective programming education. However, many of these tools operate independently from institutional Learning Management Systems, which creates a significant disconnect. This isolation limits the ability to leverage learning materials and exercise context for generating tailored, context-aware feedback. Furthermore, previous research on self-regulated learning and LLM support mainly focused on knowledge acquisition, not the development of important self-regulation skills. To address these challenges, we developed CodeRunner Agent, an LLM-based programming assistant that integrates CodeRunner, a Moodle plugin that executes and automatically grades student-submitted code. CodeRunner Agent empowers educators to customize AI-generated feedback by incorporating detailed context from lecture materials, programming questions, student answers, and execution results. Additionally, it enhances students' self-regulated learning by providing strategy-based AI responses. This integrated, context-aware, and skill-focused approach offers promising avenues for data-driven improvements in programming education.

Authors:Nava Haghighi, Sunny Yu, James Landay, Daniela Rosner
Title: Ontologies in Design: How Imagining a Tree Reveals Possibilities and Assumptions in Large Language Models
Abstract:
Amid the recent uptake of Generative AI, sociotechnical scholars and critics have traced a multitude of resulting harms, with analyses largely focused on values and axiology (e.g., bias). While value-based analyses are crucial, we argue that ontologies -- concerning what we allow ourselves to think or talk about -- are a vital but under-recognized dimension in analyzing these systems. Proposing a need for a practice-based engagement with ontologies, we offer four orientations for considering ontologies in design: pluralism, groundedness, liveliness, and enactment. We share examples of potentialities that are opened up through these orientations across the entire LLM development pipeline by conducting two ontological analyses: examining the responses of four LLM-based chatbots in a prompting exercise, and analyzing the architecture of an LLM-based agent simulation. We conclude by sharing opportunities and limitations of working with ontologies in the design and development of sociotechnical systems.

Authors:Bixun Chen, Shaun Macdonald, Moataz Attallah, Paul Chapman, Rami Ghannam
Title: A Review of Prototyping in XR: Linking Extended Reality to Digital Fabrication
Abstract:
Extended Reality (XR) has expanded the horizons of entertainment and social life and shows great potential in the manufacturing industry. Prototyping in XR can help designers make initial proposals and iterations at low cost before manufacturers and investors decide whether to invest in research, development or even production. According to the literature (54 manuscripts in the last 15 years), prototyping in XR is easier to use than three-dimensional (3D) modeling with a personal computer and more capable of displaying 3D structures than paper drawing. In this comprehensive review, we systematically surveyed the literature on prototyping in XR and discussed the possibility of transferring created virtual prototypes from XR to commonly used 3D modeling software and reality. We proposed five research questions regarding prototyping in XR. They are: what the constituent elements and workflow of prototyping are; which display devices can deliver satisfying immersive and interactive experiences; how user control input is obtained and what methods are available for users to interact with virtual elements and create XR prototypes; what approaches can facilitate the connection with fabrication to ensure a smooth transition from the virtual to the physical world; and what the challenges are and what the future holds for this research domain. Based on these questions, we summarized the components and workflows of prototyping in XR. Moreover, we present an overview of the latest trends in display device evolution, control technologies, digital model construction, and manufacturing processes. In view of these latest developments and gaps, we speculated on the challenges and opportunities in the field of prototyping in XR, especially in linking extended reality to digital fabrication, with the aim of guiding researchers towards new research directions.

Authors:Nana Tian, Elif Kurtay, Dylan Vairoli, Adriano Viegas Milani, Ronan Boulic
Title: Cybersickness Assessment Framework (TestBed): Towards a Standardization of Experiments
Abstract:
Investigating cybersickness (CS) in virtual reality (VR) often requires significant resources to create the VR environment and manage other experiment-related aspects. Additionally, slight differences in VR content across studies can lead to conflicting results. To address these challenges, we propose a standardized assessment framework to facilitate cybersickness research. The main goal is to enable consistent and comparable CS-related experiments. By establishing this common foundation, researchers can better evaluate and compare the impact of various factors on cybersickness. We provide a comprehensive explanation of the conceptual designs, detail the technical implementation, and offer instructions for using the proposed framework. Lastly, we conclude by discussing the limitations and potential avenues for future development.

Authors:Yuka Haruki, Kei Kato, Yuki Enami, Hiroaki Takeuchi, Daiki Kazuno, Kotaro Yamada, Teruaki Hayashi
Title: Development of Automated Data Quality Assessment and Evaluation Indices by Analytical Experience
Abstract:
The societal need to leverage third-party data has driven the data-distribution market and increased the importance of data quality assessment (DQA) in data transactions between organizations. However, DQA requires expert knowledge of raw data and related data attributes, which hinders consensus-building in data purchasing. This study focused on the differences in DQAs between experienced and inexperienced data handlers. We performed two experiments: The first was a questionnaire survey involving 41 participants with varying levels of data-handling experience, who evaluated 12 data samples using 10 predefined indices with and without quality metadata generated by the automated tool. The second was an eye-tracking experiment to reveal the viewing behavior of participants during data evaluation. It was revealed that using quality metadata generated by the automated tool can reduce misrecognition in DQA. While experienced data handlers rated the quality metadata highly, semi-experienced users gave it the lowest ratings. This study contributes to enhancing data understanding within organizations and promoting the distribution of valuable data by proposing an automated tool to support DQAs.

Authors:Roy El-Helou, Matthew K. X. J Pan
Title: The Social Life of Industrial Arms: How Arousal and Attention Shape Human-Robot Interaction
Abstract:
This study explores how human perceptions of a non-anthropomorphic robotic manipulator are shaped by two key dimensions of behaviour: arousal, defined as the robot's movement energy and expressiveness, and attention, defined as the robot's capacity to selectively orient toward and engage with a user. We introduce a novel control architecture that integrates a gaze-like attention engine with an arousal-modulated motion system to generate socially meaningful behaviours. In a user study, we find that robots exhibiting high attention -- actively directing their focus toward users -- are perceived as warmer and more competent, intentional, and lifelike. In contrast, high arousal -- characterized by fast, expansive, and energetic motions -- increases perceptions of discomfort and disturbance. Importantly, a combination of focused attention and moderate arousal yields the highest ratings of trust and sociability, while excessive arousal diminishes social engagement. These findings offer design insights for endowing non-humanoid robots with expressive, intuitive behaviours that support more natural human-robot interaction.

Authors:Lucy Havens, Benjamin Bach, Melissa Terras, Beatrice Alex
Title: Investigating the Capabilities and Limitations of Machine Learning for Identifying Bias in English Language Data with Information and Heritage Professionals
Abstract:
Despite numerous efforts to mitigate their biases, ML systems continue to harm already-marginalized people. While predominant ML approaches assume bias can be removed and fair models can be created, we show that these are not always possible, nor desirable, goals. We reframe the problem of ML bias by creating models to identify biased language, drawing attention to a dataset's biases rather than trying to remove them. Then, through a workshop, we evaluated the models for a specific use case: workflows of information and heritage professionals. Our findings demonstrate the limitations of ML for identifying bias due to its contextual nature, the way in which approaches to mitigating it can simultaneously privilege and oppress different communities, and its inevitability. We demonstrate the need to expand ML approaches to bias and fairness, providing a mixed-methods approach to investigating the feasibility of removing bias or achieving fairness in a given ML use case.

Authors:Shiyang Zhang, Fanfei Meng, Xi Wang, Lan Li
Title: Inaccuracy of an E-Dictionary and Its Influence on Chinese Language Users
Abstract:
Electronic dictionaries have largely replaced paper dictionaries and become central tools for L2 learners seeking to expand their vocabulary. Users often assume these resources are reliable and rarely question the validity of the definitions provided. The accuracy of major E-dictionaries is seldom scrutinized, and little attention has been paid to how their corpora are constructed. Research on dictionary use, particularly the limitations of electronic dictionaries, remains scarce. This study adopts a combined method of experimentation, user survey, and dictionary critique to examine Youdao, one of the most widely used E-dictionaries in China. The experiment involved a translation task paired with retrospective reflection. Participants were asked to translate sentences containing words that are insufficiently or inaccurately defined in Youdao. Their consultation behavior was recorded to analyze how faulty definitions influenced comprehension. Results show that incomplete or misleading definitions can cause serious misunderstandings. Additionally, students exhibited problematic consultation habits. The study further explores how such flawed definitions originate, highlighting issues in data processing and the integration of AI and machine learning technologies in dictionary construction. The findings suggest a need for better training in dictionary literacy for users, as well as improvements in the underlying AI models used to build E-dictionaries.

Authors:Pegah Rahimian, Jernej Flisar, David Sumpter
Title: Automated Explanation of Machine Learning Models of Footballing Actions in Words
Abstract:
While football analytics has changed the way teams and analysts assess performance, there remains a communication gap between machine learning practice and how coaching staff talk about football. Coaches and practitioners require actionable insights, which are not always provided by models. To bridge this gap, we show how to build wordalisations (a novel approach that leverages large language models) for shots in football. Specifically, we first build an expected goals model using logistic regression. We then use the coefficients of this regression model to write sentences describing how factors (such as distance, angle and defensive pressure) contribute to the model's prediction. Finally, we use large language models to give an entertaining description of the shot. We describe our approach in a model card and provide an interactive open-source application describing shots in recent tournaments. We discuss how shot wordalisations might aid communication in coaching and football commentary, and give a further example of how the same approach can be applied to other actions in football.
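The coefficient-to-sentence idea in this abstract can be illustrated with a minimal sketch. The feature names, coefficient values, and sentence templates below are hypothetical, not the paper's fitted model; the sketch only shows how a logistic-regression expected-goals score and its per-feature contributions can be turned into plain-language sentences.

```python
import math

# Hypothetical xG coefficients, for illustration only (not the paper's values).
COEFS = {"distance_m": -0.12, "angle_deg": 0.04, "defensive_pressure": -0.35}
INTERCEPT = 0.6

def expected_goals(shot):
    """Logistic-regression probability that a shot is scored."""
    z = INTERCEPT + sum(COEFS[k] * shot[k] for k in COEFS)
    return 1.0 / (1.0 + math.exp(-z))

def wordalise(shot):
    """Turn each coefficient * feature contribution into a plain sentence."""
    parts = []
    for name, coef in COEFS.items():
        contribution = coef * shot[name]
        direction = "raised" if contribution > 0 else "lowered"
        parts.append(f"{name.replace('_', ' ')} {direction} the chance "
                     f"(contribution {contribution:+.2f})")
    return f"Expected goals: {expected_goals(shot):.2f}. " + "; ".join(parts) + "."

shot = {"distance_m": 12.0, "angle_deg": 35.0, "defensive_pressure": 1.0}
print(wordalise(shot))
```

In the full approach described above, sentences like these would then be passed to a large language model to produce a more entertaining description.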

Authors:Lingxi Jin, Baicheng Lin, Mengze Hong, Kun Zhang, Hyo-Jeong So
Title: Exploring the Impact of an LLM-Powered Teachable Agent on Learning Gains and Cognitive Load in Music Education
Abstract:
This study examines the impact of an LLM-powered teachable agent, grounded in the Learning by Teaching (LBT) pedagogy, on students' music theory learning and cognitive load. The participants were 28 Chinese university students with prior music instrumental experiences. In an online experiment, they were assigned to either an experimental group, which engaged in music analysis with the teachable agent, or a control group, which conducted self-directed analysis using instructional materials. Findings indicate that students in the experimental group achieved significantly higher post-test scores than those in the control group. Additionally, they reported lower cognitive load, suggesting that the teachable agent effectively reduced the cognitive demands of music analysis tasks. These results highlight the potential of AI-driven scaffolding based on LBT principles to enhance music theory education, supporting teachers in delivering theory-oriented instruction while fostering students' self-directed learning skills.

Authors:Mohammed Aatif Shahab, Francesco Destro, Richard D. Braatz
Title: Digital Twins in Biopharmaceutical Manufacturing: Review and Perspective on Human-Machine Collaborative Intelligence
Abstract:
The biopharmaceutical industry is increasingly developing digital twins to digitalize and automate the manufacturing process in response to the growing market demands. However, this shift presents significant challenges for human operators, as the complexity and volume of information can overwhelm their ability to manage the process effectively. These issues are compounded when digital twins are designed without considering interaction and collaboration with operators, who are responsible for monitoring processes and assessing situations, particularly during abnormalities. Our review of current trends in biopharma digital twin development reveals a predominant focus on technology and often overlooks the critical role of human operators. To bridge this gap, this article proposes a collaborative intelligence framework that emphasizes the integration of operators with digital twins. Approaches to system design that can enhance operator trust and human-machine interface usability are presented. Moreover, innovative training programs for preparing operators to understand and utilize digital twins are discussed. The framework outlined in this article aims to enhance collaboration between operators and digital twins effectively by using their full capabilities to boost resilience and productivity in biopharmaceutical manufacturing.

Authors:Uwe Peters, Benjamin Chin-Yee
Title: Generalization Bias in Large Language Model Summarization of Scientific Research
Abstract:
Artificial intelligence chatbots driven by large language models (LLMs) have the potential to increase public science literacy and support scientific research, as they can quickly summarize complex scientific information in accessible terms. However, when summarizing scientific texts, LLMs may omit details that limit the scope of research conclusions, leading to generalizations of results broader than warranted by the original study. We tested 10 prominent LLMs, including ChatGPT-4o, ChatGPT-4.5, DeepSeek, LLaMA 3.3 70B, and Claude 3.7 Sonnet, comparing 4900 LLM-generated summaries to their original scientific texts. Even when explicitly prompted for accuracy, most LLMs produced broader generalizations of scientific results than those in the original texts, with DeepSeek, ChatGPT-4o, and LLaMA 3.3 70B overgeneralizing in 26 to 73% of cases. In a direct comparison of LLM-generated and human-authored science summaries, LLM summaries were nearly five times more likely to contain broad generalizations (OR = 4.85, 95% CI [3.06, 7.70]). Notably, newer models tended to perform worse in generalization accuracy than earlier ones. Our results indicate a strong bias in many widely used LLMs towards overgeneralizing scientific conclusions, posing a significant risk of large-scale misinterpretations of research findings. We highlight potential mitigation strategies, including lowering LLM temperature settings and benchmarking LLMs for generalization accuracy.
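The odds ratio and confidence interval reported above come from a standard 2x2 contingency-table calculation. The sketch below shows that calculation with hypothetical counts chosen purely for illustration; the study's actual counts are not given in the abstract.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table:
    a = LLM summaries with broad generalizations, b = without;
    c = human summaries with broad generalizations, d = without."""
    or_ = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, (lo, hi)

# Hypothetical counts, chosen only to illustrate the calculation.
or_, (lo, hi) = odds_ratio_ci(120, 80, 40, 130)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```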

Authors:Xiao Yan, Yi Ding
Title: Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices
Abstract:
Recent advancements in large language models (LLMs) have prompted interest in deploying these models on mobile devices to enable new applications without relying on cloud connectivity. However, the efficiency constraints of deploying LLMs on resource-limited devices present significant challenges. In this paper, we conduct a comprehensive measurement study to evaluate the efficiency tradeoffs between mobile-based, edge-based, and cloud-based deployments for LLM applications. We implement AutoLife-Lite, a simplified LLM-based application that analyzes smartphone sensor data to infer user location and activity contexts. Our experiments reveal that: (1) Only small-size LLMs (<4B parameters) can run successfully on powerful mobile devices, though they exhibit quality limitations compared to larger models; (2) Model compression is effective in lowering the hardware requirements, but may lead to significant performance degradation; (3) The latency to run LLMs on mobile devices with meaningful output is significant (>30 seconds), while cloud services demonstrate better time efficiency (<10 seconds); (4) Edge deployments offer intermediate tradeoffs between latency and model capabilities, with different results on CPU-based and GPU-based settings. These findings provide valuable insights for system designers on the current limitations and future directions for on-device LLM applications.

Authors:Haruka Nakajima Suzuki, Midori Inaba
Title: Digital Nudges Using Emotion Regulation to Reduce Online Disinformation Sharing
Abstract:
Online disinformation often provokes strong anger, driving social media users to spread it; however, few measures specifically target sharing behaviors driven by this emotion to curb the spread of disinformation. This study aimed to evaluate whether digital nudges that encourage deliberation by drawing attention to emotional information can reduce sharing driven by strong anger associated with online disinformation. We focused on emotion regulation as a method for fostering deliberation, since it is activated when individuals' attention is drawn to their current emotions. Digital nudges were designed to display emotional information about disinformation and emotion regulation messages. Among these, we found that distraction and perspective-taking nudges may encourage deliberation in anger-driven sharing. To assess their effectiveness, existing nudges mimicking platform functions were used for comparison. Participant responses were measured across four dimensions: sharing intentions, type of emotion, intensity of emotion, and authenticity. The results showed that all digital nudges significantly reduced the sharing of disinformation, with distraction nudges being the most effective. These findings suggest that digital nudges addressing emotional responses can serve as an effective intervention against the spread of disinformation driven by strong anger.

Authors:Xiaomei Li, Alex Whan, Meredith McNeil, David Starns, Jessica Irons, Samuel C. Andrew, Rad Suchecki
Title: A Conceptual Framework for Human-AI Collaborative Genome Annotation
Abstract:
Genome annotation is essential for understanding the functional elements within genomes. While automated methods are indispensable for processing large-scale genomic data, they often face challenges in accurately predicting gene structures and functions. Consequently, manual curation by domain experts remains crucial for validating and refining these predictions. These combined outcomes from automated tools and manual curation highlight the importance of integrating human expertise with AI capabilities to improve both the accuracy and efficiency of genome annotation. However, the manual curation process is inherently labor-intensive and time-consuming, making it difficult to scale for large datasets. To address these challenges, we propose a conceptual framework, Human-AI Collaborative Genome Annotation (HAICoGA), which leverages the synergistic partnership between humans and artificial intelligence to enhance human capabilities and accelerate the genome annotation process. Additionally, we explore the potential of integrating Large Language Models (LLMs) into this framework to support and augment specific tasks. Finally, we discuss emerging challenges and outline open research questions to guide further exploration in this area.

Authors:William Guey, Pierrick Bougault, Vitor D. de Moura, Wei Zhang, Jose O. Gomes
Title: Mapping Geopolitical Bias in 11 Large Language Models: A Bilingual, Dual-Framing Analysis of U.S.-China Tensions
Abstract:
This study systematically analyzes geopolitical bias across 11 prominent Large Language Models (LLMs) by examining their responses to seven critical topics in U.S.-China relations. Utilizing a bilingual (English and Chinese) and dual-framing (affirmative and reverse) methodology, we generated 19,712 prompts designed to detect ideological leanings in model outputs. Responses were quantitatively assessed on a normalized scale from -2 (strongly Pro-China) to +2 (strongly Pro-U.S.) and categorized according to stance, neutrality, and refusal rates. The findings demonstrate significant and consistent ideological alignments correlated with the LLMs' geographic origins; U.S.-based models predominantly favored Pro-U.S. stances, while Chinese-origin models exhibited pronounced Pro-China biases. Notably, language and prompt framing substantially influenced model responses, with several LLMs exhibiting stance reversals based on prompt polarity or linguistic context. Additionally, we introduced comprehensive metrics to evaluate response consistency across languages and framing conditions, identifying variability and vulnerabilities in model behaviors. These results offer practical insights that can guide organizations and individuals in selecting LLMs best aligned with their operational priorities and geopolitical considerations, underscoring the importance of careful model evaluation in politically sensitive applications. Furthermore, the research highlights specific prompt structures and linguistic variations that can strategically trigger distinct responses from models, revealing methods for effectively navigating and influencing LLM outputs.
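The scoring described above, mapping responses onto a -2 (strongly Pro-China) to +2 (strongly Pro-U.S.) scale with stance, neutrality, and refusal rates, can be sketched as follows. The label set and mapping are assumptions for illustration; the study's exact annotation scheme may differ.

```python
from collections import Counter

# Assumed categorical labels mapped onto the -2..+2 stance scale.
SCALE = {"strong_pro_china": -2, "pro_china": -1, "neutral": 0,
         "pro_us": 1, "strong_pro_us": 2}

def summarize(responses):
    """Mean stance plus neutrality and refusal rates for one model's responses."""
    counts = Counter(responses)
    scored = [SCALE[r] for r in responses if r in SCALE]
    mean = sum(scored) / len(scored) if scored else 0.0
    n = len(responses)
    return {"mean_stance": mean,
            "neutral_rate": counts["neutral"] / n,
            "refusal_rate": counts["refusal"] / n}

sample = ["pro_us", "strong_pro_us", "neutral", "refusal", "pro_china"]
print(summarize(sample))
```

Comparing such summaries across languages and affirmative versus reverse framings is what exposes the stance reversals the study reports.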

Authors:Man Zhang, Ying Li, Yang Peng, Yijia Sun, Wenxin Guo, Huiqing Hu, Shi Chen, Qingbai Zhao
Title: AI Delivers Creative Output but Struggles with Thinking Processes
Abstract:
A key objective in artificial intelligence (AI) development is to create systems that match or surpass human creativity. Although current AI models perform well across diverse creative tasks, it remains unclear whether these achievements reflect genuine creative thinking. This study examined whether AI models (GPT-3.5-turbo, GPT-4, and GPT-4o) engage in creative thinking by comparing their performance with humans across various creative tasks and core cognitive processes. Results showed that AI models outperformed humans in divergent thinking, convergent thinking, and insight problem-solving, but underperformed in creative writing. Compared to humans, AI generated lower forward flow values in both free and chain association tasks and showed lower accuracy in the representational change task. In creative evaluation, AI exhibited no significant correlation between the weights of novelty and appropriateness when predicting creative ratings, suggesting the absence of a human-like trade-off strategy. AI also had higher decision error scores in creative selection, suggesting difficulty identifying the most creative ideas. These findings suggest that while AI can mimic human creativity, its strong performance in creative tasks is likely driven by non-creative mechanisms rather than genuine creative thinking.

Authors:Özkan Canay, Ümit Kocabıçak
Title: CAWAL: A novel unified analytics framework for enterprise web applications and multi-server environments
Abstract:
In web analytics, cloud-based solutions have limitations in data ownership and privacy, whereas client-side user tracking tools face challenges such as data accuracy and a lack of server-side metrics. This paper presents the Combined Analytics and Web Application Log (CAWAL) framework as an alternative model and an on-premises framework, offering web analytics with application logging integration. CAWAL enables precise data collection and cross-domain tracking in web farms while complying with data ownership and privacy regulations. The framework also improves software diagnostics and troubleshooting by incorporating application-specific data into analytical processes. Integrated into an enterprise-grade web application, CAWAL has demonstrated superior performance, achieving approximately 24% and 85% lower response times compared to Open Web Analytics (OWA) and Matomo, respectively. The empirical evaluation demonstrates that the framework eliminates certain limitations in existing tools and provides a robust data infrastructure for enhanced web analytics.

Authors:Chathika Gunaratne, Mason Stott, Debraj De, Gautam Malviya Thakur, Chris Young
Title: Agent-Based Modeling and Deep Neural Networks for Establishing Digital Twins of Secure Facilities under Sensing Restrictions
Abstract:
Digital twin technologies help practitioners simulate, monitor, and predict undesirable outcomes in-silico, while avoiding the cost and risks of conducting live simulation exercises. Virtual reality (VR) based digital twin technologies are especially useful when monitoring human Patterns of Life (POL) in secure nuclear facilities, where live simulation exercises are too dangerous and costly to ever perform. However, the high-security status of such facilities may restrict modelers from deploying human activity sensors for data collection. This problem was encountered when deploying MetaPOL, a digital twin system to prevent insider threat or sabotage of secure facilities, at a secure nuclear reactor facility at Oak Ridge National Laboratory (ORNL). This challenge was addressed using an agent-based model (ABM), driven by anecdotal evidence of facility personnel POL, to generate synthetic movement trajectories. These synthetic trajectories were then used to train deep neural network surrogates for next location and stay duration prediction to drive NPCs in the VR environment. In this study, we evaluate the efficacy of this technique for establishing NPC movement within MetaPOL and the ability to distinguish NPC movement during normal operations from that during a simulated emergency response. Our results demonstrate the success of using a multi-layer perceptron for next location prediction and mixture density network for stay duration prediction to predict the ABM generated trajectories. We also find that NPC movement in the VR environment driven by the deep neural networks under normal operations remains significantly different from that seen when simulating responses to a simulated emergency scenario.
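The next-location prediction task described above can be illustrated with a much simpler baseline than the paper's multi-layer perceptron: a transition-frequency model over synthetic trajectories. The facility zone names and trajectories below are invented for illustration; the actual MetaPOL surrogates are neural networks trained on ABM output.

```python
from collections import Counter, defaultdict

def fit_next_location(trajectories):
    """Count location-to-location transitions observed in synthetic trajectories."""
    transitions = defaultdict(Counter)
    for traj in trajectories:
        for here, nxt in zip(traj, traj[1:]):
            transitions[here][nxt] += 1
    return transitions

def predict_next(transitions, here):
    """Return the most frequently observed next location from `here`."""
    if here not in transitions:
        return None
    return transitions[here].most_common(1)[0][0]

# Toy ABM-style trajectories over hypothetical facility zones.
trajs = [["gate", "office", "reactor", "office"],
         ["gate", "office", "breakroom", "office"],
         ["gate", "office", "reactor", "gate"]]
model = fit_next_location(trajs)
print(predict_next(model, "office"))
```

A neural surrogate replaces the frequency table with a learned mapping from context features to a distribution over next locations, which generalizes to unseen contexts.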

Authors:Muhammad Talal Khalid, Ann-Perry Witmer
Title: Prompt Engineering for Large Language Model-assisted Inductive Thematic Analysis
Abstract:
The potential of large language models (LLMs) to mitigate the time- and cost-related challenges associated with inductive thematic analysis (ITA) has been extensively explored in the literature. However, the use of LLMs to support ITA has often been opportunistic, relying on ad hoc prompt engineering (PE) approaches, thereby undermining the reliability, transparency, and replicability of the analysis. The goal of this study is to develop a structured approach to PE in LLM-assisted ITA. To this end, a comprehensive review of the existing literature is conducted to examine how ITA researchers integrate LLMs into their workflows and, in particular, how PE is utilized to support the analytical process. Building on the insights generated from this review, four key steps for effective PE in LLM-assisted ITA are identified and extensively outlined. Furthermore, the study explores state-of-the-art PE techniques that can enhance the execution of these steps, providing ITA researchers with practical strategies to improve their analyses. In conclusion, the main contributions of this paper include: (i) it maps the existing research on LLM-assisted ITA to enable a better understanding of the rapidly developing field, (ii) it outlines a structured four-step PE process to enhance methodological rigor, (iii) it discusses the application of advanced PE techniques to support the execution of these steps, and (iv) it highlights key directions for future research.

Authors:Sherry S. L. Yip, Berry L. Han, Holly H. Y. Chan
Title: Student-Powered Digital Scholarship CoLab Project in the HKUST Library: Develop a Chinese Named-Entity Recognition (NER) Tool within One Semester from the Ground Up
Abstract:
Starting in February 2024, the HKUST Library extended the scope of its AI literacy efforts to AI utilization, fostering student involvement in applying state-of-the-art technologies in Library-initiated projects under a scheme named "Digital Scholarship (DS) CoLab". A key focus of the DS CoLab scheme is cultivating talent and enabling students to apply advanced technologies in practical contexts. It aims to reinforce the Library's role as a catalyst and hub for multidisciplinary collaboration and to cultivate a "can do" spirit among university members. The Library offers 1-2 projects per year in which students engage with advanced technologies in practical contexts while supporting the Library in tackling challenges and streamlining operational tasks. The tool introduced in this paper was developed mainly by two of the authors, Sherry Yip Sau Lai and Berry Han Liuruo, as part-time student helpers under the DS CoLab scheme in the 2024 Spring Semester (February to May 2024). This paper details the complete journey of developing a Chinese Named-Entity Recognition (NER) tool from the ground up within one semester, from the initial research and planning stages to execution and the delivery of a viable product. The collaborative spirit fostered by this project, with students playing a central role, exemplifies the power and potential of innovative educational models that prioritize hands-on learning with student involvement.

Authors:Simran Kaur Ghatoray, Yongmin Li
Title: Automated UX Insights from User Research Videos by Integrating Facial Emotion and Text Sentiment
Abstract:
Emotion recognition technology has been studied for over a decade. Given its growing importance and its applications in customer service, healthcare, education, and other fields, this study explores its potential in user experience (UX) evaluation. Recognizing and tracking user emotions in user research videos is important for understanding user needs and expectations of a service or product. Little UX research has focused on automating emotion extraction from video using more than one modality. This study implements several modalities, namely facial emotion recognition, speech-to-text, and text-based emotion recognition, to capture emotional nuances from a user research video and extract meaningful, actionable insights. For facial emotion recognition, 10 pre-trained models were evaluated on three benchmark datasets (FER-2013, AffectNet, and CK+), and the model with the best generalization ability was selected. OpenAI's Whisper model was used to extract speech and convert it to text, and emotions in the text were recognized using a pre-trained model from the HuggingFace hub reporting an evaluation accuracy above 95%. The study also integrates the gathered data through temporal alignment and fusion for deeper, contextual insights. It further demonstrates a way of automating data analysis with the PandasAI Python library, using OpenAI's GPT-4o model, along with a discussion of other possible solutions. This study is a proof of concept showing that meaningful, automated insights can be extracted from a video based on user emotions.
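The temporal alignment and fusion step described in this abstract can be sketched in a few lines; the data structures, emotion labels, and majority-vote rule below are illustrative assumptions, not the study's actual implementation.

```python
# Hypothetical sketch: align per-frame facial emotions with transcribed
# speech segments, then fuse the two modalities per segment.
from collections import Counter

# Made-up inputs: (timestamp_seconds, emotion) from a facial model, and
# (start, end, emotion) segments from speech-to-text + a text classifier.
face_frames = [(0.5, "neutral"), (1.5, "happy"), (2.5, "happy"), (4.0, "angry")]
speech_segments = [(0.0, 3.0, "joy"), (3.0, 5.0, "anger")]

def fuse(face_frames, speech_segments):
    """For each speech segment, pair its text emotion with the majority
    facial emotion observed during the same time window."""
    fused = []
    for start, end, text_emotion in speech_segments:
        in_window = [e for t, e in face_frames if start <= t < end]
        face_emotion = Counter(in_window).most_common(1)[0][0] if in_window else None
        fused.append({"window": (start, end),
                      "text": text_emotion,
                      "face": face_emotion})
    return fused

for row in fuse(face_frames, speech_segments):
    print(row)
```

Each fused record pairs a time window with both modalities' labels, which is the kind of aligned data a downstream analysis (e.g. via PandasAI) could consume.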

Authors:Maria Padilla Engstrøm, Anders Sundnes Løvlie
Title: Using a Large Language Model as Design Material for an Interactive Museum Installation
Abstract:
We present a work in progress that explores using a Large Language Model (LLM) as a design material for an interactive museum installation. LLMs offer the possibility of creating chatbots that can facilitate dynamic and human-like conversation, engaging in a form of role play to bring historical persons to life for visitors. However, LLMs are prone to producing misinformation, which runs counter to museums' core mission to educate the public. We use Research-through-Design to explore some approaches to navigating this dilemma through rapid prototyping and evaluation and propose some directions for further research. We suggest that designers may shape interactions with the chatbot to emphasize personal narratives and role play rather than historical facts or to intentionally highlight the unreliability of the chatbot outputs to provoke critical reflection.

Authors:Francesco Kruk, Savindu Herath, Prithwiraj Choudhury
Title: BanglAssist: A Bengali-English Generative AI Chatbot for Code-Switching and Dialect-Handling in Customer Service
Abstract:
In recent years, large language models (LLMs) have demonstrated exponential improvements that promise transformative opportunities across various industries. Their ability to generate human-like text and ensure continuous availability facilitates the creation of interactive service chatbots aimed at enhancing customer experience and streamlining enterprise operations. Despite their potential, LLMs face critical challenges, such as a susceptibility to hallucinations and difficulties handling complex linguistic scenarios, notably code switching and dialectal variations. To address these challenges, this paper describes the design of a multilingual chatbot for Bengali-English customer service interactions utilizing retrieval-augmented generation (RAG) and targeted prompt engineering. This research provides valuable insights for the human-computer interaction (HCI) community, emphasizing the importance of designing systems that accommodate linguistic diversity to benefit both customers and businesses. By addressing the intersection of generative AI and cultural heterogeneity, this late-breaking work inspires future innovations in multilingual and multicultural HCI.

Authors:Xiaofei Zhou, Yi Zhang, Yufei Jiang, Yunfan Gong, Chi Zhang, Alissa N. Antle, Zhen Bai
Title: Briteller: Shining a Light on AI Recommendations for Children
Abstract:
Understanding how AI recommendations work can help the younger generation become more informed and critical consumers of the vast amount of information they encounter daily. However, young learners with limited math and computing knowledge often find AI concepts too abstract. To address this, we developed Briteller, a light-based recommendation system that makes learning tangible. By exploring and manipulating light beams, Briteller enables children to understand an AI recommender system's core algorithmic building block, the dot product, through hands-on interactions. Initial evaluations with ten middle school students demonstrated the effectiveness of this approach, using embodied metaphors, such as "merging light" to represent addition. To overcome the limitations of the physical optical setup, we further explored how AR could embody multiplication, expand data vectors with more attributes, and enhance contextual understanding. Our findings provide valuable insights for designing embodied and tangible learning experiences that make AI concepts more accessible to young learners.
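The dot product that Briteller makes tangible with light beams is the scoring step of a simple recommender. The attribute names and numbers below are made up for illustration; they are not taken from Briteller itself.

```python
# Illustrative example: recommending an item by taking the dot product of a
# user's preference vector with each item's attribute vector.
def dot(user, item):
    """Sum of element-wise products of two equal-length attribute vectors."""
    return sum(u * i for u, i in zip(user, item))

# Each vector holds invented preference strengths for [comedy, sports, music].
user_profile = [0.9, 0.1, 0.5]
videos = {
    "funny_clip":  [1.0, 0.0, 0.2],
    "match_recap": [0.1, 1.0, 0.0],
    "concert":     [0.3, 0.0, 1.0],
}

# Recommend the item whose attribute vector aligns best with the user's.
best = max(videos, key=lambda name: dot(user_profile, videos[name]))
print(best)  # "funny_clip": 0.9*1.0 + 0.1*0.0 + 0.5*0.2 = 1.0, the top score
```

In Briteller's metaphor, each multiplication is a light beam's intensity passing through a filter and the "merging light" is the addition that sums the products.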

Authors:Areen Khalaila, Gianna Everette, Suho Kim, Ian Roy
Title: StreetScape: Gamified Tactile Interactions for Collaborative Learning and Play
Abstract:
Spatial reasoning and collaboration are essential for childhood development, yet blind and visually impaired (BVI) children often lack access to tools that foster these skills. Tactile maps and assistive technologies primarily focus on individual navigation, overlooking the need for playful, inclusive, and collaborative interactions. We address this with StreetScape, a tactile street puzzle that enhances spatial skills and interdependence between BVI and sighted children. Featuring modular 3D-printed tiles, tactile roadways, and customizable decorative elements, StreetScape allows users to construct and explore cityscapes through gamified tactile interaction. Developed through an iterative design process, it integrates dynamic assembly and tactile markers for intuitive navigation, promoting spatial learning and fostering meaningful social connections. This work advances accessible design by demonstrating how tactile tools can effectively bridge educational and social gaps through collaborative play, redefining assistive technologies for children as a scalable platform that merges learning, creativity, and inclusivity.

Authors:Vikas Kushwaha, Sruti Srinivasa Ragavan, Subhajit Roy
Title: A Measure Based Generalizable Approach to Understandability
Abstract:
Successful agent-human partnerships require that any agent-generated information is understandable to the human, and that the human can easily steer the agent towards a goal. Such effective communication requires the agent to develop a finer-level notion of what is understandable to the human. State-of-the-art agents, including LLMs, lack this detailed notion of understandability because they only capture average human sensibilities from the training data, and therefore afford limited steerability (e.g., requiring non-trivial prompt engineering). In this paper, instead of relying only on data, we argue for developing generalizable, domain-agnostic measures of understandability that can be used as directives for these agents. Existing research on understandability measures is fragmented; we survey such efforts across domains and lay a cognitive-science-rooted groundwork for more coherent, domain-agnostic research investigations in the future.

Authors:Florian Onur Kuhlmeier, Leon Hanschmann, Melina Rabe, Stefan Luettke, Eva-Lotta Brakemeier, Alexander Maedche
Title: Combining Artificial Users and Psychotherapist Assessment to Evaluate Large Language Model-based Mental Health Chatbots
Abstract:
Large Language Models (LLMs) promise to overcome limitations of rule-based mental health chatbots through more natural conversations. However, evaluating LLM-based mental health chatbots presents a significant challenge: their probabilistic nature requires comprehensive testing to ensure therapeutic quality, yet conducting such evaluations with people with depression would impose an additional burden on vulnerable people and risk exposing them to potentially harmful content. Our paper presents an evaluation approach for LLM-based mental health chatbots that combines dialogue generation with artificial users and dialogue evaluation by psychotherapists. We developed artificial users based on patient vignettes, systematically varying characteristics such as depression severity, personality traits, and attitudes toward chatbots, and let them interact with an LLM-based behavioral activation chatbot. Ten psychotherapists evaluated 48 randomly selected dialogues using standardized rating scales to assess the quality of behavioral activation and its therapeutic capabilities. We found that while artificial users showed moderate authenticity, they enabled comprehensive testing across different user profiles. In addition, the chatbot demonstrated promising capabilities in delivering behavioral activation and maintaining safety. Furthermore, we identified deficits, such as failures to ensure the appropriateness of the activity plan, which reveal necessary improvements for the chatbot. Our framework provides an effective method for evaluating LLM-based mental health chatbots while protecting vulnerable people during the evaluation process. Future research should improve the authenticity of artificial users and develop LLM-augmented evaluation tools to make psychotherapist evaluation more efficient, and thus further advance the evaluation of LLM-based mental health chatbots.

Authors:Sora Kang, Mingu Lee
Title: An NLP-Driven Approach Using Twitter Data for Tailored K-pop Artist Recommendations
Abstract:
The global rise of K-pop and the digital revolution have paved the way for new dimensions in artist recommendations. With platforms like Twitter serving as a hub for fans to interact, share and discuss K-pop, a vast amount of data is generated that can be analyzed to understand listener preferences. However, current recommendation systems often overlook K-pop's inherent diversity, treating it as a singular entity. This paper presents an innovative method that utilizes Natural Language Processing to analyze tweet content and discern individual listening habits and preferences. The mass of Twitter data is methodically categorized using fan clusters, facilitating granular and personalized artist recommendations. Our approach marries the advanced GPT-4 model with large-scale social media data, offering potential enhancements in accuracy for K-pop recommendation systems and promising an elevated, personalized fan experience. In conclusion, acknowledging the heterogeneity within fanbases and capitalizing on readily available social media data marks a significant stride towards advancing personalized music recommendation systems.

Authors:Mitsuka Kiyohara, Ethan Mondri
Title: Coolight: Enhancing Nighttime Safety for Urban Student Commuters
Abstract:
Safety while walking alone at night is a key indicator of a citizen's well-being and a society's inclusiveness. However, this is not equally felt across all demographic groups, especially for university students living in urban areas. We present Coolight, a mobile application designed to reduce stress and anxiety for nighttime walking through an interactive live map, real-time community incident reports, location sharing, and a route planner optimized for user safety. Coolight's design was informed through interviews, questionnaires, and usability tests with university students and their friends and families in Toronto, Canada. This paper describes the concept, research, design approach, and evaluation results of a solution addressing safety concerns urban commuters face at night.

Authors:Saeedeh Mosaferchi, Alireza Mortezapour, Magnus Liebherr, Francesco Villecco, Alessandro Naddeo
Title: AV-TLX for measuring (mental) workload while driving AVs: Born from NASA-TLX but developed for the era of automated vehicles
Abstract:
The introduction of automated vehicles has redefined the level of interaction between driver and vehicle, introducing new tasks and thus imposing different workloads. Existing tools such as NASA-TLX and DALI are still used to assess driving workload in automated vehicles, despite not accounting for these new tasks. This study introduces AV-TLX, a specialized tool for measuring workload in Level 3 automated driving. The development process began with a narrative literature review to identify the primary factors influencing workload, followed by a series of qualitative sessions in which the dimensions, and later the questions, of the questionnaire were designed. The tool's validity was first assessed using CVR and CVI indices, and its reliability and convergent validity were evaluated in a high-fidelity dynamic driving simulator. The final version of AV-TLX comprises 19 questions across 8 subscales, demonstrating excellent reliability (0.86) and validity (CVR > 0.78). The agreement score between AV-TLX and NASA-TLX in the simulation study was 0.6, which is considered acceptable for the consistency of two questionnaires. The questionnaire can be used in two ways: by reporting the overall workload, optionally broken down into the 8 primary subscales, or by categorizing the questions into two groups, takeover task workload and automated driving task workload. The final version of the questionnaire, as presented in the paper, is available for use in future studies of Level 3 automated driving.

Authors:Marc Satkowski, Weizhou Luo, Rufat Rzayev
Title: Of Affordance Opportunism in AR: Its Fallacies and Discussing Ways Forward
Abstract:
This position paper addresses the fallacies associated with the improper use of affordances in the opportunistic design of augmented reality (AR) applications. While opportunistic design leverages existing physical affordances for content placement and for creating tangible feedback in AR environments, their misuse can lead to confusion, errors, and poor user experiences. The paper emphasizes the importance of perceptible affordances and properly mapping virtual controls to appropriate physical features in AR applications by critically reflecting on four fallacies of facilitating affordances, namely, the subjectiveness of affordances, affordance imposition and reappropriation, properties and dynamicity of environments, and mimicking the real world. By highlighting these potential pitfalls and proposing a possible path forward, we aim to raise awareness and encourage more deliberate and thoughtful use of affordances in the design of AR applications.

Authors:Joel Kiskola, Henrik Rydenfelt, Thomas Olsson, Lauri Haapanen, Noora Vänttinen, Matti Nelimarkka, Minna Vigren, Salla-Maaria Laaksonen, Tuukka Lehtiniemi
Title: Generative AI and News Consumption: Design Fictions and Critical Analysis
Abstract:
The emergence of Generative AI features in news applications may radically change news consumption and challenge journalistic practices. To explore the future potentials and risks of this understudied area, we created six design fictions depicting scenarios such as virtual companions delivering news summaries to the user, AI providing context to news topics, and content being transformed into other formats on demand. The fictions, discussed with a multi-disciplinary group of experts, enabled a critical examination of the diverse ethical, societal, and journalistic implications of AI shaping this everyday activity. The discussions raised several concerns, suggesting that such consumer-oriented AI applications can clash with journalistic values and processes. These include fears that neither consumers nor AI could successfully balance engagement, objectivity, and truth, leading to growing detachment from shared understanding. We offer critical insights into the potential long-term effects to guide design efforts in this emerging application area of GenAI.

Authors:Weiyan Shi, Xuanzhi Wang, Kai Niu, Leye Wang, Daqing Zhang
Title: WiCross: Indoor Human Zone-Crossing Detection Using Commodity WiFi Devices
Abstract:
Detecting whether a target crosses the given zone (e.g., a door) can enable various practical applications in smart homes, including intelligent security and people counting. The traditional infrared-based approach only covers a line and can be easily cracked. In contrast, reusing the ubiquitous WiFi devices deployed in homes has the potential to cover a larger area of interest as WiFi signals are scattered throughout the entire space. By detecting the walking direction (i.e., approaching and moving away) with WiFi signal strength change, existing work can identify the behavior of crossing between a WiFi transceiver pair. However, this method mistakenly classifies the turn-back behavior as crossing behavior, resulting in a high false alarm rate. In this paper, we propose WiCross, which can accurately distinguish the turn-back behavior with the phase statistics pattern of WiFi signals and thus robustly identify whether the target crosses the area between the WiFi transceiver pair. We implement WiCross with commercial WiFi devices and extensive experiments demonstrate that WiCross can achieve an accuracy higher than 95% with a false alarm rate of less than 5%.

Authors:Ghazanfar Ali, Hong-Quan Le, Junho Kim, Seoung-won Hwang, Jae-In Hwang
Title: Design of Seamless Multi-modal Interaction Framework for Intelligent Virtual Agents in Wearable Mixed Reality Environment
Abstract:
In this paper, we present the design of a multimodal interaction framework for intelligent virtual agents in wearable mixed reality (MR) environments, especially for interactive applications at museums, botanical gardens, and similar places. Such venues need engaging, non-repetitive digital content delivery to maximize user involvement, and an intelligent virtual agent is a promising medium for both purposes. The framework's premise is wearable mixed reality provided by MR devices that support spatial mapping. We envisioned a seamless interaction framework integrating spatial mapping, virtual character animations, speech recognition, gaze, a domain-specific chatbot, and object recognition to enhance virtual experiences and communication between users and virtual agents. By applying a modular approach and deploying computationally intensive modules on a cloud platform, we achieved a seamless virtual experience on a device with limited resources. Human-like gaze and speech interaction with a virtual agent made it more interactive, and automated mapping of body animations to the content of speech made it more engaging. In our tests, the virtual agents responded within 2-4 seconds of a user query. The framework's strengths are flexibility and adaptability: it can be adapted to any wearable MR device that supports spatial mapping.

Authors:Zaira Romeo, Alberto Testolin
Title: Artificial Intelligence Can Emulate Human Normative Judgments on Emotional Visual Scenes
Abstract:
Affective reactions have deep biological foundations, however in humans the development of emotion concepts is also shaped by language and higher-order cognition. A recent breakthrough in AI has been the creation of multimodal language models that exhibit impressive intellectual capabilities, but their responses to affective stimuli have not been investigated. Here we study whether state-of-the-art multimodal systems can emulate human emotional ratings on a standardized set of images, in terms of affective dimensions and basic discrete emotions. The AI judgements correlate surprisingly well with the average human ratings: given that these systems were not explicitly trained to match human affective reactions, this suggests that the ability to visually judge emotional content can emerge from statistical learning over large-scale databases of images paired with linguistic descriptions. Besides showing that language can support the development of rich emotion concepts in AI, these findings have broad implications for sensitive use of multimodal AI technology.

Authors:Benjamin Knopp, Daniel Auras, Alexander C. Schütz, Dominik Endres
Title: Reading Decisions from Gaze Direction during Graphics Turing Test of Gait Animation
Abstract:
We investigated gaze direction during movement observation. Eye movement data were collected during an experiment in which different models of movement production (based on movement primitives, MPs) were compared in a two-alternative forced-choice (2AFC) task. Participants observed side-by-side presentations of two naturalistic 3D-rendered human movement videos: one based on a motion-captured gait sequence, the other generated by recombining the machine-learned MPs to approximate the same movement. The task was to discriminate between these movements while eye movements were recorded. We complement previous analyses of the binary decision data with this eye tracking data, investigating the role of gaze direction during task execution. We computed the shared information between gaze features and participants' decisions, and between gaze features and the correct answers. We found that eye movements reflect participants' decisions during the 2AFC task, but not the correct answer. This result is important for future experiments, which should take advantage of eye tracking to complement binary decision data.

Authors:Emiram Kablo, Yorick Last, Patricia Arias Cabarcos, Melanie Volkamer
Title: The (Un)suitability of Passwords and Password Managers in Virtual Reality
Abstract:
As Virtual Reality (VR) expands into fields like healthcare and education, ensuring secure and user-friendly authentication becomes essential. Traditional password entry methods in VR are cumbersome and insecure, making password managers (PMs) a potential solution. To explore this field, we conducted a user study (n=126 VR users) where participants expressed a strong preference for simpler passwords and showed interest in biometric authentication and password managers. On these grounds, we provide the first in-depth evaluation of PMs in VR. We report findings from 91 cognitive walkthroughs, revealing that while PMs improve usability, they are not yet ready for prime time. Key features like cross-app autofill are missing, and user experiences highlight the need for better solutions. Based on consolidated user views and expert analysis, we make recommendations on how to move forward in improving VR authentication systems, ultimately creating more practical solutions for this growing field.

Authors:Bhada Yun, Dana Feng, Ace S. Chen, Afshin Nikzad, Niloufar Salehi
Title: Generative AI in Knowledge Work: Design Implications for Data Navigation and Decision-Making
Abstract:
Our study of 20 knowledge workers revealed a common challenge: the difficulty of synthesizing unstructured information scattered across multiple platforms to make informed decisions. Drawing on their vision of an ideal knowledge synthesis tool, we developed Yodeai, an AI-enabled system, to explore both the opportunities and limitations of AI in knowledge work. Through a user study with 16 product managers, we identified three key requirements for Generative AI in knowledge work: adaptable user control, transparent collaboration mechanisms, and the ability to integrate background knowledge with external information. However, we also found significant limitations, including overreliance on AI, user isolation, and contextual factors outside the AI's reach. As AI tools become increasingly prevalent in professional settings, we propose design principles that emphasize adaptability to diverse workflows, accountability in personal and collaborative contexts, and context-aware interoperability to guide the development of human-centered AI systems for product managers and knowledge workers.

Authors:Jussi Jokinen, Patrick Ebel, Tuomo Kujala
Title: Predicting Multitasking in Manual and Automated Driving with Optimal Supervisory Control
Abstract:
Modern driving involves interactive technologies that can divert attention, increasing the risk of accidents. This paper presents a computational cognitive model that simulates human multitasking while driving. Based on optimal supervisory control theory, the model predicts how multitasking adapts to variations in driving demands, interactive tasks, and automation levels. Unlike previous models, it accounts for context-dependent multitasking across different degrees of driving automation. The model predicts longer in-car glances on straight roads and shorter glances during curves. It also anticipates increased glance durations with driver aids such as lane-centering assistance and their interaction with environmental demands. Validated against two empirical datasets, the model offers insights into driver multitasking amid evolving in-car technologies and automation.

Authors:Ana Sanz Cozcolluela, Yasemin Vardar
Title: Generating Multimodal Textures with a Soft Hydro-Pneumatic Haptic Ring
Abstract:
The growing adoption of extended reality (XR) has driven demand for wearable technologies that can replicate natural tactile sensations and allow users to interact freely with their surroundings using bare fingers. However, most existing wearable haptic technologies that support such free interactions can deliver sensations across limited tactile modalities. Here, we introduce a soft haptic ring and a data-driven rendering methodology to generate multimodal texture sensations. The device integrates pneumatic and hydraulic actuation to simulate roughness, thermal, and softness cues on the proximal phalanx, enabling users to explore surroundings naturally with their fingertips. The rendering methodology dynamically modulates those cues based on the user's exploratory actions. We validated our approach in a user study with fifteen participants, who matched six virtual textures generated by the ring to their real counterparts and rated their perceived sensations. Participants achieved up to ninety percent accuracy in texture matching. The adjective ratings confirmed that the ring delivers distinct, perceptually rich stimuli across all rendered sensations. These findings highlight the ring's potential for immersive XR applications, offering diverse tactile feedback without restricting physical interaction.

Authors:Stefan Pasch, Sun-Young Ha
Title: Human-AI Interaction and User Satisfaction: Empirical Evidence from Online Reviews of AI Products
Abstract:
Human-AI Interaction (HAI) guidelines and design principles have become increasingly important in both industry and academia to guide the development of AI systems that align with user needs and expectations. However, large-scale empirical evidence on how HAI principles shape user satisfaction in practice remains limited. This study addresses that gap by analyzing over 100,000 user reviews of AI-related products from G2, a leading review platform for business software and services. Based on widely adopted industry guidelines, we identify seven core HAI dimensions and examine their coverage and sentiment within the reviews. We find that the sentiment on four HAI dimensions (adaptability, customization, error recovery, and security) is positively associated with overall user satisfaction. Moreover, we show that engagement with HAI dimensions varies by professional background: users with technical job roles are more likely to discuss system-focused aspects, such as reliability, while non-technical users emphasize interaction-focused features like customization and feedback. Interestingly, the relationship between HAI sentiment and overall satisfaction is not moderated by job role, suggesting that once an HAI dimension has been identified by users, its effect on satisfaction is consistent across job roles.

Authors:Karol Chlasta, Katarzyna Wisiecka, Krzysztof Krejtz, Izabela Krejtz
Title: AI-Based Screening for Depression and Social Anxiety Through Eye Tracking: An Exploratory Study
Abstract:
Well-being is a dynamic construct that evolves over time and fluctuates within individuals, presenting challenges for accurate quantification. Reduced well-being is often linked to depression or anxiety disorders, which are characterised by biases in visual attention towards specific stimuli, such as human faces. This paper introduces a novel approach to AI-assisted screening of affective disorders by analysing visual attention scan paths using convolutional neural networks (CNNs). Data were collected from two studies examining (1) attentional tendencies in individuals diagnosed with major depression and (2) social anxiety. These data were processed using residual CNNs through images generated from eye-gaze patterns. Experimental results, obtained with ResNet architectures, demonstrated an average accuracy of 48% for a three-class system and 62% for a two-class system. Based on these exploratory findings, we propose that this method could be employed in rapid, ecological, and effective mental health screening systems to assess well-being through eye-tracking.

Authors:Viktor Dorfler, Dylan Dryden, Viet Lee
Title: Intanify AI Platform: Embedded AI for Automated IP Audit and Due Diligence
Abstract:
In this paper we introduce a Platform created in order to support SMEs' endeavor to extract value from their intangible assets effectively. To implement the Platform, we developed five knowledge bases using a knowledge-based expert system shell that contain knowledge from intangible asset consultants, patent attorneys and due diligence lawyers. In order to operationalize the knowledge bases, we developed a "Rosetta Stone", an interpreter unit for the knowledge bases outside the shell and embedded in the platform. Building on the initial knowledge bases we have created a system of red flags, risk scoring, and valuation with the involvement of the same experts; these additional systems work upon the initial knowledge bases and therefore they can be regarded as meta-knowledge-representations that take the form of second-order knowledge graphs. All this clever technology is dressed up in an easy-to-handle graphical user interface that we will showcase at the conference. The initial platform was finished mid-2024; therefore, it qualifies as an "emerging application of AI" and "deployable AI", while development continues. The two firms that provided experts for developing the knowledge bases obtained a white-label version of the product (i.e. it runs under their own brand "powered by Intanify"), and there are two completed cases.

Authors:Anton Leontyev, Takashi Yamauchi
Title: Core Components of Emotional Impulsivity: A Mouse-Cursor Tracking Study
Abstract:
Impulsive individuals exhibit abnormal reward processing (heightened preference for immediate rewards, i.e., impulsive choice, IC) and a penchant for maladaptive action (the inability to inhibit inappropriate actions, i.e., impulsive action, IA). Both impulsive choice and impulsive action are strongly influenced by emotions (emotional impulsivity); yet how emotions impact impulsive behavior remains unclear. The traditional theory suggests that emotions primarily exacerbate impulsive action and prompt impulsive choice. The alternative theory states that emotions primarily disrupt attention (attentional impulsivity, AImp) and prompt impulsive choice. In two studies, we probed the interplay among emotions, impulsive action (IA), attentional impulsivity (AImp), and impulsive choice (IC). We elicited positive and negative emotions using emotional pictures and examined the extent to which elicited emotions altered behavioral indices of impulsivity.

Authors:John Naulty, Eason Chen, Joy Wang, George Digkas, Kostas Chalkias
Title: Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests
Abstract:
As software systems grow increasingly complex, ensuring security during development poses significant challenges. Traditional manual code audits are often expensive, time-intensive, and ill-suited for fast-paced workflows, while automated tools frequently suffer from high false-positive rates, limiting their reliability. To address these issues, we introduce Bugdar, an AI-augmented code review system that integrates seamlessly into GitHub pull requests, providing near real-time, context-aware vulnerability analysis. Bugdar leverages fine-tunable Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to deliver project-specific, actionable feedback that aligns with each codebase's unique requirements and developer practices. Supporting multiple programming languages, including Solidity, Move, Rust, and Python, Bugdar demonstrates exceptional efficiency, processing a pull request in an average of 56.4 seconds, or 30 lines of code per second. This is significantly faster than manual reviews, which could take hours per pull request. By facilitating a proactive approach to secure coding, Bugdar reduces the reliance on manual reviews, accelerates development cycles, and enhances the security posture of software systems without compromising productivity.

Authors:Matthew Kenely, Dylan Seychell, Carl James Debono, Chris Porter
Title: A Deep Learning Framework for Visual Attention Prediction and Analysis of News Interfaces
Abstract:
News outlets' competition for attention in news interfaces has highlighted the need for demographically-aware saliency prediction models. Despite recent advancements in saliency detection applied to user interfaces (UI), existing datasets are limited in size and demographic representation. We present a deep learning framework that enhances the SaRa (Saliency Ranking) model with DeepGaze IIE, improving Salient Object Ranking (SOR) performance by 10.7%. Our framework optimizes three key components: saliency map generation, grid segment scoring, and map normalization. Through a two-fold experiment using eye-tracking (30 participants) and mouse-tracking (375 participants aged 13--70), we analyze attention patterns across demographic groups. Statistical analysis reveals significant age-based variations (p < 0.05, ε² = 0.042), with older users (36--70) engaging more with textual content and younger users (13--35) interacting more with images. Mouse-tracking data closely approximates eye-tracking behavior (sAUC = 0.86) and identifies UI elements that immediately stand out, validating its use in large-scale studies. We conclude that saliency studies should prioritize gathering data from a larger, demographically representative sample and report exact demographic distributions.
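The grid segment scoring and map normalization steps mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names and the 3x3 grid size are assumptions.

```python
# Hypothetical sketch of grid-based saliency scoring and normalization,
# loosely in the spirit of a SaRa-style ranking pipeline. The grid size
# and scoring rule (mean saliency per segment) are illustrative choices.

def grid_scores(saliency_map, grid=3):
    """Split a 2D saliency map into grid x grid segments and
    score each segment by its mean saliency value."""
    h, w = len(saliency_map), len(saliency_map[0])
    scores = {}
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cells = [saliency_map[y][x]
                     for y in range(y0, y1) for x in range(x0, x1)]
            scores[(gy, gx)] = sum(cells) / len(cells)
    return scores

def normalize(scores):
    """Min-max normalize segment scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def rank_segments(scores):
    """Return segment coordinates ordered from most to least salient."""
    return sorted(scores, key=scores.get, reverse=True)
```

Ranking the normalized segment scores yields the kind of Salient Object Ranking output the framework is evaluated on.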

Authors:Sora Kang, Andreea-Elena Potinteu, Nadia Said
Title: ExplainitAI: When do we trust artificial intelligence? The influence of content and explainability in a cross-cultural comparison
Abstract:
This study investigates cross-cultural differences in the perception of AI-driven chatbots between Germany and South Korea, focusing on topic dependency and explainability. Using a custom AI chat interface, ExplainitAI, we systematically examined these factors with quota-based samples from both countries (N = 297). Our findings revealed significant cultural distinctions: Korean participants exhibited higher trust, more positive user experience ratings, and more favorable perception of AI compared to German participants. Additionally, topic dependency was a key factor, with participants reporting lower trust in AI when addressing societally debated topics (e.g., migration) versus health or entertainment topics. These perceptions were further influenced by interactions among cultural context, content domains, and explainability conditions. The result highlights the importance of integrating cultural and contextual nuances into the design of AI systems, offering actionable insights for the development of culturally adaptive and explainable AI tailored to diverse user needs and expectations across domains.

Authors:Andrew C Dwyer, Lizzie Coles-Kemp, Clara Crivellaro, Claude P R Heath
Title: Friend or Foe? Navigating and Re-configuring "Snipers' Alley"
Abstract:
In a 'digital by default' society, essential services must be accessed online. This opens users to digital deception not only from criminal fraudsters but from a range of actors in a marketised digital economy. Using grounded empirical research from northern England, we show how supposedly 'trusted' actors, such as governments, (re)produce the insecurities and harms that they seek to prevent. Enhanced by a weakening of social institutions amid a drive for efficiency and scale, this has built a constricted, unpredictable digital channel. We conceptualise this as a "snipers' alley". Four key snipers articulated by participants' lived experiences are examined: 1) Governments; 2) Business; 3) Criminal Fraudsters; and 4) Friends and Family to explore how snipers are differentially experienced and transfigure through this constricted digital channel. We discuss strategies to re-configure the alley, and how crafting and adopting opportunity models can enable more equitable forms of security for all.

Authors:Kate Letheren, Nicole Robinson
Title: Rude Humans and Vengeful Robots: Examining Human Perceptions of Robot Retaliatory Intentions in Professional Settings
Abstract:
Humans and robots are increasingly working in personal and professional settings. In workplace settings, humans and robots may work together as colleagues, potentially leading to social expectations, or violation thereof. Extant research has primarily sought to understand social interactions and expectations in personal rather than professional settings, and none of these studies have examined negative outcomes arising from violations of social expectations. This paper reports the results of a 2x3 online experiment that used a unique first-person perspective video to immerse participants in a collaborative workplace setting. The results are nuanced and reveal that while robots are expected to act in accordance with social expectations despite human behavior, there are benefits for robots perceived as being the bigger person in the face of human rudeness. Theoretical and practical implications are provided which discuss the import of these findings for the design of social robots.

Authors:Andre G. C. Pacheco, Athus Cavalini, Giovanni Comarela
Title: Echoes of Power: Investigating Geopolitical Bias in US and China Large Language Models
Abstract:
Large Language Models (LLMs) have emerged as powerful tools for generating human-like text, transforming human-machine interactions. However, their widespread adoption has raised concerns about their potential to influence public opinion and shape political narratives. In this work, we investigate the geopolitical biases in US and Chinese LLMs, focusing on how these models respond to questions related to geopolitics and international relations. We collected responses from ChatGPT and DeepSeek to a set of geopolitical questions and evaluated their outputs through both qualitative and quantitative analyses. Our findings show notable biases in both models, reflecting distinct ideological perspectives and cultural influences. However, despite these biases, for a set of questions, the models' responses are more aligned than expected, indicating that they can address sensitive topics without necessarily presenting directly opposing viewpoints. This study highlights the potential of LLMs to shape public discourse and underscores the importance of critically assessing AI-generated content, particularly in politically sensitive contexts.

Authors:Wanyi Chen, Mary Cummings
Title: To impute or not to impute: How machine learning modelers treat missing data
Abstract:
Missing data is prevalent in tabular machine learning (ML) models, and different missing data treatment methods can significantly affect ML model training results. However, little is known about how ML researchers and engineers choose missing data treatment methods and what factors affect their choices. To this end, we conducted a survey of 70 ML researchers and engineers. Our results revealed that most participants were not making informed decisions regarding missing data treatment, which could significantly affect the validity of the ML models trained by these researchers. We advocate for better education on missing data, more standardized missing data reporting, and better missing data analysis tools.

Authors:Xin Huang, Shiyao Zhu, Ziyu Wang, Yaping He, Hao Jin, Zhengkui Liu
Title: EVA-MED: An Enhanced Valence-Arousal Multimodal Emotion Dataset for Emotion Recognition
Abstract:
We introduce a novel multimodal emotion recognition dataset that enhances the precision of Valence-Arousal Model while accounting for individual differences. This dataset includes electroencephalography (EEG), electrocardiography (ECG), and pulse interval (PI) from 64 participants. Data collection employed two emotion induction paradigms: video stimuli that targeted different valence levels (positive, neutral, and negative) and the Mannheim Multicomponent Stress Test (MMST), which induced high arousal through cognitive, emotional, and social stressors. To enrich the dataset, participants' personality traits, anxiety, depression, and emotional states were assessed using validated questionnaires. By capturing a broad spectrum of affective responses while accounting for individual differences, this dataset provides a robust resource for precise emotion modeling. The integration of multimodal physiological data with psychological assessments lays a strong foundation for personalized emotion recognition. We anticipate this resource will support the development of more accurate, adaptive, and individualized emotion recognition systems across diverse applications.

Authors:Rafael Padilla Perez, Özgür Keleş
Title: Immersive Virtual Reality Environments for Embodied Learning of Engineering Students
Abstract:
Recent advancements in virtual reality (VR) technology have enabled the creation of immersive learning environments that provide engineering students with hands-on, interactive experiences. This paper presents a novel framework for virtual laboratory environments (VLEs) focused on embodied learning, specifically designed to teach concepts related to mechanical and materials engineering. Utilizing the principles of embodiment and congruency, these VR modules offer students the opportunity to engage physically with virtual specimens and machinery, thereby enhancing their understanding of complex topics through sensory immersion and kinesthetic interaction. Our framework employs an event-driven, directed-graph-based architecture developed with Unity 3D and C#, ensuring modularity and scalability. Students interact with the VR environment by performing tasks such as selecting and testing materials, which trigger various visual and haptic events to simulate real-world laboratory conditions. A pre-/post-test evaluation method was used to assess the educational effectiveness of these VR modules. Results demonstrated significant improvements in student comprehension and retention, with notable increases in test scores compared to traditional non-embodied VR methods. The implementation of these VLEs in a university setting highlighted their potential to democratize access to high-cost laboratory experiences, making engineering education more accessible and effective. By fostering a deeper connection between cognitive processes and physical actions, our VR framework not only enhances learning outcomes but also provides a template for future developments in VR-based education. Our study suggests that immersive VR environments can significantly improve the learning experience for engineering students.

Authors:Ning Li, Wenming Deng, Jiatan Chen
Title: From G-Factor to A-Factor: Establishing a Psychometric Framework for AI Literacy
Abstract:
This research addresses the growing need to measure and understand AI literacy in the context of generative AI technologies. Through three sequential studies involving a total of 517 participants, we establish AI literacy as a coherent, measurable construct with significant implications for education, workforce development, and social equity. Study 1 (N=85) revealed a dominant latent factor - termed the "A-factor" - that accounts for 44.16% of variance across diverse AI interaction tasks. Study 2 (N=286) refined the measurement tool by examining four key dimensions of AI literacy: communication effectiveness, creative idea generation, content evaluation, and step-by-step collaboration, resulting in an 18-item assessment battery. Study 3 (N=146) validated this instrument in a controlled laboratory setting, demonstrating its predictive validity for real-world task performance. Results indicate that AI literacy significantly predicts performance on complex, language-based creative tasks but shows domain specificity in its predictive power. Additionally, regression analyses identified several significant predictors of AI literacy, including cognitive abilities (IQ), educational background, prior AI experience, and training history. The multidimensional nature of AI literacy and its distinct factor structure provide evidence that effective human-AI collaboration requires a combination of general and specialized abilities. These findings contribute to theoretical frameworks of human-AI collaboration while offering practical guidance for developing targeted educational interventions to promote equitable access to the benefits of generative AI technologies.

Authors:Mehmet Akhoroz, Caglar Yildirim
Title: Conversational AI as a Coding Assistant: Understanding Programmers' Interactions with and Expectations from Large Language Models for Coding
Abstract:
Conversational AI interfaces powered by large language models (LLMs) are increasingly used as coding assistants. However, questions remain about how programmers interact with LLM-based conversational agents, the challenges they encounter, and the factors influencing adoption. This study investigates programmers' usage patterns, perceptions, and interaction strategies when engaging with LLM-driven coding assistants. Through a survey, participants reported both the benefits, such as efficiency and clarity of explanations, and the limitations, including inaccuracies, lack of contextual awareness, and concerns about over-reliance. Notably, some programmers actively avoid LLMs due to a preference for independent learning, distrust in AI-generated code, and ethical considerations. Based on our findings, we propose design guidelines for improving conversational coding assistants, emphasizing context retention, transparency, multimodal support, and adaptability to user preferences. These insights contribute to the broader understanding of how LLM-based conversational agents can be effectively integrated into software development workflows while addressing adoption barriers and enhancing usability.

Authors:Alva Markelius, Julie Bailey, Jenny L. Gibson, Hatice Gunes
Title: Stakeholder Perspectives on Whether and How Social Robots Can Support Mediation and Advocacy for Higher Education Students with Disabilities
Abstract:
This paper presents an iterative, participatory, empirical study that examines the potential of using artificial intelligence, such as social robots and large language models, to support mediation and advocacy for students with disabilities in higher education. Drawing on qualitative data from interviews and focus groups conducted with various stakeholders, including disabled students, disabled student representatives, and disability practitioners at the University of Cambridge, this study reports findings relating to understanding the problem space, ideating robotic support and participatory co-design of advocacy support robots. The findings highlight the potential of these technologies in providing signposting and acting as a sounding board or study companion, while also addressing limitations in empathic understanding, trust, equity, and accessibility. We discuss ethical considerations, including intersectional biases, the double empathy problem, and the implications of deploying social robots in contexts shaped by structural inequalities. Finally, we offer a set of recommendations and suggestions for future research, rethinking corrective technological interventions as tools that empower and amplify self-advocacy.

Authors:Kunal Chavan, Keertan Balaji, Spoorti Barigidad, Samba Raju Chiluveru
Title: VocalEyes: Enhancing Environmental Perception for the Visually Impaired through Vision-Language Models and Distance-Aware Object Detection
Abstract:
With an increasing demand for assistive technologies that promote the independence and mobility of visually impaired people, this study suggests an innovative real-time system that gives audio descriptions of a user's surroundings to improve situational awareness. The system acquires live video input and processes it with a quantized and fine-tuned Florence-2 large model, reduced to 4-bit precision for efficient operation on low-power edge devices such as the NVIDIA Jetson Orin Nano. By transforming the video signal into frames with a 5-frame latency, the model provides rapid and contextually pertinent descriptions of objects, pedestrians, and barriers, together with their estimated distances. The system employs Parler TTS Mini, a lightweight and adaptable Text-to-Speech (TTS) solution, for efficient audio feedback. It accommodates 34 distinct speaker types and enables customization of speech tone, pace, and style to suit user requirements. This study examines the quantization and fine-tuning techniques utilized to modify the Florence-2 model for this application, illustrating how the integration of a compact model architecture with a versatile TTS component improves real-time performance and user experience. The proposed system is assessed based on its accuracy, efficiency, and usefulness, providing a viable option to aid vision-impaired users in navigating their surroundings securely and successfully.

Authors:M. A. F. Aamina, V. Kavishcan, W. M. P. B. B. Jayaratne, K. K. D. S. N. Kannangara, A. A. Aamil, Achini Adikari
Title: Accodemy: AI Powered Code Learning Platform to Assist Novice Programmers in Overcoming the Fear of Coding
Abstract:
Computer programming represents a rapidly evolving and sought-after career path in the 21st century. Nevertheless, novice learners may find the process intimidating for several reasons, such as limited and highly competitive career opportunities, peer and parental pressure for academic success, and course difficulties. These factors frequently contribute to anxiety and eventual dropout as a result of fear. Furthermore, research has demonstrated that beginners are significantly deterred by the fear of failure, which results in programming anxiety and a sense of being overwhelmed by intricate topics, ultimately leading to dropping out. This project undertakes an exploration beyond the scope of conventional code learning platforms by identifying and utilising effective and personalised learning strategies. The proposed solution incorporates features such as AI-generated challenging questions, mindfulness quotes, and tips to motivate users, along with an AI chatbot that functions as a motivational aid. In addition, the suggested solution integrates personalized roadmaps and gamification elements to maintain user involvement. The project aims to systematically monitor the progress of novice programmers and enhance their knowledge of coding with a personalised, revised curriculum to help mitigate the fear of coding and boost confidence.

Authors:Matthew Nyaaba, Min SungEun, Mary Abiswin Apam, Kwame Owoahene Acheampong, Emmanuel Dwamena
Title: Optimizing Generative AI's Accuracy and Transparency in Inductive Thematic Analysis: A Human-AI Comparison
Abstract:
This study highlights the transparency and accuracy of GenAI's inductive thematic analysis, particularly using GPT-4 Turbo API integrated within a stepwise prompt-based Python script. This approach ensured a traceable and systematic coding process, generating codes with supporting statements and page references, which enhanced validation and reproducibility. The results indicate that GenAI performs inductive coding in a manner closely resembling human coders, effectively categorizing themes at a level comparable to the average human coder. However, in interpretation, GenAI extends beyond human coders by situating themes within a broader conceptual context, providing a more generalized and abstract perspective.
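The stepwise, traceable coding the abstract describes hinges on prompting the model to return each code with its supporting statement and page reference. A hedged sketch of such a prompt builder follows; the prompt wording, the JSON output schema, and the function name are illustrative assumptions, not the authors' script.

```python
# Hypothetical sketch of a stepwise inductive-coding prompt that asks the
# model to return codes with supporting statements and page references.
# The instructions and schema below are illustrative, not the paper's.

def build_coding_prompt(pages):
    """pages: list of (page_number, text) tuples from the source document."""
    body = "\n\n".join(f"[page {n}]\n{text}" for n, text in pages)
    return (
        "You are performing inductive thematic coding.\n"
        "Step 1: read the excerpts below.\n"
        "Step 2: propose codes grounded only in the excerpts.\n"
        "Step 3: for each code, return a JSON object with keys "
        "'code', 'statement' (verbatim supporting quote), and 'page'.\n\n"
        + body
    )
```

Keeping the page markers in the prompt is what makes each generated code traceable back to its source passage, supporting the validation and reproducibility the study emphasizes.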

Authors:Mahmoud Hamash, Md Raqib Khan, Peter Tiernan
Title: Inclusive STEAM Education: A Framework for Teaching Coding and Robotics to Students with Visual Impairment Using Advanced Computer Vision
Abstract:
STEAM education integrates Science, Technology, Engineering, Arts, and Mathematics to foster creativity and problem-solving. However, students with visual impairments (VI) encounter significant challenges in programming and robotics, particularly in tracking robot movements and developing spatial awareness. This paper presents a framework that leverages pre-constructed robots and algorithms, such as maze-solving techniques, within an accessible learning environment. The proposed system employs Contrastive Language-Image Pre-training (CLIP) to process global camera-captured maze layouts, converting visual data into textual descriptions that generate spatial audio prompts in an Audio Virtual Reality (AVR) system. Students issue verbal commands, which are refined through CLIP, while robot-mounted stereo cameras provide real-time data processed via Simultaneous Localization and Mapping (SLAM) for continuous feedback. By integrating these technologies, the framework empowers VI students to develop coding skills and engage in complex problem-solving tasks. Beyond maze-solving applications, this approach demonstrates the broader potential of computer vision in special education, contributing to improved accessibility and learning experiences in STEAM disciplines.

Authors:Chaoyi Zhao, Wei Xu
Title: Human-AI Interaction Design Standards
Abstract:
The rapid development of artificial intelligence (AI) has significantly transformed human-computer interactions, making it essential to establish robust design standards to ensure effective, ethical, and human-centered AI (HCAI) solutions. Standards serve as the foundation for the adoption of new technologies, and human-AI interaction (HAII) standards are critical to supporting the industrialization of AI technology by following an HCAI approach. These design standards aim to provide clear principles, requirements, and guidelines for designing, developing, deploying, and using AI systems, enhancing the user experience and performance of AI systems. Despite their importance, the creation and adoption of HCAI-based interaction design standards face challenges, including the absence of universal frameworks, the inherent complexity of HAII, and the ethical dilemmas that arise in such systems. This chapter provides a comparative analysis of HAII versus traditional human-computer interaction (HCI) and outlines guiding principles for HCAI-based design. It explores international, regional, national, and industry standards related to HAII design from an HCAI perspective and reviews design guidelines released by leading companies such as Microsoft, Google, and Apple. Additionally, the chapter highlights tools available for implementing HAII standards and presents case studies of human-centered interaction design for AI systems in diverse fields, including healthcare, autonomous vehicles, and customer service. It further examines key challenges in developing HAII standards and suggests future directions for the field. Emphasizing the importance of ongoing collaboration between AI designers, developers, and experts in human factors and HCI, this chapter stresses the need to advance HCAI-based interaction design standards to ensure human-centered AI solutions across various domains.

Authors:Anargh Viswanath, Lokesh Veeramacheneni, Hendrik Buschmeier
Title: Enhancing Explainability with Multimodal Context Representations for Smarter Robots
Abstract:
Artificial Intelligence (AI) has significantly advanced in recent years, driving innovation across various fields, especially in robotics. Even though robots can perform complex tasks with increasing autonomy, challenges remain in ensuring explainability and user-centered design for effective interaction. A key issue in Human-Robot Interaction (HRI) is enabling robots to effectively perceive and reason over multimodal inputs, such as audio and vision, to foster trust and seamless collaboration. In this paper, we propose a generalized and explainable multimodal framework for context representation, designed to improve the fusion of speech and vision modalities. We introduce a use case on assessing 'Relevance' between verbal utterances from the user and visual scene perception of the robot. We present our methodology with a Multimodal Joint Representation module and a Temporal Alignment module, which can allow robots to evaluate relevance by temporally aligning multimodal inputs. Finally, we discuss how the proposed framework for context representation can help with various aspects of explainability in HRI.

Authors:Shinnosuke Sawano, Satoshi Kodera
Title: Human-Centered AI in Multidisciplinary Medical Discussions: Evaluating the Feasibility of a Chat-Based Approach to Case Assessment
Abstract:
In this study, we investigate the feasibility of using a human-centered artificial intelligence (AI) chat platform where medical specialists collaboratively assess complex cases. As the target population for this platform, we focus on patients with cardiovascular diseases who are in a state of multimorbidity, that is, suffering from multiple chronic conditions. We evaluate simulated cases with multiple diseases using a chat application by collaborating with physicians to assess feasibility, efficiency gains through AI utilization, and the quantification of discussion content. We constructed simulated cases based on past case reports, medical errors reports and complex cases of cardiovascular diseases experienced by the physicians. The analysis of discussions across five simulated cases demonstrated a significant reduction in the time required for summarization using AI, with an average reduction of 79.98%. Additionally, we examined hallucination rates in AI-generated summaries used in multidisciplinary medical discussions. The overall hallucination rate ranged from 1.01% to 5.73%, with an average of 3.62%, whereas the harmful hallucination rate varied from 0.00% to 2.09%, with an average of 0.49%. Furthermore, morphological analysis demonstrated that multidisciplinary assessments enabled a more complex and detailed representation of medical knowledge compared with single physician assessments. We examined structural differences between multidisciplinary and single physician assessments using centrality metrics derived from the knowledge graph. In this study, we demonstrated that AI-assisted summarization significantly reduced the time required for medical discussions while maintaining structured knowledge representation. These findings can support the feasibility of AI-assisted chat-based discussions as a human-centered approach to multidisciplinary medical decision-making.

Authors:Minsu Chang, Doyoung Jeon
Title: The Realization of Virtual Environments in the Lower Limb Exoskeletal Robot
Abstract:
This study proposes the realization of various virtual environments using a lower limb exoskeletal robot for futuristic gait rehabilitation. The proposed method allows the user to feel virtual gravity, buoyancy, and drag while actively walking. The virtual environments include four fluidic conditions: Water, Olive oil, Honey, and Peanut Butter, and four gravitational conditions consisting of the Earth's, Moon's, Mars', and Jupiter's gravity. The control method of the lower limb exoskeletal robot is as follows. First, torque feedback is applied to control the interaction force between the exoskeletal robot and its user. Second, the reference torque is computed in real time with the dynamic equations of the human body and the kinematic data. The eight environments were implemented via the EXOWheel, a wheelchair-integrated lower limb exoskeletal robot. While attaching electromyography sensors and wearing the EXOWheel, eight healthy subjects walked actively under the virtual conditions. Experimental results show that muscular force signals adequately change depending on gravitational, buoyant, and drag effects. Blind tests confirmed that subjects could reliably distinguish all eight virtual environments.
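The gravitational, buoyant, and drag effects described above can be illustrated with a simple force model for a limb segment. This is a hedged sketch under assumed constants and a linear drag law, not the EXOWheel's actual torque computation, which uses full dynamic equations of the human body.

```python
# Illustrative model of virtual gravity, buoyancy, and drag on a limb
# segment. All constants are approximate and the linear drag form is an
# assumption for illustration only.

GRAVITY = {"Earth": 9.81, "Moon": 1.62, "Mars": 3.71, "Jupiter": 24.79}  # m/s^2
FLUID_DENSITY = {"Water": 1000.0, "Olive oil": 911.0,
                 "Honey": 1420.0, "Peanut Butter": 1100.0}  # kg/m^3, approx.

def virtual_force(mass, volume, velocity, gravity_env="Earth",
                  fluid=None, drag_coeff=50.0):
    """Net vertical virtual force on a segment (N, downward positive).

    weight   = m * g        (environment-dependent gravity)
    buoyancy = rho * V * g  (upward, only when a fluid is simulated)
    drag     = -b * v       (opposes motion; linear model assumed)
    """
    g = GRAVITY[gravity_env]
    weight = mass * g
    buoyancy = FLUID_DENSITY[fluid] * volume * g if fluid else 0.0
    drag = -drag_coeff * velocity
    return weight - buoyancy + drag
```

Note that a 3 kg segment of 0.003 m³ (density 1000 kg/m³) is neutrally buoyant in the simulated water condition, which is consistent with the reduced muscular effort one would expect the EMG signals to show underwater.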

Authors:Shelby Ziccardi, Zach Chavis, Rachel L. Hawe, Stephen J. Guy
Title: Reaching Motion Characterization Across Childhood via Augmented Reality Games
Abstract:
While performance in coordinated motor tasks has been shown to improve in children as they age, the characterization of children's movement strategies has been underexplored. In this work, we use upper-body motion data collected from an augmented reality reaching game, and show that short (13 second) sections of motion are sufficient to reveal arm motion differences across child development. To explore what drives this trend, we characterize the movement patterns across different age groups by analyzing (1) directness of path, (2) maximum speed, and (3) progress towards the reaching target. We find that although maximum arm velocity decreases with age (p = 0.02), their paths to goal are more direct (p = 0.03), allowing for faster time to goal overall. We also find that older children exhibit more anticipatory reaching behavior, enabling more accurate goal-reaching (i.e. no overshooting) compared to younger children. The resulting analysis has potential to improve the realism of child-like digital characters and advance our understanding of motor skill development.
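The three movement metrics named in this abstract have standard definitions that can be sketched directly. The functions below are an illustrative implementation assuming 2D positions sampled at a fixed timestep, not the authors' code.

```python
import math

# Minimal sketch of the three reaching metrics: path directness,
# maximum speed, and progress toward the target. The input format
# (2D points at a fixed timestep dt) is an assumption.

def path_length(points):
    """Total distance traveled along consecutive samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def directness(points):
    """Straight-line distance over actual path length (1.0 = perfectly direct)."""
    straight = math.dist(points[0], points[-1])
    total = path_length(points)
    return straight / total if total else 1.0

def max_speed(points, dt):
    """Peak speed across consecutive samples taken dt seconds apart."""
    return max(math.dist(a, b) / dt for a, b in zip(points, points[1:]))

def progress(points, target):
    """Fraction of the initial distance to the target that has been closed."""
    start = math.dist(points[0], target)
    end = math.dist(points[-1], target)
    return (start - end) / start if start else 1.0
```

Under these definitions, the paper's finding reads as older children scoring higher on directness and progress per unit time despite a lower max_speed.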

Authors:Alicia Martin-Navarro, Maria Paula Lechuga-Sancho, Marek Szelagowski, Jose Aurelio Medina-Garrido
Title: Is User Perception the Key to Unlocking the Full Potential of Business Process Management Systems (BPMS)? Enhancing BPMS Efficacy Through User Perception
Abstract:
This study investigates factors influencing employees' perceptions of the usefulness of Business Process Management Systems (BPMS) in commercial settings. It explores the roles of system dependency, system quality, and the quality of information and knowledge in the adoption and use of BPMS. Data were collected using a structured questionnaire from end-users in various firms and analyzed with Partial Least Squares (PLS). The survey evaluated perceptions of service quality, input quality, system attributes, and overall system quality. The findings indicate that service quality, input quality, and specific system attributes significantly influence perceived system quality, while system dependency and information quality are predictors of perceived usefulness. The results highlight the importance of user training, support, and high-quality information in enhancing user satisfaction and BPMS efficacy. This research offers empirical evidence on the factors impacting user perceptions and acceptance, emphasizing the need for user-centric approaches in BPMS.

Authors:Ary-Yue Huang, Varvara Guljajeva
Title: Situational Agency: The Framework for Designing Behavior in Agent-based art
Abstract:
In the context of artificial life art and agent-based art, this paper draws on Simon Penny's Aesthetic of Behavior theory and Sofian Audry's discussions on behavior computation to examine how artists design agent behaviors and the ensuing aesthetic experiences. We advocate for integrating the environment in which agents operate as the context for behavioral design, positing that the environment emerges through continuous interactions among agents, audiences, and other entities, forming an evolving network of meanings generated by these interactions. Artists create contexts by deploying and guiding these computational systems, audience participation, and agent behaviors through artist strategies. This framework is developed by analysing two categories of agent-based artworks, exploring the intersection of computational systems, audience participation, and artistic strategies in creating aesthetic experiences. This paper seeks to provide a contextual foundation and framework for designing agents' behaviors through a comparative study of the artists' behavioural design strategies.

Authors:Pinyao Liu, Keon Ju Lee, Alexander Steinmaurer, Claudia Picard-Deland, Michelle Carr, Alexandra Kitson
Title: DreamLLM-3D: Affective Dream Reliving using Large Language Model and 3D Generative AI
Abstract:
We present DreamLLM-3D, a composite multimodal AI system behind an immersive art installation for dream re-experiencing. It enables automated dream content analysis for immersive dream-reliving, by integrating a Large Language Model (LLM) with text-to-3D Generative AI. The LLM processes voiced dream reports to identify key dream entities (characters and objects), social interaction, and dream sentiment. The extracted entities are visualized as dynamic 3D point clouds, with emotional data influencing the color and soundscapes of the virtual dream environment. Additionally, we propose an experiential AI-Dreamworker Hybrid paradigm. Our system and paradigm could potentially facilitate a more emotionally engaging dream-reliving experience, enhancing personal insights and creativity.

Authors:Venkat Ram Reddy Ganuthula, Krishna Kumar Balaraman
Title: Artificial Intelligence Quotient (AIQ): A Novel Framework for Measuring Human-AI Collaborative Intelligence
Abstract:
As artificial intelligence becomes increasingly integrated into professional and personal domains, traditional metrics of human intelligence require reconceptualization. This paper introduces the Artificial Intelligence Quotient (AIQ), a novel measurement framework designed to assess an individual's capacity to effectively collaborate with and leverage AI systems, particularly Large Language Models (LLMs). Building upon established cognitive assessment methodologies and contemporary AI interaction research, we present a comprehensive framework for quantifying human-AI collaborative intelligence. This work addresses the growing need for standardized evaluation of AI-augmented cognitive capabilities in educational and professional contexts.

Authors:Yutaka Matsubara, Akihisa Morikawa, Daichi Mizuguchi, Kiyoshi Fujiwara
Title: Enhancing Human-Robot Collaboration through Existing Guidelines: A Case Study Approach
Abstract:
As AI systems become more prevalent, concerns about their development, operation, and societal impact intensify. Establishing ethical, social, and safety standards amidst evolving AI capabilities poses significant challenges. Global initiatives are underway to establish guidelines for AI system development and operation. With the increasing use of collaborative human-AI task execution, it's vital to continuously adapt AI systems to meet user and environmental needs. Failure to synchronize AI evolution with changes in users and the environment could result in ethical and safety issues. This paper evaluates the applicability of existing guidelines in human-robot collaborative systems, assesses their effectiveness, and discusses limitations. Through a case study, we examine whether our target system meets requirements outlined in existing guidelines and propose improvements to enhance human-robot interactions. Our contributions provide insights into interpreting and applying guidelines, offer concrete examples of system enhancement, and highlight their applicability and limitations. We believe these contributions will stimulate discussions and influence system assurance and certification in future AI-infused critical systems.

Authors:Andrew Cho, Jason M. Woo, Brian Shi, Aishwaryaa Udeshi, Jonathan S. H. Woo
Title: The Application of MATEC (Multi-AI Agent Team Care) Framework in Sepsis Care
Abstract:
Under-resourced or rural hospitals have limited access to medical specialists and healthcare professionals, which can negatively impact patient outcomes in sepsis. To address this gap, we developed the MATEC (Multi-AI Agent Team Care) framework, which integrates a team of specialized AI agents for sepsis care. The sepsis AI agent team includes five doctor agents, four health professional agents, and a risk prediction model agent, with an additional 33 doctor agents available for consultations. Ten attending physicians at a teaching hospital evaluated this framework, spending approximately 40 minutes on the web-based MATEC application and completing a 5-point Likert-scale survey (1 = unfavorable, 5 = favorable). The physicians found the MATEC framework very useful (Median=4, P=0.01), and very accurate (Median=4, P<0.01). This pilot study demonstrates that a Multi-AI Agent Team Care framework (MATEC) can potentially be useful in assisting medical professionals, particularly in under-resourced hospital settings.
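As an illustration of the kind of survey statistic the abstract reports, the sketch below runs a sign test of Likert ratings against the neutral midpoint. Both the ratings and the choice of test are assumptions for illustration, not the study's raw data or its stated analysis.

```python
from statistics import median
from math import comb

def sign_test_vs_midpoint(ratings, midpoint=3):
    """Two-sided sign test of Likert ratings against the neutral midpoint.

    Ties (ratings equal to the midpoint) are dropped, as in the standard
    sign test. Returns (median rating, p-value).
    """
    above = sum(r > midpoint for r in ratings)
    below = sum(r < midpoint for r in ratings)
    n = above + below
    k = min(above, below)
    # Two-sided binomial probability of a split at least this extreme.
    p = sum(comb(n, i) for i in range(k + 1)) / 2 ** n * 2
    return median(ratings), min(p, 1.0)

# Hypothetical ratings from ten physicians (invented for illustration).
usefulness = [4, 4, 5, 4, 3, 4, 5, 4, 4, 3]
med, p = sign_test_vs_midpoint(usefulness)
```

With these invented ratings the median is 4 and the split of above- vs below-midpoint answers is lopsided enough to reach significance, matching the shape (though not the provenance) of the reported result.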

Authors:Jeremy C. -H. Wang, Ming Hou, David Dunwoody, Marko Ilievski, Justin Tomasi, Edward Chao, Carl Pigeon
Title: Flight Testing an Optionally Piloted Aircraft: a Case Study on Trust Dynamics in Human-Autonomy Teaming
Abstract:
This paper examines how trust is formed, maintained, or diminished over time in the context of human-autonomy teaming with an optionally piloted aircraft. Whereas traditional factor-based trust models offer a static representation of human confidence in technology, here we discuss how variations in the underlying factors lead to variations in trust, trust thresholds, and human behaviours. Over 200 hours of flight test data collected over a multi-year test campaign from 2021 to 2023 were reviewed. The dispositional-situational-learned, process-performance-purpose, and IMPACTS homeostasis trust models are applied to illuminate trust trends during nominal autonomous flight operations. The results offer promising directions for future studies on trust dynamics and design-for-trust in human-autonomy teaming.

Authors:Yinon Goldshtein, Gal Perelman, Assaf Schuster, Avi Ostfeld
Title: Large Language Models for Water Distribution Systems Modeling and Decision-Making
Abstract:
The design, operations, and management of water distribution systems (WDS) involve complex mathematical models. These models are continually improving due to computational advancements, leading to better decision-making and more efficient WDS management. However, the significant time and effort required for modeling, programming, and analyzing results remain substantial challenges. Another issue is the professional burden, which confines the interaction with models, databases, and other sophisticated tools to a small group of experts, thereby causing non-technical stakeholders to depend on these experts or make decisions without modeling support. Furthermore, explaining model results is challenging even for experts, as it is often unclear which conditions cause the model to reach a certain state or recommend a specific policy. The recent advancements in Large Language Models (LLMs) open doors for a new stage in human-model interaction. This study proposes a framework of plain language interactions with hydraulic and water quality models based on LLM-EPANET architecture. This framework is tested with increasing levels of complexity of queries to study the ability of LLMs to interact with WDS models, run complex simulations, and report simulation results. The performance of the proposed framework is evaluated across several categories of queries and hyper-parameter configurations, demonstrating its potential to enhance decision-making processes in WDS management.
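The plain-language interaction loop the abstract describes can be sketched as a query classifier feeding a simulation dispatcher. The keyword rules below are a stand-in for the LLM, and `run_simulation` is a hypothetical stub, not the paper's LLM-EPANET interface.

```python
def classify_query(text: str) -> str:
    """Map a plain-language query to a simulation task category.

    Keyword matching is a placeholder for the LLM-based routing
    described in the paper.
    """
    t = text.lower()
    if "pressure" in t or "flow" in t:
        return "hydraulic"
    if "chlorine" in t or "quality" in t:
        return "water_quality"
    return "general"

def run_simulation(category: str, query: str) -> str:
    # Stand-in for invoking EPANET and summarizing results in plain
    # language; the real system would run the hydraulic solver here.
    return f"[{category}] simulation queued for: {query}"

question = "Is the chlorine residual adequate at the tank?"
answer = run_simulation(classify_query(question), question)
```

The point of the design is that non-technical stakeholders phrase questions in plain language while the dispatcher decides which model run can answer them.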

Authors:Adam Herout, Vojtěch Bartl, Martin Gaens, Oskar Tvrďoch
Title: The Malleable Glyph (Challenge)
Abstract:
Malleable Glyph is a new visualization problem and a public challenge. It originated from UX research (namely from research on card sorting UX), but its applications can be diverse (UI, gaming, information presentation, maps, and others). Its essence is: carrying as much information in a defined planar and static area as possible. The information should allow human observers to evaluate a pair of glyphs into three possible sortings: the first is "greater", or the second is "greater", or both are equal. The glyphs should adhere to the Illiteracy Rule, in other words, the observer should ask themselves the question "how much?" rather than "how many?". This article motivates the technique, explains its details, and presents the public challenge, including the evaluation protocol. The article aims to call for ideas from other visualization and graphics researchers and practitioners and to invite everyone to participate in the challenge and, by doing so, move scientific knowledge forward.
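The pairwise judging task can be scored as a simple agreement rate between the true ordering of each glyph pair and the observer's answer. The three-way encoding and the scoring rule below are an illustrative guess, not the challenge's official evaluation protocol.

```python
def pairwise_accuracy(judgements):
    """Score observer answers on glyph pairs.

    judgements: list of (true_cmp, answered_cmp) pairs, where each
    comparison is -1 (second is greater), 0 (equal), or 1 (first is
    greater). Returns the fraction of exact agreements.
    """
    correct = sum(truth == answer for truth, answer in judgements)
    return correct / len(judgements)

# Invented example: four pairs, one misjudged as "first is greater"
# when the glyphs were actually equal.
acc = pairwise_accuracy([(1, 1), (-1, -1), (0, 1), (1, 1)])
```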

Authors:Besjon Cifliku, Hendrik Heuer
Title: "This could save us months of work" -- Use Cases of AI and Automation Support in Investigative Journalism
Abstract:
As the capabilities of Large Language Models (LLMs) expand, more researchers are studying their adoption in newsrooms. However, much of the research focus remains broad and does not address the specific technical needs of investigative journalists. Therefore, this paper presents several applied use cases where automation and AI intersect with investigative journalism. We conducted a within-subjects user study with eight investigative journalists. In interviews, we elicited practical use cases using a speculative design approach by having journalists react to a prototype of a system that combines LLMs and Programming-by-Demonstration (PbD) to simplify data collection on numerous websites. Based on user reports, we classified the journalistic processes into data collecting and reporting. Participants indicated they utilize automation to handle repetitive tasks like content monitoring, web scraping, summarization, and preliminary data exploration. Following these insights, we provide guidelines on how investigative journalism can benefit from AI and automation.

Authors:Anna Ricarda Luther, Hendrik Heuer, Stephanie Geise, Sebastian Haunss, Andreas Breiter
Title: Social Media for Activists: Reimagining Safety, Content Presentation, and Workflows
Abstract:
Social media is central to activists, who use it internally for coordination and externally to reach supporters and the public. To date, the HCI community has not explored activists' perspectives on future social media platforms. In interviews with 14 activists from an environmental and a queer-feminist movement in Germany, we identify activists' needs and feature requests for future social media platforms. The key finding is that on- and offline safety is their main need. Based on this, we make concrete proposals to improve safety measures. Increased control over content presentation and tools to streamline activist workflows are also central to activists. We make concrete design and research recommendations on how social media platforms and the HCI community can contribute to improved safety and content presentation, and how activists themselves can reduce their workload.

Authors:Meisam J. Seikavandi, Maria J. Barrett, Paolo Burelli
Title: Modeling Face Emotion Perception from Naturalistic Face Viewing: Insights from Fixational Events and Gaze Strategies
Abstract:
Face Emotion Recognition (FER) is essential for social interactions and understanding others' mental states. Utilizing eye tracking to investigate FER has yielded insights into cognitive processes. In this study, we utilized an instructionless paradigm to collect eye movement data from 21 participants, examining two FER processes: free viewing and grounded FER. We analyzed fixational, pupillary, and microsaccadic events from eye movements, establishing their correlation with emotion perception and performance in the grounded task. By identifying regions of interest on the face, we explored the impact of eye-gaze strategies on face processing, their connection to emotions, and performance in emotion perception. During free viewing, participants displayed specific attention patterns for various emotions. In grounded tasks, where emotions were interpreted based on words, we assessed performance and contextual understanding. Notably, gaze patterns during free viewing predicted success in grounded FER tasks, underscoring the significance of initial gaze behavior. We also employed features from pre-trained deep-learning models for face recognition to enhance the scalability and comparability of attention analysis during free viewing across different datasets and populations. This method facilitated the prediction and modeling of individual emotion perception performance from minimal observations. Our findings advance the understanding of the link between eye movements and emotion perception, with implications for psychology, human-computer interaction, and affective computing, and pave the way for developing precise emotion recognition systems.
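One way to turn a fixation sequence into the kind of per-region gaze features the study correlates with emotion-perception performance is a dwell-time ratio per region of interest. The region names, fixation data, and feature definition below are illustrative assumptions, not the study's pipeline.

```python
def dwell_time_features(fixations, regions=("eyes", "nose", "mouth")):
    """Compute dwell-time ratios per face region.

    fixations: list of (region, duration_ms) pairs; fixations outside
    the listed regions are ignored. Returns ratios summing to 1 when
    any listed region was fixated.
    """
    totals = {r: 0.0 for r in regions}
    for region, duration_ms in fixations:
        if region in totals:
            totals[region] += duration_ms
    grand = sum(totals.values()) or 1.0
    return {r: totals[r] / grand for r in regions}

# Invented fixation sequence for one trial.
features = dwell_time_features(
    [("eyes", 300), ("mouth", 150), ("eyes", 250), ("nose", 100)]
)
```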

Authors:Alessio Arleo, Rita Borgo, Jörn Kohlhammer, Roy Ruddle, Holger Scharlach, Xiaoru Yuan
Title: Reflections on the Use of Dashboards in the Covid-19 Pandemic
Abstract:
Dashboards have arguably been the most used visualizations during the COVID-19 pandemic. They were used to communicate its evolution to national governments for disaster mitigation, to the public domain to inform about its status, and to epidemiologists to comprehend and predict the evolution of the disease. Each design had to be tailored for different tasks and to varying audiences - in many cases set up in a very short time due to the urgent need. In this paper, we collect notable examples of dashboards and reflect on their use and design during the pandemic from a user-oriented perspective: we interview a group of researchers with varying visualization expertise who actively used dashboards during the pandemic as part of their daily workflow. We discuss our findings and compile a list of lessons learned to support future visualization researchers and dashboard designers.

Authors:Solomon Amenyo, Maura R. Grossman, Daniel G. Brown, Brendan Wylie-Toal
Title: Assessment of AI-Generated Pediatric Rehabilitation SOAP-Note Quality
Abstract:
This study explores the integration of artificial intelligence (AI) or large language models (LLMs) into pediatric rehabilitation clinical documentation, focusing on the generation of SOAP (Subjective, Objective, Assessment, Plan) notes, which are essential for patient care. Creating complex documentation is time-consuming in pediatric settings. We evaluate the effectiveness of two AI tools: Copilot, a commercial LLM, and KAUWbot, a fine-tuned LLM developed for KidsAbility Centre for Child Development (an Ontario pediatric rehabilitation facility), in simplifying and automating this process. We focus on two key questions: (i) How does the quality of AI-generated SOAP notes based on short clinician summaries compare to human-authored notes, and (ii) To what extent is human editing necessary for improving AI-generated SOAP notes? We found no evidence of prior work assessing the quality of AI-generated clinical notes in pediatric rehabilitation. We used a sample of 432 SOAP notes, evenly divided among human-authored, Copilot-generated, and KAUWbot-generated notes. We employed a blind evaluation by experienced clinicians based on a custom rubric. Statistical analysis was conducted to assess the quality of the notes and the impact of human editing. The results suggest that AI tools such as KAUWbot and Copilot can generate SOAP notes with quality comparable to those authored by humans. We highlight the potential for combining AI with human expertise to enhance clinical documentation and offer insights for the future integration of AI into pediatric rehabilitation practice and other settings for the management of clinical conditions.

Authors:Everson Borges da Rosa, Michel Albonico, Paulo Juunior Varela
Title: InteractiveEdu: An Open-source Interactive Floor for Exergame as a Learning Platform
Abstract:
Children tend to be constantly exposed to technologies, such as smartphones, tablets, and gaming consoles, drawn by the interactive and visually stimulating nature of digital platforms. Thus, integrating the teaching process with technological gadgets may enhance engagement and foster interactive learning experiences, besides equipping students with the digital skills for today's increasingly technology-driven world. The main goal of this work is to provide an open-source and manageable tool that teachers can use as an everyday activity and as an exergame. For this, we present a prototype of an interactive platform that students use to answer a quiz by moving to segments available on an interactive floor. All the platform design and implementation directions are publicly available.

Authors:Zahra Nevisi, Maryam Tahmasbi
Title: Designing an intelligent computer game for predicting dysgraphia
Abstract:
Dysgraphia is a key cognitive disorder impacting writing skills. Current tests often identify dysgraphia after writing issues emerge. This paper presents a set of computer games and uses machine learning to analyze the results, predicting whether a child is at risk. The games focus on cognitive differences, such as visual attention, between dysgraphic and typical children. The machine learning model forecasts dysgraphia by observing how kids interact with these games. We also create an algorithm to detect unsuitable testing conditions, acting as a preprocessing step to avoid mislabeling such cases as dysgraphia. We developed a machine learning model capable of predicting dysgraphia with 93.24% accuracy in a test group of 74 participants.
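The pre-screening idea, filtering out game sessions recorded under unsuitable conditions before they reach the classifier, can be sketched as a rule-based validity check. The thresholds and session fields below are illustrative assumptions, not the paper's actual criteria.

```python
def session_is_valid(session: dict,
                     min_duration_s: float = 60.0,
                     max_missed_ratio: float = 0.5) -> bool:
    """Reject sessions that are too short or show too many missed
    interactions, both signs of an unsuitable testing condition rather
    than of dysgraphia itself."""
    if session["duration_s"] < min_duration_s:
        return False
    missed_ratio = session["missed_taps"] / max(session["total_taps"], 1)
    return missed_ratio <= max_missed_ratio

# Invented sessions: one usable, one aborted early, one inattentive.
sessions = [
    {"duration_s": 180.0, "missed_taps": 3,  "total_taps": 40},
    {"duration_s": 12.0,  "missed_taps": 1,  "total_taps": 5},
    {"duration_s": 200.0, "missed_taps": 30, "total_taps": 35},
]
usable = [s for s in sessions if session_is_valid(s)]
```

Only sessions that pass this gate would be handed to the risk-prediction model, which is what keeps bad recording conditions from being mislabeled as the disorder.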

Authors:Amandine M. Caut, Amy Rouillard, Beimnet Zenebe, Matthias Green, Ágúst Pálmason Morthens, David J. T. Sumpter
Title: Representing data in words
Abstract:
An important part of data science is the use of visualisations to display data in a way that is easy to digest. Visualisations often rely on underlying statistical or machine learning models -- ranging from basic calculations like category means to advanced methods such as principal component analysis of multidimensional datasets -- to convey insights. We introduce an analogous concept for word descriptions of data, which we call wordalisations. Wordalisations describe data in easy to digest words, without necessarily reporting numerical values from the data. We show how to create wordalisations using large language models, through prompt templates engineered according to a task-agnostic structure which can be used to automatically generate prompts from data. We show how to produce reliable and engaging texts on three application areas: scouting football players, personality tests, and international survey data. Using the model cards framework, we emphasise the importance of clearly stating the model we are imposing on the data when creating the wordalisation, detailing how numerical values are translated into words, incorporating background information into prompts for the large language model, and documenting the limitations of the wordalisations. We argue that our model cards approach is a more appropriate framework for setting best practices in wordalisation of data than performance tests on benchmark datasets.
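The core move in a wordalisation, translating numerical values into words before they enter a task-agnostic prompt template, can be sketched as below. The template wording, field names, and z-score thresholds are illustrative assumptions, not the authors' templates.

```python
from string import Template

# Task-agnostic prompt skeleton: the data reach the LLM already
# described in words, not as raw numbers.
PROMPT = Template(
    "You are writing a short description of $subject.\n"
    "Context: $background\n"
    "Data (already translated into words, not raw numbers): $data_in_words\n"
    "Write two engaging sentences; do not invent values."
)

def numeric_to_words(value: float, mean: float, sd: float) -> str:
    """Translate a value into a qualitative phrase via its z-score."""
    z = (value - mean) / sd
    if z > 1:
        return "well above average"
    if z > 0:
        return "slightly above average"
    if z > -1:
        return "slightly below average"
    return "well below average"

prompt = PROMPT.substitute(
    subject="a football player's passing",
    background="League-average pass completion is 78% (sd 6).",
    data_in_words=numeric_to_words(90.0, 78.0, 6.0),
)
```

Making the number-to-words mapping an explicit, documented function is exactly the kind of "model we are imposing on the data" that the authors' model-cards approach asks to be stated up front.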

Authors:Michael Bickford, Fayez Alruwaili, Sara Ragab, Hanna Rothenberg, Mohammad Abedin-Nasab
Title: Impact of Extended Reality on Robot-Assisted Surgery Training
Abstract:
Robot Assisted Surgeries (RAS) have one of the steepest learning curves of any type of surgery. Because of this, methods to practice RAS outside the operating room have been developed to improve surgeons' skills. These strategies include the incorporation of extended reality simulators into surgical training programs. In this systematic review, we seek to determine whether extended reality simulators can improve the performance of novice surgeons and how their performance compares to the conventional training of surgeons on surgical robots. Using the PRISMA 2020 guidelines, a systematic review and meta-analysis was performed searching PubMed, Embase, Web of Science, and the Cochrane Library for studies that compared the performance of novice surgeons who received no additional training, trained with extended reality, or trained with inanimate physical simulators (conventional additional training). We included articles that gauged performance using either GEARS or time-to-complete measurements and used SPSS to perform a meta-analysis comparing the performance outcomes of the surgeons after training. Surgeons trained using extended reality completed their surgical tasks statistically significantly faster than those who did not receive training (Cohen's d=-0.95, p=0.02), and moderately slower than those conventionally trained (Cohen's d=0.65, p=0.14), although this difference was not statistically significant. Surgeons trained on extended reality demonstrated a statistically significant improvement in GEARS scores over those who did not train (Cohen's d=0.964, p<0.001), and had GEARS scores comparable to those of conventionally trained surgeons (Cohen's d=0.65, p=0.14). This meta-analysis demonstrates that extended reality simulators translated complex skills to surgeons in a low-cost, low-risk environment.
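For readers unfamiliar with the effect sizes quoted above, Cohen's d is the difference in group means divided by the pooled standard deviation. The sketch below computes it from invented group summaries (not figures from the reviewed studies).

```python
from math import sqrt

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d with pooled standard deviation: a negative value means
    group 1's mean is below group 2's."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Illustrative numbers only: task time in minutes for XR-trained
# vs. untrained novices (faster = smaller mean = negative d).
d = cohens_d(mean1=12.0, sd1=3.0, n1=20, mean2=15.0, sd2=3.2, n2=20)
```

With these invented numbers d comes out a little below -0.95, the same ballpark as the time-to-complete effect reported in the abstract, which is conventionally read as a large effect.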

Authors:Aliffi Majiid, Riaz-Ul-Haque Mian, Kouki Kurohara, Yen-Khang Nguyen-Tran
Title: Approach to Visual Attractiveness of Event Space Through Data-Driven Environment and Spatial Perception
Abstract:
Revitalizing Japan's remote areas has become a crucial task, and Matsue City exemplifies this effort in its temporary event spaces, created through collective efforts to foster urban vibrancy and bring together residents and visitors. This research examines the relationship between data-driven insights using generative AI and visual attractiveness by evaluating temporary events in Matsue City, particularly considering the cognitive-cultural differences in processing visual information of the participants. The first phase employs semantic keyword extraction from interviews, categorizing responses into physical elements, activities, and atmosphere. The second phase analyzes spatial perception through three categories: layout hierarchy, product visibility, and visual attention. The correlation indicates that successful event design requires a balance between spatial efficiency and diverse needs, with a spatial organization that optimizes visitor flow and visibility strategies considering cultural and demographic diversity. These findings contribute to understanding the urban quality of temporary event spaces and offer a replicable framework for enhancing the visual appeal of events in remote areas throughout Japan.

Authors:Mingjun Ren, Wentao Xu
Title: The Impact of Big Five Personality Traits on AI Agent Decision-Making in Public Spaces: A Social Simulation Study
Abstract:
This study investigates how the Big Five personality traits influence decision-making processes in AI agents within public spaces. Using AgentVerse framework and GPT-3.5-turbo, we simulated interactions among 10 AI agents, each embodying different dimensions of the Big Five personality traits, in a classroom environment responding to misinformation. The experiment assessed both public expressions ([Speak]) and private thoughts ([Think]) of agents, revealing significant correlations between personality traits and decision-making patterns. Results demonstrate that Openness to Experience had the strongest impact on information acceptance, with curious agents showing high acceptance rates and cautious agents displaying strong skepticism. Extraversion and Conscientiousness also showed notable influence on decision-making, while Neuroticism and Agreeableness exhibited more balanced responses. Additionally, we observed significant discrepancies between public expressions and private thoughts, particularly in agents with friendly and extroverted personalities, suggesting that social context influences decision-making behavior. Our findings contribute to understanding how personality traits shape AI agent behavior in social settings and have implications for developing more nuanced and context-aware AI systems.
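The reported pattern, openness raising acceptance of claims and conscientiousness lowering it, can be caricatured as a trait-weighted score. The weights and profiles below are invented for illustration and are not taken from the paper's AgentVerse/GPT-3.5-turbo setup.

```python
# Invented weights: positive values push an agent toward accepting a
# claim, negative values toward skepticism.
ACCEPTANCE_WEIGHTS = {
    "openness": 0.45, "extraversion": 0.15, "conscientiousness": -0.20,
    "agreeableness": 0.10, "neuroticism": -0.05,
}

def acceptance_score(traits: dict) -> float:
    """Weighted sum of Big Five traits (each in [0, 1]), centered on a
    neutral 0.5 baseline and clipped to [0, 1]."""
    base = 0.5 + sum(ACCEPTANCE_WEIGHTS[k] * (traits[k] - 0.5)
                     for k in traits)
    return max(0.0, min(1.0, base))

curious = {"openness": 0.9, "extraversion": 0.5, "conscientiousness": 0.5,
           "agreeableness": 0.5, "neuroticism": 0.5}
cautious = {"openness": 0.2, "extraversion": 0.5, "conscientiousness": 0.9,
            "agreeableness": 0.5, "neuroticism": 0.5}
```

In the study the trait-to-behavior mapping emerges from prompting the LLM with a persona rather than from explicit weights; the sketch only makes the reported direction of the effects concrete.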

Authors:Zhilong Zhao, Jiaxin Xia
Title: Is Negative Representation More Engaging? The Influence of News Title Framing of Older Adults on Viewer Behavior
Abstract:
Grounded in framing theory, this study examines how news titles about older adults shape user engagement on a Chinese video-sharing platform. We analyzed 2,017 video news titles from 2016 to 2021, identifying nine frames. Negative frames produced higher views and shares, suggesting that negative portrayals garner attention and encourage further distribution. In contrast, positive frames led to more collections and rewards, reflecting viewer preference and financial support for favorable depictions. These findings underscore how framing aligns with ageism concerns and highlight the need for more balanced media portrayals of older adults.

Authors:Anil R. Doshi, Alastair Moore
Title: Toward a Human-AI Task Tensor: A Taxonomy for Organizing Work in the Age of Generative AI
Abstract:
We introduce a framework for understanding the impact of generative AI on human work, which we call the human-AI task tensor. A tensor is a structured framework that organizes tasks along multiple interdependent dimensions. Our human-AI task tensor introduces a systematic approach to studying how humans and AI interact to perform tasks, and has eight dimensions: task definition, AI contribution, interaction modality, audit requirement, output definition, decision-making authority, AI structure, and human persona. After describing the eight dimensions of the tensor, we provide illustrative frameworks (derived from projections of the tensor) and a human-AI task canvas that provide analytical tractability and practical insight for organizational decision-making. We demonstrate how the human-AI task tensor can be used to organize emerging and future research on generative AI. We propose that the human-AI task tensor offers a starting point for understanding how work will be performed with the emergence of generative AI.
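As a data structure, the tensor's eight dimensions amount to a record with one field per dimension. The field vocabularies in this sketch are illustrative assumptions; the paper defines the dimensions but not these particular values.

```python
from dataclasses import dataclass

@dataclass
class HumanAITask:
    """One cell of the human-AI task tensor: a task described along the
    eight dimensions named in the abstract."""
    task_definition: str
    ai_contribution: str
    interaction_modality: str
    audit_requirement: str
    output_definition: str
    decision_authority: str
    ai_structure: str
    human_persona: str

# Hypothetical example: an AI-drafted memo with mandatory human review.
draft_review = HumanAITask(
    task_definition="well-specified",
    ai_contribution="draft generation",
    interaction_modality="chat",
    audit_requirement="human review required",
    output_definition="memo",
    decision_authority="human",
    ai_structure="single model",
    human_persona="editor",
)
```

Projections of the tensor (the illustrative frameworks the authors mention) then correspond to grouping such records by a subset of the fields.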

Authors:Elvis Kimara, Kunle S. Oguntoye, Jian Sun
Title: PersonaAI: Leveraging Retrieval-Augmented Generation and Personalized Context for AI-Driven Digital Avatars
Abstract:
This paper introduces PersonaAI, a cutting-edge application that leverages Retrieval-Augmented Generation (RAG) and the LLAMA model to create highly personalized digital avatars capable of accurately mimicking individual personalities. Designed as a cloud-based mobile application, PersonaAI captures user data seamlessly, storing it in a secure database for retrieval and analysis. The result is a system that provides context-aware, accurate responses to user queries, enhancing the potential of AI-driven personalization. Why should you care? PersonaAI combines the scalability of RAG with the efficiency of prompt-engineered LLAMA3, offering a lightweight, sustainable alternative to traditional large language model (LLM) training methods. The system's novel approach to data collection, utilizing real-time user interactions via a mobile app, ensures enhanced context relevance while maintaining user privacy. By open-sourcing our implementation, we aim to foster adaptability and community-driven development. PersonaAI demonstrates how AI can transform interactions by merging efficiency, scalability, and personalization, making it a significant step forward in the future of digital avatars and personalized AI.
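The retrieval step of a RAG pipeline like the one described can be sketched as ranking stored user snippets by similarity to the query and handing the best hits to the LLM as context. Bag-of-words cosine similarity stands in here for real embeddings, and the stored snippets are invented.

```python
from math import sqrt
from collections import Counter

def bow(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Invented personal-context snippets standing in for the user database.
memory = [
    "I usually run before breakfast",
    "My favorite composer is Ravel",
    "I am allergic to peanuts",
]
query = "what is their favorite composer"
best = max(memory, key=lambda doc: cosine(bow(query), bow(doc)))
# `best` would be prepended to the LLM prompt as retrieved context.
```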

Authors:Tittaya Mairittha, Tanakon Sawanglok, Panuwit Raden, Sorrawit Treesuk
Title: When Pigs Get Sick: Multi-Agent AI for Swine Disease Detection
Abstract:
Swine disease surveillance is critical to the sustainability of global agriculture, yet its effectiveness is frequently undermined by limited veterinary resources, delayed identification of cases, and variability in diagnostic accuracy. To overcome these barriers, we introduce a novel AI-powered, multi-agent diagnostic system that leverages Retrieval-Augmented Generation (RAG) to deliver timely, evidence-based disease detection and clinical guidance. By automatically classifying user inputs into either Knowledge Retrieval Queries or Symptom-Based Diagnostic Queries, the system ensures targeted information retrieval and facilitates precise diagnostic reasoning. An adaptive questioning protocol systematically collects relevant clinical signs, while a confidence-weighted decision fusion mechanism integrates multiple diagnostic hypotheses to generate robust disease predictions and treatment recommendations. Comprehensive evaluations encompassing query classification, disease diagnosis, and knowledge retrieval demonstrate that the system achieves high accuracy, rapid response times, and consistent reliability. By providing a scalable, AI-driven diagnostic framework, this approach enhances veterinary decision-making, advances sustainable livestock management practices, and contributes substantively to the realization of global food security.
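The confidence-weighted decision fusion step can be sketched as summing each agent's confidence per hypothesized disease and picking the top-scoring one. The agent outputs below are invented for illustration, not the system's actual hypotheses.

```python
from collections import defaultdict

def fuse(hypotheses):
    """Confidence-weighted fusion: hypotheses is a list of
    (disease, confidence) pairs from the diagnostic agents; the fused
    prediction is the disease with the highest summed confidence."""
    scores = defaultdict(float)
    for disease, confidence in hypotheses:
        scores[disease] += confidence
    return max(scores, key=scores.get)

# Hypothetical outputs from four agents examining the same case.
agent_outputs = [
    ("PRRS", 0.7),
    ("swine influenza", 0.6),
    ("PRRS", 0.5),
    ("PRRS", 0.4),
]
prediction = fuse(agent_outputs)
```

Summing confidences rather than counting votes lets a single very confident agent outweigh several hesitant ones, which is the point of weighting the fusion.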

Authors:Navya Sonal Agarwal, Sanjay Kumar Sonbhadra
Title: A Review on Large Language Models for Visual Analytics
Abstract:
This paper provides a comprehensive review of the integration of Large Language Models (LLMs) with visual analytics, addressing their foundational concepts, capabilities, and wide-ranging applications. It begins by outlining the theoretical underpinnings of visual analytics and the transformative potential of LLMs, specifically focusing on their roles in natural language understanding, natural language generation, dialogue systems, and text-to-media transformations. The review further investigates how the synergy between LLMs and visual analytics enhances data interpretation, visualization techniques, and interactive exploration capabilities. Key tools and platforms including LIDA, Chat2VIS, Julius AI, and Zoho Analytics, along with specialized multimodal models such as ChartLlama and CharXIV, are critically evaluated. The paper discusses their functionalities, strengths, and limitations in supporting data exploration, visualization enhancement, automated reporting, and insight extraction. The taxonomy of LLM tasks, ranging from natural language understanding (NLU), natural language generation (NLG), to dialogue systems and text-to-media transformations, is systematically explored. This review provides a SWOT analysis of integrating Large Language Models (LLMs) with visual analytics, highlighting strengths like accessibility and flexibility, weaknesses such as computational demands and biases, opportunities in multimodal integration and user collaboration, and threats including privacy concerns and skill degradation. It emphasizes addressing ethical considerations and methodological improvements for effective integration.

Authors:Parisa Ghanad Torshizi, Laura B. Hensel, Ari Shapiro, Stacy C. Marsella
Title: Large Language Models for Virtual Human Gesture Selection
Abstract:
Co-speech gestures convey a wide variety of meanings and play an important role in face-to-face human interactions. These gestures significantly influence the addressee's engagement, recall, comprehension, and attitudes toward the speaker. Similarly, they impact interactions between humans and embodied virtual agents. The process of selecting and animating meaningful gestures has thus become a key focus in the design of these agents. However, automating this gesture selection process poses a significant challenge. Prior gesture generation techniques have varied from fully automated, data-driven methods, which often struggle to produce contextually meaningful gestures, to more manual approaches that require specific gesture-crafting expertise, are time-consuming, and lack generalizability. In this paper, we leverage the semantic capabilities of Large Language Models to develop a gesture selection approach that suggests meaningful, appropriate co-speech gestures. We first describe how information on gestures is encoded into GPT-4. Then, we conduct a study to evaluate alternative prompting approaches for their ability to select meaningful, contextually relevant gestures and to align them appropriately with the co-speech utterance. Finally, we detail and demonstrate how this approach has been implemented within a virtual agent system, automating the selection and subsequent animation of the selected gestures for enhanced human-agent interactions.

Authors:Guang Dai, Pinhao Wang, Cheng Yao, Fangtian Ying
Title: InnerSelf: Designing Self-Deepfaked Voice for Emotional Well-being
Abstract:
One's own voice is one of the most frequently heard voices. Studies have found that hearing and talking to oneself have positive psychological effects. However, the design and implementation of self-voice for emotional regulation in HCI have yet to be explored. In this paper, we introduce InnerSelf, an innovative voice system based on speech synthesis technologies and Large Language Models. It allows users to engage in supportive and empathic dialogue with their deepfaked voice. By facilitating positive self-talk, our system aims to promote self-disclosure and regulation, reshaping negative thoughts and improving emotional well-being.

Authors:ChungHa Lee, Jin-Hyuk Hong
Title: musicolors: Bridging Sound and Visuals For Synesthetic Creative Musical Experience
Abstract:
Music visualization is an important medium that enables synesthetic experiences and creative inspiration. However, previous research has focused mainly on technical and theoretical aspects, overlooking users' everyday interaction with music visualizations. This gap highlights the pressing need for research on how music visualization influences users in synesthetic creative experiences and where those experiences are heading. Thus, we developed musicolors, a web-based music visualization library that runs in real time. Additionally, we conducted a qualitative user study with composers, developers, and listeners to explore how they use musicolors to appreciate music, draw inspiration, and craft music-visual interactions. The results show that musicolors provides rich value to users through sketching musical ideas, integrating visualizations with other systems or platforms, and synesthetic listening. Based on these findings, we also provide guidelines for future music visualizations to offer a more interactive and creative experience.

Authors:Liyi Zhang, Yujie Peng, Yi Lian, Mengru Xue
Title: Figame: A Family Digital Game Based on JME for Shaping Parent-Child Healthy Gaming Relationship
Abstract:
With the development of technology, digital games have permeated into family and parent-child relationships, leading to cognitive deficiencies and inter-generational conflicts that have yet to be effectively addressed. Building on previous research on digital games and parent-child relationships, we have developed Figame, a Joint Media Engagement (JME) based parent-child digital game aimed at fostering healthy family gaming relationships through co-playing experiences. The game itself involves providing game-related cognitive support, facilitating role-switching between parent and child, encouraging discussions both within and outside the game, and balancing competition and collaboration. During the study, we assessed the gameplay experiences of 8 parent-child pairs (aged between 8 and 12 years). The results indicated that Figame effectively enhances parent-child digital gaming relationships and promotes a willingness to engage in shared gameplay, thereby fostering positive family dynamics within the context of digital gaming.

Authors:Lin Ma, Qiyuan An, Jing Chen, Xinggang Hou, Yuan Feng, Dengkai Chen
Title: What elements should we focus when designing immersive virtual nature? A preliminary user study
Abstract:
Extensive research has confirmed the positive relationship between exposure to natural environments and human cognitive, behavioral, physical, and mental health. However, only some have easy access to nature. With electronic information and simulation technology advancements, digital nature experiences are widely used across various devices and scenarios. It is essential to explore how to effectively select and utilize natural elements to guide the design of digital nature scenes. This paper examines critical elements in immersive virtual nature (IVN) and their impact on user perception. Through online surveys and design experiments, we identified specific natural elements that promote relaxation and proposed design strategies for virtual environments. We developed several immersive virtual nature scenes for further validation. Finally, we outline our future experimental plans and research directions in digital nature. Our research aims to provide HCI designers insights into creating restorative, immersive virtual scenes.

Authors:Omogolo Omaatla Morake, Mengru Xue
Title: Exploring Stress among International College Students in China
Abstract:
Psychological stress encompasses the emotional tension and pressure people experience, usually arising from situations they find challenging. However, little is known about the pressures faced by international college students studying in China. The goal of this study is to investigate the various stressors that international college students in China face and how they cope with stress (coping mechanisms). Twenty international students were interviewed to gather data, which was then transcribed. Thematic analysis and coding were applied to the qualitative data, revealing themes related to the causes of stress. The following themes emerged from this data: anticipatory anxiety or future stress, social and cultural challenges, financial strain, and academic pressure. These themes help in understanding the various stressors international college students in China face and how they try to cope. Studying how international college students in China cope with challenges can guide the development of targeted interventions to support their mental health. Research suggests that integrating aesthetics and connectivity into design interventions can notably improve the well-being of these students. This paper presents possible future design solutions, leveraging the aesthetics of connectivity to empower students and enhance their resilience. Additionally, it aims to provide valuable insights for designers interested in creating solutions that alleviate stress and promote emotional awareness among international students.

Authors:Yuqi Hu, Yujie Peng, Jennifer Gohumpu, Caijun Zhuang, Lushomo Malambo, Cuina Zhao
Title: Magicarpet: A Parent-child Interactive Game Platform to Enhance Connectivity between Autistic Children and Their Parents
Abstract:
Autistic children often face challenges in social interaction and communication, impacting their social connectivity, especially with their parents. Despite the effectiveness of game-based interactive therapy in improving motor skills, research on enhancing parent-child relationships is lacking. We address this gap with Magicarpet, an interactive play carpet that encourages parent-child interaction and has been validated through a user study with five families. The preliminary results indicate that Magicarpet enhances the motivation and participation of autistic children in play, demonstrating the potential of human-computer interaction (HCI) designs to foster connectivity.

Authors:Yannick Kibolwe Mulundule, Yao Cheng, Amir Ubed, Abdiaziz Omar Hassan
Title: Aesthetics of Connectivity: Envisioning Empowerment Through Smart Clothing
Abstract:
Empowerment through smart clothing, which incorporates advanced technologies, requires integrating scientific and technological expertise with artistic and design principles. Until now, little research has focused on this unique and innovative field of design. The concept of 'wearables' cuts across several fields, so a shared design 'language' that permits both free-form creativity and a methodical design approach is required. Smart clothing designers often seek guidance in their research, since it can be difficult to prioritize and understand issues such as usability, production, style, consumer culture, reuse, and end-user needs. The researchers ensured that their design tool was presented in a manner that practitioners from many backgrounds could understand. The proposed 'critical route' is a useful tool for the design, study, and development of smart technology implementations, since it helps clarify the path that must be taken.

Authors:Zeynep Abes, Nathan Fairchild, Spencer Lin, Michael Wahba, Katrina Xiao, Scott S. Fisher
Title: The Immersive Archive: Archival Strategies for the Sensorama & Sutherland HMD
Abstract:
The Immersive Archive is an initiative dedicated to preserving and restoring groundbreaking works from across Extended Reality (XR) history. Originating at the University of Southern California's Mobile and Environmental Media Lab, this archive is committed to developing and exhibiting simulations of influential XR devices that have shaped immersive media over time. This paper examines the challenges and strategies involved in archiving seminal XR technologies, with a focus on Morton Heilig's Sensorama and Ivan Sutherland's Head-Mounted Display. As pioneering prototypes in virtual and augmented reality, these devices provide valuable insights into the evolution of immersive media, highlighting both technological innovation and sensory experimentation. Through collaborative archival efforts with institutions such as the HMH Moving Image Archive at the University of Southern California and the Computer History Museum, this research integrates media archaeology with digital preservation techniques. Emphasis is placed on documentation practices, the restoration of physical artifacts, and the development of simulations of these historic experiences for contemporary virtual reality platforms. Our interdisciplinary approach to archival methodologies, which captures the multisensory and interactive qualities of these pioneering devices, has been instrumental in developing a framework for future immersive media preservation initiatives. By preserving the immersive essence of these early experiences, we lay the groundwork for future generations to explore and learn from the origins of immersive media. Safeguarding this rich legacy is essential to ensure these visionary works continue to inspire and shape the future of media landscapes.

Authors:Alexandra Hammerberg, Samuel Grunblatt, Patricia Kramer
Title: Movement Sequencing: A Novel Approach to Quantifying the Building Blocks of Human Gait
Abstract:
By 2050, a quarter of the US population will be over the age of 65 with greater than a 40% risk of developing life-altering neuromusculoskeletal pathologies. The potential of wearables, such as Apple AirPods and hearing aids, to provide personalized preventative and predictive health monitoring outside of the clinic is nascent, but large quantities of open-ended data that capture movement in the physical world now exist. Algorithms that leverage existing wearable technology to detect subtle changes to walking mechanics, an early indicator of neuromusculoskeletal pathology, have successfully been developed to determine population-level statistics, but individual-level variability is more difficult to parse from population-level data. Like genetic sequencing, the individual's gait pattern can be discerned by decomposing the movement signal into its fundamental features from which we can detect "mutations" or changes to the pattern that are early indicators of pathology - movement-based biomarkers. We have developed a novel approach to quantify "normal baseline movement" at an individual level, combining methods from gait laboratories with methods used to characterize stellar oscillations. We tested our approach by asking participants to complete an outdoor circuit while wearing a pair of AirPods, using orthopaedic braces to simulate pathology. We found that the novel features we propose are sensitive enough to distinguish between normal walking and brace walking at the population level and at the individual level in all sensor directions (both p $<$ 0.05). We also perform principal component analysis on our population-level and individual-level models, and find significant differences between individuals as well as between the overall population model and most individuals. We also demonstrate the potential of these gait features in deep learning applications.

Authors:Faith Young, Dmitry Alexandrovsky, Daniela Wurhofer, Eva-Maria Krah, Jan Smeddinck
Title: Study Protocol: Shared Achievements: Exploring the Design of Gameful Collaborative Elements and Fostering Social Relatedness through Team Effort Contributions in a Social Physical Activity App
Abstract:
This study protocol outlines the design and methodology of a research study investigating collaborative game elements to promote physical activity within digital health interventions. The study aims to examine how social relatedness influences motivation and adherence to step-count goals. Participants will use Shared Achievements, a minimalistic multiplayer step counter game, over two weeks, one week contributing absolute step counts and one week sharing step counts as a relative percentage of a team goal. Data will be collected through usage metrics and participant feedback to evaluate engagement, motivation, and perceived challenges. Findings will inform the design of digital health tools that balance competition and collaboration, optimising social and behavioural support mechanisms.

Authors:Artur Solomonik, Hendrik Heuer
Title: Social Media Journeys -- Mapping Platform Migration
Abstract:
As people engage with the social media landscape, popular platforms rise and fall. As current research uncovers the experiences people have on various platforms, rarely do we engage with the sociotechnical migration processes when joining and leaving them. In this paper, we asked 32 visitors of a science communication festival to draw out artifacts that we call Social Media Journey Maps about the social media platforms they frequented, and why. By combining qualitative content analysis with a graph representation of Social Media Journeys, we present how social media migration processes are motivated by the interplay of environmental and platform factors. We find that peer-driven popularity, the timing of feature adoption, and personal perceptions of migration causes - such as security - shape individuals' reasoning for migrating between social media platforms. With this work, we aim to pave the way for future social media platforms that foster meaningful and enriching online experiences for users.

Authors:Peijin Yu, Shin'ichi Konomi
Title: Leveraging the Dynamics of Leadership in Group Recommendation Systems
Abstract:
In the field of group recommendation systems (GRS), effectively addressing the diverse preferences of group members poses a significant challenge. Traditional GRS approaches often aggregate individual preferences into a collective group preference to generate recommendations, which may overlook the intricate interactions between group members. We introduce a novel approach to group recommendation, with a specific focus on small groups sharing common interests. In particular, we present a web-based restaurant recommendation system that enhances user satisfaction by modeling mutual interactions among group members. Drawing inspiration from group decision-making literature and leveraging graph theory, we propose a recommendation algorithm that emphasizes the dynamics of relationships and trust within the group. By representing group members as nodes and their interactions as directed edges, the algorithm captures pairwise relationships to foster consensus and improve the alignment of recommendations with group preferences. This interaction-focused framework ultimately seeks to enhance overall group satisfaction with the recommended choices.
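The trust-graph aggregation described above can be illustrated with a minimal sketch (our own toy illustration, not the authors' algorithm): members are nodes, `trust[a][b]` is the weight of the directed edge a → b, and each member's influence on the group score is proportional to the trust the rest of the group places in them. All names and numbers here are hypothetical.

```python
# Minimal sketch of trust-weighted group aggregation (illustrative only).
def influence_weights(trust, members):
    """Each member's influence = total incoming trust, normalized to sum to 1."""
    incoming = {m: sum(trust.get(a, {}).get(m, 0.0) for a in members if a != m)
                for m in members}
    total = sum(incoming.values()) or 1.0
    return {m: w / total for m, w in incoming.items()}

def group_score(ratings, trust):
    """Trust-weighted consensus over each member's rating of one item."""
    members = list(ratings)
    w = influence_weights(trust, members)
    return sum(w[m] * ratings[m] for m in members)

# Hypothetical restaurant ratings and pairwise trust edges.
ratings = {"ana": 4.0, "bo": 2.0, "cy": 5.0}
trust = {"ana": {"cy": 1.0}, "bo": {"cy": 1.0}, "cy": {"ana": 0.5}}
print(round(group_score(ratings, trust), 2))  # -> 4.8 (the most-trusted member dominates)
```

Because "cy" receives the most incoming trust, the consensus score leans toward cy's rating rather than the plain average (3.67), which is the behavior the interaction-focused framing aims for.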

Authors:Abhishek Roy, Narsi G, Sujata Mukherjee
Title: ShieldUp!: Inoculating Users Against Online Scams Using A Game Based Intervention
Abstract:
Online scams are a growing threat in India, impacting millions and causing substantial financial losses year over year. This white paper presents ShieldUp!, a novel mobile game prototype designed to inoculate users against common online scams by leveraging the principles of psychological inoculation theory. ShieldUp! exposes users to weakened versions of manipulation tactics frequently used by scammers, and teaches them to recognize and pre-emptively refute these techniques. A randomized controlled trial (RCT) with 3,000 participants in India was conducted to evaluate the game's efficacy in helping users better identify scam scenarios. Participants were assigned to one of three groups: the ShieldUp! group (playing the game for 15 minutes), a general scam awareness group (watching videos and reading tips for 10-15 minutes), and a control group (playing "Chrome Dino", an unrelated game, for 10 minutes). Scam discernment ability was measured using a newly developed Scam Discernment Ability Test (SDAT-10) before the intervention, immediately after, and at a 21-day follow-up. Results indicated that participants who played ShieldUp! showed a significant improvement in their ability to identify scams compared to both control groups, and this improvement was maintained at follow-up. Importantly, while both interventions initially led users to show increased skepticism towards even genuine online offers (non-scam scenarios), this effect dissipated after 21 days, suggesting no long-term negative impact on user trust. This study demonstrates the potential of game-based inoculation as a scalable and effective scam prevention strategy, offering valuable insights for product design, policy interventions, and future research, including the need for longitudinal studies and cross-cultural adaptations.

Authors:Matteo Cercola, Nicola Gatti, Pedro Huertas Leyva, Benedetto Carambia, Simone Formentin
Title: Automating the loop in traffic incident management on highway
Abstract:
Effective traffic incident management is essential for ensuring safety, minimizing congestion, and reducing response times in emergency situations. Traditional highway incident management relies heavily on radio room operators, who must make rapid, informed decisions in high-stakes environments. This paper proposes an innovative solution to support and enhance these decisions by integrating Large Language Models (LLMs) into a decision-support system for traffic incident management. We introduce two approaches: (1) an LLM + Optimization hybrid that leverages both the flexibility of natural language interaction and the robustness of optimization techniques, and (2) a Full LLM approach that autonomously generates decisions using only LLM capabilities. We tested our solutions using historical event data from Autostrade per l'Italia. Experimental results indicate that while both approaches show promise, the LLM + Optimization solution demonstrates superior reliability, making it particularly suited to critical applications where consistency and accuracy are paramount. This research highlights the potential for LLMs to transform highway incident management by enabling accessible, data-driven decision-making support.

Authors:Nicola Milano, Michela Ponticorvo, Davide Marocco
Title: Comparing Human Expertise and Large Language Models Embeddings in Content Validity Assessment of Personality Tests
Abstract:
In this article we explore the application of Large Language Models (LLMs) in assessing the content validity of psychometric instruments, focusing on the Big Five Questionnaire (BFQ) and Big Five Inventory (BFI). Content validity, a cornerstone of test construction, ensures that psychological measures adequately cover their intended constructs. Using both human expert evaluations and advanced LLMs, we compared the accuracy of semantic item-construct alignment. Graduate psychology students employed the Content Validity Ratio (CVR) to rate test items, forming the human baseline. In parallel, state-of-the-art LLMs, including multilingual and fine-tuned models, analyzed item embeddings to predict construct mappings. The results reveal distinct strengths and limitations of human and AI approaches. Human validators excelled in aligning the behaviorally rich BFQ items, while LLMs performed better with the linguistically concise BFI items. Training strategies significantly influenced LLM performance, with models tailored for lexical relationships outperforming general-purpose LLMs. These results highlight the complementary potential of hybrid validation systems that integrate human expertise and AI precision. The findings underscore the transformative role of LLMs in psychological assessment, paving the way for scalable, objective, and robust test development methodologies.
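The embedding-based item-construct mapping can be sketched as nearest-neighbor assignment in embedding space: an item is mapped to the construct whose label embedding is closest in cosine similarity. The toy 3-d vectors below stand in for real LLM sentence embeddings and are entirely hypothetical; the paper's actual models and preprocessing may differ.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def predict_construct(item_vec, construct_vecs):
    """Assign the item to the construct with maximal cosine similarity."""
    return max(construct_vecs, key=lambda c: cosine(item_vec, construct_vecs[c]))

# Toy "embeddings" standing in for real LLM sentence embeddings.
constructs = {
    "extraversion":      np.array([1.0, 0.1, 0.0]),
    "conscientiousness": np.array([0.0, 1.0, 0.1]),
}
item = np.array([0.9, 0.2, 0.0])  # e.g. an item like "I am the life of the party"
print(predict_construct(item, constructs))  # -> extraversion
```

Accuracy of such a mapping against the human CVR baseline is then a straightforward agreement count over all items.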

Authors:Khonzoda Umarova, Talia Wise, Zhuoer Lyu, Mina Lee, Qian Yang
Title: How Problematic Writer-AI Interactions (Rather than Problematic AI) Hinder Writers' Idea Generation
Abstract:
Writing about a subject enriches writers' understanding of that subject. This cognitive benefit of writing -- known as constructive learning -- is essential to how students learn in various disciplines. However, does this benefit persist when students write with generative AI writing assistants? Prior research suggests the answer varies based on the type of AI, e.g., auto-complete systems tend to hinder ideation, while assistants that pose Socratic questions facilitate it. This paper adds an additional perspective. Through a case study, we demonstrate that the impact of genAI on students' idea development depends not only on the AI but also on the students and, crucially, their interactions in between. Students who proactively explored ideas gained new ideas from writing, regardless of whether they used auto-complete or Socratic AI assistants. Those who engaged in prolonged, mindless copyediting developed few ideas even with a Socratic AI. These findings suggest opportunities in designing AI writing assistants, not merely by creating more thought-provoking AI, but also by fostering more thought-provoking writer-AI interactions.

Authors:Jennie J. Y. Chen, Sidney S. Fels
Title: Curves Ahead: Enhancing the Steering Law for Complex Curved Trajectories
Abstract:
The Steering Law has long been a fundamental model in predicting movement time for tasks involving navigating through constrained paths, such as in selecting sub-menu options, particularly for straight and circular arc trajectories. However, this does not reflect the complexities of real-world tasks where curvatures can vary arbitrarily, limiting its applications. This study aims to address this gap by introducing the total curvature parameter K into the equation to account for the overall curviness characteristic of a path. To validate this extension, we conducted a mouse-steering experiment on fixed-width paths with varying lengths and curviness levels. Our results demonstrate that the introduction of K significantly improves model fitness for movement time prediction over traditional models. These findings advance our understanding of movement in complex environments and support potential applications in fields like speech motor control and virtual navigation.
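For context, the classic steering law and its integral form are shown below, followed by one plausible additive extension with the total curvature parameter K. The extended form is our illustrative assumption of how K might enter the equation; the functional form actually fitted in the paper may differ.

```latex
% Classic steering law (constant-width path of length A, width W):
MT = a + b \cdot \frac{A}{W}

% General (integral) form over a path C with local width W(s):
MT = a + b \int_{C} \frac{ds}{W(s)}

% One plausible extension with total curvature K (illustrative form only):
MT = a + b \cdot \frac{A}{W} + c \cdot K
```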

Authors:Andrea Green, Gabrielle Polite, Isabelle Hung, Kristen L. Fessele, Sarah L. Billington, James A. Landay, Andrea Cuadra
Title: Black Older Adults' Perception of Using Voice Assistants to Enact a Medical Recovery Curriculum
Abstract:
The use of interactive voice assistants (IVAs) in healthcare provides an avenue to address diverse health needs, such as gaps in the medical recovery period for older adult patients who have recently experienced serious illness. By using a voice-assisted medical recovery curriculum, discharged patients can receive ongoing support as they recover. However, there exist significant medical and technology disparities among older adults, particularly among Black older adults. We recruited 26 Black older adults to participate in the design process of an IVA-enacted medical recovery curriculum by providing feedback during the early stages of design. Lack of cultural relevancy, accountability, privacy concerns, and stigmas associated with aging and disability made participants reluctant to engage with the technology unless in a position of extreme need. This study underscored the need for Black cultural representation, whether it regarded the IVA's accent, the types of media featured, or race-specific medical advice, and the need for strategies to address participants' concerns and stigmas. Participants saw the value in the curriculum for those who did not have caregivers and deliberated about the trade-offs the technology presented. We discuss tensions surrounding inclusion and representation and conclude by showing how we enacted the lessons from this study in future design plans.

Authors:Yekta Amirkhalili, Ho Yi Wong
Title: Banking on Feedback: Text Analysis of Mobile Banking iOS and Google App Reviews
Abstract:
The rapid growth of mobile banking (m-banking), especially after the COVID-19 pandemic, has reshaped the financial sector. This study analyzes consumer reviews of m-banking apps from five major Canadian banks, collected from the Google Play and iOS App stores. Sentiment analysis and topic modeling classify reviews as positive, neutral, or negative, highlighting user preferences and areas for improvement. Data pre-processing was performed with NLTK, a Python language processing tool, and topic modeling used Latent Dirichlet Allocation (LDA). For sentiment analysis, several methods were compared, with Long Short-Term Memory (LSTM) achieving 82\% accuracy for iOS reviews and Multinomial Naive Bayes 77\% for Google Play. Positive reviews praised usability, reliability, and features, while negative reviews identified login issues, glitches, and dissatisfaction with updates. This is the first study to analyze both iOS and Google Play m-banking app reviews, offering insights into app strengths and weaknesses. Findings underscore the importance of user-friendly designs, stable updates, and better customer service. Advanced text analytics provide actionable recommendations for improving user satisfaction and experience.
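The LDA-plus-Naive-Bayes pipeline described above can be sketched in a few lines of scikit-learn. The four toy reviews and labels below are hypothetical stand-ins for the study's corpus, and none of the study's preprocessing or tuning is reproduced; this only shows the shape of the pipeline.

```python
# Hedged sketch of an LDA topic model + Multinomial Naive Bayes sentiment
# classifier on bag-of-words features (toy data, illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.naive_bayes import MultinomialNB

reviews = ["love the app easy and reliable", "great features smooth login",
           "app crashes login fails", "terrible update glitches"]
labels = ["positive", "positive", "negative", "negative"]

vec = CountVectorizer()
X = vec.fit_transform(reviews)

# Topic modeling: two latent topics over the toy corpus.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.components_.shape)  # (n_topics, vocabulary size)

# Sentiment classification on the same bag-of-words features.
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["reliable app great features"]))[0])
```

In practice the study's LSTM classifier would replace the Naive Bayes step for the iOS reviews, operating on sequences rather than bag-of-words counts.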

Authors:Jungyeon Park, Anna Kochnev Goldstein, Yueming Zhuo, Nathan Jensen, Daniel Palanker
Title: Simulation of prosthetic vision with PRIMA system and enhancement of face representation
Abstract:
Objective. Patients implanted with the PRIMA photovoltaic subretinal prosthesis in geographic atrophy report form vision with average acuity matching the 100 µm pixel size. Although this remarkable outcome enables them to read and write, they report difficulty with perceiving faces. This paper provides a novel, non-pixelated algorithm for simulating prosthetic vision the way it is experienced by PRIMA patients, compares the algorithm's predictions to clinical perceptual outcomes, and offers computer vision and machine learning (ML) methods to improve face representation. Approach. Our simulation algorithm integrates a grayscale filter, a spatial resolution filter, and a contrast filter. This accounts for the limited sampling density of the retinal implant, as well as the reduced contrast sensitivity of prosthetic vision. Patterns of Landolt C and faces created using this simulation algorithm are compared to reports from actual PRIMA users. To recover the facial features lost in prosthetic vision, we apply an ML facial landmarking model as well as contrast-adjusting tone curves to the face image prior to its projection onto the implant. Main results. Simulated prosthetic vision matches the maximum letter acuity observed in clinical studies as well as patients' subjective descriptions. Application of the inverse contrast filter helps preserve contrast in prosthetic vision. Identifying facial features using an ML facial landmarking model and accentuating them further improves face representation. Significance. Spatial and contrast constraints of prosthetic vision limit resolvable features and degrade natural images. ML-based methods and contrast adjustments mitigate some limitations and improve face representation. Even though higher spatial resolution can be expected with implants having smaller pixels, contrast enhancement still remains essential for face recognition.
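The three-stage simulation (grayscale → spatial resolution → contrast) can be sketched numerically as below. The block size and contrast factor are toy parameters of our own choosing, and block averaging stands in for whatever spatial filter the paper calibrates; this is an illustration of the pipeline's structure, not the authors' implementation.

```python
import numpy as np

def simulate_prosthetic_vision(rgb, block=4, contrast=0.5):
    """Toy three-stage simulation: grayscale, spatial, and contrast filters."""
    gray = rgb.mean(axis=2)                            # grayscale filter
    h, w = gray.shape
    hb, wb = h // block * block, w // block * block
    g = gray[:hb, :wb].reshape(hb // block, block, wb // block, block)
    coarse = g.mean(axis=(1, 3))                       # spatial resolution filter
    low = coarse.repeat(block, 0).repeat(block, 1)     # back to original grid
    return low.mean() + contrast * (low - low.mean())  # contrast filter

img = np.random.default_rng(0).random((16, 16, 3))    # random stand-in image
out = simulate_prosthetic_vision(img)
print(out.shape)  # (16, 16)
```

The inverse contrast filter mentioned in the results would correspond to boosting contrast before this degradation, i.e. applying the contrast step with a factor greater than 1 prior to projection.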

Authors:Srinivas Ravishankar, Nora Zajzon, Virginia de Sa
Title: Decoding Imagined Handwriting from EEG
Abstract:
Patients with extreme forms of paralysis face challenges in communication, adversely impacting their quality of life. Recent studies have reported higher-than-chance performance in decoding handwritten letters from EEG signals, potentially allowing these subjects to communicate. However, all prior works have attempted to decode handwriting from EEG during actual motion. Furthermore, they assume that precise movement-onset is known. In this work, we focus on settings closer to real-world use where either movement onset is not known or movement does not occur at all, fully utilizing motor imagery. We show that several existing studies are affected by confounds that make them inapplicable to the imagined handwriting setting. We also investigate how sample complexity affects handwriting decoding performance, guiding future data collection efforts. Our work shows that (a) Sample complexity analysis in single-trial EEG reveals a noise ceiling, which can be alleviated by averaging over trials. (b) Knowledge of movement-onset is crucial to reported performance in prior works. (c) Fully imagined handwriting can be decoded from EEG with higher-than-chance performance. Taken together, these results highlight both the unique challenges and avenues to pursue to build a practical EEG-based handwriting BCI.
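Finding (a) above, that trial averaging alleviates the single-trial noise ceiling, follows from the fact that independent noise shrinks roughly as 1/sqrt(n_trials) under averaging while the signal is preserved. A quick numerical illustration on synthetic data (not EEG):

```python
import numpy as np

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 2 * np.pi, 200))  # stand-in "evoked" signal

def residual_noise(n_trials):
    """Residual noise after averaging n_trials noisy copies of the signal."""
    trials = signal + rng.normal(0, 1.0, size=(n_trials, 200))
    return np.abs(trials.mean(axis=0) - signal).std()

print(residual_noise(1) > residual_noise(100))  # -> True: averaging helps
```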

Authors:Maxence Grand, Damien Pellier, Francis Jambon
Title: GAIPAT -Dataset on Human Gaze and Actions for Intent Prediction in Assembly Tasks
Abstract:
The primary objective of the dataset is to provide a better understanding of the coupling between human actions and gaze in a shared working environment with a cobot, with the aim of significantly enhancing the efficiency and safety of human-cobot interactions. More broadly, by linking gaze patterns with physical actions, the dataset offers valuable insights into cognitive processes and attention dynamics in the context of assembly tasks. The proposed dataset contains gaze and action data from approximately 80 participants, recorded during simulated industrial assembly tasks. The tasks were simulated using controlled scenarios in which participants manipulated educational building blocks. Gaze data was collected using two different eye-tracking setups, head-mounted and remote, while participants worked in two positions: sitting and standing.

Authors:Thieu Long Phan, Akansel Cosgun
Title: Hand Over or Place On The Table? A Study On Robotic Object Delivery When The Recipient Is Occupied
Abstract:
This study investigates the subjective experiences of users in two robotic object delivery methods: direct handover and table placement, when users are occupied with another task. A user study involving 15 participants engaged in a typing game revealed that table placement significantly enhances user experience compared to direct handovers, particularly in terms of satisfaction, perceived safety and intuitiveness. Additionally, handovers negatively impacted typing performance, while all participants expressed a clear preference for table placement as the delivery method. These findings highlight the advantages of table placement in scenarios requiring minimal user disruption.

Authors:Evgeniia Vu, Andrei Boiarov, Dmitry Vetrov
Title: Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion
Abstract:
Generating co-speech gestures in real time requires both temporal coherence and efficient sampling. We introduce Accelerated Rolling Diffusion, a novel framework for streaming gesture generation that extends rolling diffusion models with structured progressive noise scheduling, enabling seamless long-sequence motion synthesis while preserving realism and diversity. We further propose Rolling Diffusion Ladder Acceleration (RDLA), a new approach that restructures the noise schedule into a stepwise ladder, allowing multiple frames to be denoised simultaneously. This significantly improves sampling efficiency while maintaining motion consistency, achieving up to a 2x speedup with high visual fidelity and temporal coherence. We evaluate our approach on ZEGGS and BEAT, strong benchmarks for real-world applicability. Our framework is universally applicable to any diffusion-based gesture generation model, transforming it into a streaming approach. Applied to three state-of-the-art methods, it consistently outperforms them, demonstrating its effectiveness as a generalizable and efficient solution for real-time, high-fidelity co-speech gesture synthesis.
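The ladder restructuring of the noise schedule can be sketched conceptually as follows. This is our own illustration of the idea, not the paper's exact scheduler: in a rolling schedule each frame in the window carries a progressively higher noise level, and the ladder variant quantizes those levels onto a few shared rungs so that all frames on a rung can be denoised in the same step.

```python
import math

def rolling_schedule(window):
    """Frame i gets noise level (i+1)/window: newer frames are noisier."""
    return [(i + 1) / window for i in range(window)]

def ladder_schedule(window, rungs):
    """Quantize the rolling schedule onto `rungs` shared levels, so frames
    sharing a rung can be denoised simultaneously."""
    return [math.ceil(t * rungs) / rungs for t in rolling_schedule(window)]

print(rolling_schedule(4))    # [0.25, 0.5, 0.75, 1.0] - one level per frame
print(ladder_schedule(4, 2))  # [0.5, 0.5, 1.0, 1.0] - two frames per rung
```

With two frames per rung, each denoising step advances two frames at once, which is the intuition behind the reported up-to-2x speedup.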

Authors:Rama Adithya Varanasi, Batia Mishan Wiesenfeld, Oded Nov
Title: AI Rivalry as a Craft: How Resisting and Embracing Generative AI Reshape Writing Professions
Abstract:
Generative AI (GAI) technologies are disrupting professional writing, challenging traditional practices. Recent studies explore GAI adoption experiences of creative practitioners, but we know little about how these experiences evolve into established practices and how GAI resistance alters these practices. To address this gap, we conducted 25 semi-structured interviews with writing professionals who adopted and/or resisted GAI. Using the theoretical lens of Job Crafting, we identify four strategies professionals employ to reshape their roles. Writing professionals employed GAI resisting strategies to maximize human potential, reinforce professional identity, carve out a professional niche, and preserve credibility within their networks. In contrast, GAI-enabled strategies allowed writers who embraced GAI to enhance desirable workflows, minimize mundane tasks, and engage in new AI-managerial labor. These strategies amplified their collaborations with GAI while reducing their reliance on other people. We conclude by discussing implications of GAI practices on writers' identity and practices as well as crafting theory.

Authors:Liwen Lin, Nan Li, Shuchen Zhao
Title: The effect of intelligent monitoring of physical exercise on executive function in children with ADHD
Abstract:
Children with ADHD often struggle with executive function (EF) and motor skills, impacting their academics and social life. While medications are commonly used, they have side effects, leading to interest in non-drug treatments. Physical activity (PA) has shown promise in improving cognitive and motor skills in children with ADHD. This study examined the short- and long-term effects of three PA interventions: a specific skill training group (EG1), a low-demand exercise group (EG2), and a control group (CG) over 12 weeks. EG1 showed significant improvements in motor tasks and working memory (15% improvement, p<0.05), while EG2 and CG showed smaller changes. Long-term PA improved working memory, but short-term PA had limited effects on balance and manual dexterity. These findings suggest that skill training has an immediate impact on motor performance, while more complex motor skills require longer interventions. Smart devices tracked progress, confirming sustained engagement and improvement in EG1. This research highlights PA as a promising non-pharmacological treatment for ADHD, warranting further exploration of its effects on other cognitive domains.

Authors:Yu Peng, Guoqing Zhang, Huadong Pang
Title: Impact of Short-Duration Aerobic Exercise Intensity on Executive Function and Sleep
Abstract:
IoT-based devices and wearable sensors are now common in daily life, with smartwatches, smartphones, and other digital tools tracking physical activity and health data. This lifelogging process provides valuable insights into people's lives. This paper analyzes a publicly available lifelog dataset of 14 individuals to explore how exercise affects mood and, in turn, executive function. Results show that moderate physical activity significantly improves mood, reduces stress, and enhances cognitive functions like decision-making and focus. Improved mood not only boosts exercise performance but also strengthens executive function, suggesting exercise benefits both emotional and cognitive well-being. This opens the door for personalized exercise plans tailored to emotional states to optimize brain function.

Authors:Nicoly da Silva Menezes, Thayssa Águila da Rocha, Lucas Samuel Santiago Camelo, Marcelle Pereira Mota
Title: I Felt Pressured to Give 100% All the Time: How Are Neurodivergent Professionals Being Included in Software Development Teams?
Abstract:
Context: As the demand for digital solutions adapted to different user profiles increases, creating more inclusive and diverse software development teams becomes an important initiative to improve software product accessibility. Problem: However, neurodivergent professionals are underrepresented in this area, encountering obstacles from difficulties in communication and collaboration to inadequate software tools, which directly impact their productivity and well-being. Solution: This study seeks to understand the work experiences of neurodivergent professionals acting in different software development roles. A better understanding of their challenges and of the strategies they use to deal with them can contribute to creating more inclusive software development teams. IS Theory: We applied the Sociotechnical Theory (STS) to investigate how the social structures of organizations and their respective work technologies influence the inclusion of these professionals. Method: To address this study, we conducted semi-structured interviews with nine neurodivergent professionals in the Software Engineering field and analyzed the results by applying a continuous comparison coding strategy. Results: The results highlighted issues faced by interviewees, the main ones related to difficulties in communication, social interactions, and prejudice related to their diagnosis. Additionally, excessive stimuli in work tools became a significant challenge, leading to constant distractions and cognitive overload. This scenario negatively impacts their concentration and overall performance. Contributions and Impact in the IS area: As a contribution, this study presents empirically based recommendations to overcome sociotechnical challenges faced by neurodivergent individuals working in software development teams.

Authors:Dibri Nsofor, Ben Greenman
Title: Toward a Corpus Study of the Dynamic Gradual Type
Abstract:
Gradually-typed languages feature a dynamic type that supports implicit coercions, greatly weakening the type system but making types easier to adopt. Understanding how developers use this dynamic type is a critical question for the design of useful and usable type systems. This paper reports on an in-progress corpus study of the dynamic type in Python, targeting 221 GitHub projects that use the mypy type checker. The study reveals eight patterns-of-use for the dynamic type, which have implications for future refinements of the mypy type system and for tool support to encourage precise type annotations.
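The weakening the abstract describes is easy to see in a minimal mypy-checked sketch; the function and dictionary here are illustrative assumptions of ours, not drawn from the paper's corpus:

```python
from typing import Any

def fetch_config(key: str) -> Any:
    # Returning Any makes the value implicitly coercible to every type:
    # mypy accepts all of the annotated assignments below.
    return {"retries": 3, "host": "localhost"}[key]

retries: int = fetch_config("retries")   # accepted by mypy
host: str = fetch_config("host")         # also accepted
# The weakening cuts both ways: a misuse such as
#   port: int = fetch_config("host")
# would also type-check, surfacing (if at all) only at runtime.
print(retries + 1)
```

Patterns like this, where a single `Any` return silently disables checking at every call site, are the kind of usage such a corpus study can surface.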

Authors:Teran Bukenberger, Brent Davis
Title: The Detection of Saccadic Eye Movements and Per-Eye Comparisons using Virtual Reality Eye Tracking Devices
Abstract:
Eye tracking has been found to be useful in various tasks including diagnostic and screening tools. However, traditional eye trackers had a complicated setup and operated at a higher frequency to measure eye movements. The use of more commonly available eye trackers such as those in head-mounted virtual reality (VR) headsets greatly expands the utility of these eye trackers for research and analytical purposes. In this study, the research question is focused on detecting saccades, which is a common task when analyzing eye tracking data, but it is not well-established for VR headset-mounted eye trackers. The aim is to determine how accurately saccadic eye movements can be detected using an eye tracker that operates at 60 or 90Hz. The study involves VR eye tracking technology and neuroscience with respect to saccadic eye movements. The goal is to build prototype software implemented using VR eye tracking technology to detect saccadic eye movements, and per-eye differences in an individual. It is anticipated that the software will be able to accurately detect when saccades occur and analyze the differences in saccadic eye movements per-eye. The field of research surrounding VR eye tracking software is still developing rapidly, specifically its applications to neuroscience. Since previous methods of eye tracking involved specialized equipment, using commercially and consumer available VR eye tracking technology to assist in the detection of saccades and per-eye differences would be novel. This project will impact the field of neuroscience by providing a tool that can be used to detect saccadic eye movements and neurological and neurodegenerative disorders. However, this project is limited by the short time frame and that the eye tracker used in this study operates at a maximum frequency of 90Hz.

Authors:Dylan Cashman, Mark Keller, Hyeon Jeon, Bum Chul Kwon, Qianwen Wang
Title: A Critical Analysis of the Usage of Dimensionality Reduction in Four Domains
Abstract:
Dimensionality reduction is used as an important tool for unraveling the complexities of high-dimensional datasets in many fields of science, such as cell biology, chemical informatics, and physics. Visualizations of the dimensionally reduced data enable scientists to delve into the intrinsic structures of their datasets and align them with established hypotheses. Visualization researchers have thus proposed many dimensionality reduction methods and interactive systems designed to uncover latent structures. At the same time, different scientific domains have formulated guidelines or common workflows for using dimensionality reduction techniques and visualizations for their respective fields. In this work, we present a critical analysis of the usage of dimensionality reduction in scientific domains outside of computer science. First, we conduct a bibliometric analysis of 21,249 academic publications that use dimensionality reduction to observe differences in the frequency of techniques across fields. Next, we conduct a survey of a 71-paper sample from four fields: biology, chemistry, physics, and business. Through this survey, we uncover common workflows, processes, and usage patterns, including the mixed use of confirmatory data analysis to validate a dataset and projection method and exploratory data analysis to then generate more hypotheses. We also find that misinterpretations and inappropriate usage are common, particularly in the visual interpretation of the resulting dimensionally reduced view. Lastly, we compare our observations with recent works in the visualization community in order to match work within our community to potential areas of impact outside our community.

Authors:Yan G. Grange, Kevin Tai
Title: Integrating UX Design in Astronomical Software Development: A Case Study
Abstract:
In 2023, ASTRON took the step of incorporating a dedicated User Experience (UX) designer into its software development process. This decision aimed to enhance the accessibility and usability of services providing access to the data holdings from the telescopes we are developing. The field of astronomical software development has historically under-emphasized UX design. ASTRON's initiative not only improves our own tools, but can also be used to demonstrate to the broader community the value of integrating UX expertise into development teams. We discuss how we integrate the UX designer at the start of our software development lifecycle. We end by providing some considerations on how other projects could make use of UX knowledge in their development process.

Authors:Tao Jing, Yao Li, Jingzhou Ye, Jie Wang, Xueqiang Wang
Title: Privacy Law Enforcement Under Centralized Governance: A Qualitative Analysis of Four Years' Special Privacy Rectification Campaigns
Abstract:
In recent years, major privacy laws like the GDPR have brought about positive changes. However, challenges remain in enforcing the laws, particularly due to under-resourced regulators facing a large number of potential privacy-violating software applications (apps) and the high costs of investigating them. Since 2019, China has launched a series of privacy enforcement campaigns known as Special Privacy Rectification Campaigns (SPRCs) to address widespread privacy violations in its mobile application (app) ecosystem. Unlike the enforcement of the GDPR, SPRCs are characterized by large-scale privacy reviews and strict sanctions, under the strong control of central authorities. In SPRCs, central government authorities issue administrative orders to mobilize various resources for market-wide privacy reviews of mobile apps. They enforce strict sanctions by requiring privacy-violating apps to rectify issues within a short timeframe or face removal from app stores. While there are a few reports on SPRCs, the effectiveness and potential problems of this campaign-style privacy enforcement approach remain unclear to the community. In this study, we conducted 18 semi-structured interviews with app-related engineers involved in SPRCs to better understand the campaign-style privacy enforcement. Based on the interviews, we reported our findings on a variety of aspects of SPRCs, such as the processes that app engineers regularly follow to achieve privacy compliance in SPRCs, the challenges they encounter, the solutions they adopt to address these challenges, and the impacts of SPRCs, etc. We found that app engineers face a series of challenges in achieving privacy compliance in their apps...

Authors:Nicklas Lind, Nilan Paramarajah, Timothy Merritt
Title: Serious Play to Encourage Socialization between Unfamiliar Children Facilitated by a LEGO Robot
Abstract:
Socialization is an essential development skill for preschool children. In collaboration with the LEGO Group, we developed Robert Robot, a simplified robot, which enables socialization between children and facilitates shared experiences when meeting for the first time. An exploratory study to observe socialization between preschool children was conducted with 30 respondents in pairs. Additionally, observational data from 212 play sessions with four Robert Robots in the wild were collected. Subsequent analysis found that children have fun as Robert Robot breaks the ice between unfamiliar children. The children relayed audio cues related to the imaginative world of Robert Robot's personalities and mimicked each other as a method of initiating social play and communication with their unfamiliar peers. Furthermore, the study contributes four implications for the design of robots for socialization between children. This chapter provides an example case of serious storytelling using playful interactions engaging children with the character of the robot and the mini-narratives around the build requests.

Authors:Cécile Boulard, Sruthi Viswanathan, Wanda Fey, Thierry Jacquin
Title: Actionable AI: Enabling Non Experts to Understand and Configure AI Systems
Abstract:
Interaction between humans and AI systems raises the question of how people understand AI systems. This has been addressed with explainable AI, the interpretability arising from users' domain expertise, or collaborating with AI in a stable environment. In the absence of these elements, we discuss designing Actionable AI, which allows non-experts to configure black-box agents. In this paper, we experiment with an AI-powered cartpole game and observe 22 pairs of participants to configure it via direct manipulation. Our findings suggest that, in uncertain conditions, non-experts were able to achieve good levels of performance. By influencing the behaviour of the agent, they exhibited an operational understanding of it, which proved sufficient to reach their goals. Based on this, we derive implications for designing Actionable AI systems. In conclusion, we propose Actionable AI as a way to open access to AI-based agents, giving end users the agency to influence such agents towards their own goals.

Authors:Viktor Dorfler, Giles Cuthbert
Title: Dubito Ergo Sum: Exploring AI Ethics
Abstract:
We paraphrase Descartes' famous dictum in the area of AI ethics where the "I doubt and therefore I am" is suggested as a necessary aspect of morality. Therefore AI, which cannot doubt itself, cannot possess moral agency. Of course, this is not the end of the story. We explore various aspects of the human mind that substantially differ from AI, which includes the sensory grounding of our knowing, the act of understanding, and the significance of being able to doubt ourselves. The foundation of our argument is the discipline of ethics, one of the oldest and largest knowledge projects of human history, yet, we seem only to be beginning to get a grasp of it. After a couple of thousand years of studying the ethics of humans, we (humans) arrived at a point where moral psychology suggests that our moral decisions are intuitive, and all the models from ethics become relevant only when we explain ourselves. This recognition has a major impact on what we can do regarding AI ethics and how we can do it. We do not offer a solution, we explore some ideas and leave the problem open, but we hope somewhat better understood than before our study.

Authors:Xuyao Zhang, Milan Ilić, Beat Signer
Title: A Modular and Extensible Hardware Platform Prototype for Dynamic Data Physicalisation
Abstract:
Dynamic data physicalisation is an emerging field of research, investigating the representation and exploration of data via multiple modalities, beyond traditional visual methods. Despite the development of various data physicalisation applications in recent years, the integration of diverse hardware components remains both time-consuming and costly. Further, there is a lack of solutions for rapid prototyping and experimentation with different dynamic data physicalisation alternatives. To address this problem, we propose a modular and extensible hardware platform for dynamic data physicalisation. This platform introduces a communication architecture that ensures seamless plug-and-play functionality for modules representing different physical variables. We detail the implementation and technical evaluation of a preliminary prototype of our platform, demonstrating its potential to support researchers and developers in the field by providing a versatile and efficient tool for rapid prototyping and experimentation with different data physicalisation design alternatives.

Authors:Yuehan Qiao, Zhihao Yao, Meiyu Hu, Qianyao Xu
Title: Virtual Co-presenter: Connecting Deaf and Hard-of-hearing Livestreamers and Hearing audience in E-commerce Livestreaming
Abstract:
Deaf and Hard-of-Hearing (DHH) individuals are increasingly participating as livestreamers in China's e-commerce livestreaming industry but face obstacles that limit the scope and diversity of their audience. Our paper examines these challenges and explores a potential solution for connecting the hearing audience to sign language (SL) livestreaming teams with DHH members in e-commerce livestreaming. We interviewed four SL livestreaming team members and 15 hearing audience members to identify information and emotional communication challenges that discourage the hearing audience from continuing to watch SL livestreaming. Based on these findings, we developed a virtual co-presenter demo, which targets SL livestreaming teams with DHH members as users, through a design workshop with six designers, incorporating voice broadcasting with animations. Follow-up evaluations with previous participants provided positive feedback on the virtual co-presenter's potential to address these challenges. We summarize design suggestions on its functionality and interaction design for further refinement to assist SL livestreaming teams with DHH members in reaching a broader hearing audience.

Authors:Daeheon Jeong, Hyehyun Chu
Title: Visual Embedding of Screen Sequences for User-Flow Search in Example-driven Communication
Abstract:
Effective communication of UX considerations to stakeholders (e.g., designers and developers) is a critical challenge for UX practitioners. To explore this problem, we interviewed four UX practitioners about their communication challenges and strategies. Our study identifies that providing an example user flow (a screen sequence representing a semantic task) as evidence reinforces communication, yet finding relevant examples remains challenging. To address this, we propose a method to systematically retrieve user flows using semantic embedding. Specifically, we design a model that learns to associate screens' visual features with user flow descriptions through contrastive learning. A survey confirms that our approach retrieves user flows better aligned with human perceptions of relevance. We analyze the results and discuss implications for the computational representation of user flows.

Authors:Dilrukshi Gamage, Dilki Sewwandi, Min Zhang, Arosha Bandara
Title: Labeling Synthetic Content: User Perceptions of Warning Label Designs for AI-generated Content on Social Media
Abstract:
In this research, we explored the efficacy of various warning label designs for AI-generated content on social media platforms, e.g., deepfakes. We devised and assessed ten distinct label design samples that varied across the dimensions of sentiment, color/iconography, positioning, and level of detail. Our experimental study involved 911 participants randomly assigned to these ten label designs and a control group evaluating social media content. We explored their perceptions relating to (1) belief in the content being AI-generated, (2) trust in the labels, and (3) social media engagement perceptions of the content. The results demonstrate that the presence of labels had a significant effect on users' belief that the content is AI-generated, deepfake, or edited by AI. However, their trust in the label significantly varied based on the label design. Notably, having labels did not significantly change their engagement behaviors, such as liking, commenting, and sharing. However, there were significant differences in engagement based on content type: political and entertainment. This investigation contributes to the field of human-computer interaction by defining a design space for label implementation and providing empirical support for the strategic use of labels to mitigate the risks associated with synthetically generated media.

Authors:Frederic Lemieux, Aisha Behr, Clara Kellermann-Bryant, Zaki Mohammed
Title: Cognitive Bias Detection Using Advanced Prompt Engineering
Abstract:
Cognitive biases, systematic deviations from rationality in judgment, pose significant challenges in generating objective content. This paper introduces a novel approach for real-time cognitive bias detection in user-generated text using large language models (LLMs) and advanced prompt engineering techniques. The proposed system analyzes textual data to identify common cognitive biases such as confirmation bias, circular reasoning, and hidden assumption. By designing tailored prompts, the system effectively leverages LLMs' capabilities to both recognize and mitigate these biases, improving the quality of human-generated content (e.g., news, media, reports). Experimental results demonstrate the high accuracy of our approach in identifying cognitive biases, offering a valuable tool for enhancing content objectivity and reducing the risks of biased decision-making.

Authors:Benyamin Tabarsi, Heidi Reichert, Ally Limke, Sandeep Kuttal, Tiffany Barnes
Title: LLMs' Reshaping of People, Processes, Products, and Society in Software Development: A Comprehensive Exploration with Early Adopters
Abstract:
Large language models (LLMs) like OpenAI ChatGPT, Google Gemini, and GitHub Copilot are rapidly gaining traction in the software industry, but their full impact on software engineering remains insufficiently explored. Despite their growing adoption, there is a notable lack of formal, qualitative assessments of how LLMs are applied in real-world software development contexts. To fill this gap, we conducted semi-structured interviews with sixteen early-adopter professional developers to explore their use of LLMs throughout various stages of the software development life cycle. Our investigation examines four dimensions: people - how LLMs affect individual developers and teams; process - how LLMs alter software engineering workflows; product - LLM impact on software quality and innovation; and society - the broader socioeconomic and ethical implications of LLM adoption. Thematic analysis of our data reveals that while LLMs have not fundamentally revolutionized the development process, they have substantially enhanced routine coding tasks, including code generation, refactoring, and debugging. Developers reported the most effective outcomes when providing LLMs with clear, well-defined problem statements, indicating that LLMs excel with decomposed problems and specific requirements. Furthermore, these early-adopters identified that LLMs offer significant value for personal and professional development, aiding in learning new languages and concepts. Early-adopters, highly skilled in software engineering and how LLMs work, identified early and persisting challenges for software engineering, such as inaccuracies in generated content and the need for careful manual review before integrating LLM outputs into production environments. Our study provides a nuanced understanding of how LLMs are shaping the landscape of software development, with their benefits, limitations, and ongoing implications.

Authors:Stephen Pilli, Vivek Nallur
Title: The Influence of Prior Discourse on Conversational Agent-Driven Decision-Making
Abstract:
Persuasion through conversation has been the focus of much research. Nudging is a popular strategy to influence decision-making in physical and digital settings. However, conversational agents employing "nudging" have not received significant attention. We explore the manifestation of cognitive biases, the underlying psychological mechanisms of nudging, and investigate how the complexity of prior dialogue tasks impacts decision-making facilitated by conversational agents. Our research used a between-group experimental design, involving 756 participants randomly assigned to either a simple or complex task before encountering a decision-making scenario. Three scenarios were adapted from Samuelson's classic experiments on status-quo bias, the underlying mechanism of default nudges. Our results aligned with previous studies in two out of three simple-task scenarios. Increasing task complexity consistently shifted effect sizes toward our hypothesis, though bias was significant in only one case. These findings inform conversational nudging strategies and highlight inherent biases relevant to behavioural economics.

Authors:Tiago Massoni, Ricardo Duarte, Ruan Oliveira
Title: Exit the Code: A Model for Understanding Career Abandonment Intention Among Software Developers
Abstract:
Background. Career abandonment, the process in which professionals leave the field to assume positions in another area, involves, among software developers, frustration with the lost investment and emotional and financial costs, even though it may benefit the individual, depending on personal context. Previous studies have identified work-related motivators for career abandonment, such as the threat of obsolescence, unstable requirements, and low code quality, though these factors have primarily been examined in former developers. The relationship between these motivators and the intention to abandon among currently active developers remains unexplored. Goal. This article investigates the relationship between key work-related motivators and currently active software developers' intention to abandon their careers. Method. We employed a quantitative approach, surveying 221 software developers to validate a theoretical model for career abandonment intention, based on an adaptation of the Investment Model, which incorporates satisfaction with technical aspects of the profession as well as the intention to abandon. Findings. Exploratory and confirmatory factor analyses, through structural equation modeling (SEM), provided robust support for the adapted Investment Model in explaining software developers' intention to abandon their careers. Moreover, career commitment significantly impacts the intention to leave the profession, being positively influenced by satisfaction with technical work-related factors and negatively influenced by career alternatives and career investment. Conclusion. The paper offers valuable insights for organizational leaders and research, potentially guiding retention strategies to better support developers, and the adoption of theoretical models to explain career abandonment.

Authors:Alan Dix, Tommaso Turchi, Ben Wilson, Anna Monreale, Matt Roach
Title: Talking Back -- human input and explanations to interactive AI systems
Abstract:
While XAI focuses on providing AI explanations to humans, can the reverse - humans explaining their judgments to AI - foster richer, synergistic human-AI systems? This paper explores various forms of human inputs to AI and examines how human explanations can guide machine learning models toward automated judgments and explanations that align more closely with human concepts.

Authors:Filippo Cantucci, Marco Marini, Rino Falcone
Title: The Role of Robot Competence, Autonomy, and Personality on Trust Formation in Human-Robot Interaction
Abstract:
Human trust in social robots is a complex attitude based on cognitive and emotional evaluations, as well as a behavior, like task delegation. While previous research explored the features of robots that influence overall trust attitude, it remains unclear whether these features affect behavioral trust. Additionally, there is limited investigation into which features of robots influence cognitive and emotional attitudes, and how these attitudes impact humans' willingness to delegate new tasks to robots. This study examines the interplay between competence, autonomy, and personality traits of robots and their impact on trust attitudes (cognitive and affective trust) and trust behavior (task delegation), within the context of task-oriented Human-Robot Interaction. Our findings indicate that robot competence is a key determinant of trust, influencing cognitive, affective, and behavioral trust. In contrast, robot personality traits significantly impact only affective trust without affecting cognitive trust or trust behavior. In addition, autonomy was found to moderate the relationship between competence and cognitive trust, as well as between personality and affective trust. Finally, cognitive trust was found to positively influence task delegation, whereas affective trust did not show a significant effect. This paper contributes to the literature on Human-Robot Trust by providing novel evidence that enhances the acceptance and effectiveness of social robots in collaborative scenarios.

Authors:Raha Asadi, Bodil Biering, Vincent van Dijk, Oksana Kulyk, Elda Paja
Title: No Silver Bullet: Towards Demonstrating Secure Software Development for Danish Small and Medium Enterprises in a Business-to-Business Model
Abstract:
Software-developing small and medium enterprises (SMEs) play a crucial role as suppliers to larger corporations and public administration. It is therefore necessary for them to be able to demonstrate that their products meet certain security criteria, both to gain the trust of their customers and to comply with standards that demand such a demonstration. In this study we have investigated ways for SMEs to demonstrate their security when operating in a business-to-business model, conducting semi-structured interviews (N=16) with practitioners from different SMEs in Denmark and validating our findings in a follow-up workshop (N=6). Our findings indicate five distinctive security demonstration approaches, namely: Certifications, Reports, Questionnaires, Interactive Sessions and Social Proof. We discuss the challenges, benefits, and recommendations related to these approaches, concluding that none of them is a one-size-fits-all solution and that more research into the relative advantages of these approaches and their combinations is needed.

Authors:Charlotte Croucher, Panagiotis Kourtesis, Georgios Papaioannou
Title: Just Roll with It: Exploring the Mitigating Effects of Postural Alignment on Vection-Induced Cybersickness in Virtual Reality Over Time
Abstract:
Cybersickness remains a significant challenge in virtual reality (VR), limiting its usability across various applications. Existing mitigation strategies focus on optimising VR hardware and/or software and enhancing self-motion perception to minimise sensory conflict. However, anticipatory postural adaptation, a strategy widely studied with regard to motion sickness while being driven, has not been systematically examined in VR. Therefore, in this study, we explore whether adopting comfort-orientated postural movements, based on the literature, mitigates cybersickness. We conducted an exploratory analysis using a cumulative link mixed model (CLMM) on secondary data from a VR-based postural alignment experiment. Results indicate that misalignment between trunk roll and the virtual trajectory increases the odds of reporting higher cybersickness scores by 5%. Additionally, each additional minute in VR increases the odds of reporting higher cybersickness (FMS) scores by 11%, but prolonged exposure leads to a 75% reduction in the odds of reporting cybersickness symptoms, suggesting adaptation effects. Individual differences also play a role, with higher cybersickness susceptibility increasing the odds of reporting higher symptom severity by 8%. These findings indicate that anticipatory postural adaptation could serve as a natural mitigation strategy for cybersickness. VR applications, particularly in training and simulation, may benefit from designing adaptive cues that encourage users to align their posture with virtual movement. Future research should explore real-time postural feedback mechanisms to enhance user comfort and reduce cybersickness.
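Percentage statements of this kind come from exponentiating log-odds coefficients of an ordinal (cumulative link) model. A minimal sketch of that conversion, using hypothetical coefficient values chosen only to reproduce percentages like those reported (they are not the paper's fitted estimates):

```python
import math

# Hypothetical CLMM coefficients on the log-odds scale; illustrative
# assumptions, NOT the study's fitted values.
coefs = {
    "trunk_roll_misalignment": 0.049,  # per unit of misalignment
    "minutes_in_vr": 0.104,            # per additional minute
    "prolonged_exposure": -1.386,      # adaptation indicator
}

def pct_change_in_odds(beta):
    """Convert a log-odds coefficient into a percent change in odds."""
    return (math.exp(beta) - 1.0) * 100.0

for name, beta in coefs.items():
    print(f"{name}: {pct_change_in_odds(beta):+.0f}% change in odds")
# trunk_roll_misalignment: +5% change in odds
# minutes_in_vr: +11% change in odds
# prolonged_exposure: -75% change in odds
```

The key point is that odds ratios multiply: a negative coefficient of about -1.39 corresponds to odds shrinking to a quarter of their baseline, i.e. a 75% reduction.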

Authors:Graciela Camacho-Fidalgo, Blain Judkins, Kylee Friederichs, Lara Soberanis, Vicente Hernandez, Kevin McSweeney, Freddie Witherden, Edgar Rojas-Muñoz
Title: Analyzing the Impact of Augmented Reality Head-Mounted Displays on Workers' Safety and Situational Awareness in Hazardous Industrial Settings
Abstract:
Augmented Reality Head-Mounted Displays (AR-HMDs) have proven effective in assisting workers. However, they may degrade workers' Safety and Situational Awareness (SSA), particularly in complex and hazardous industrial settings. This paper analyzes, objectively and subjectively, the effects of AR-HMDs on workers' SSA in a simulated hazardous industrial environment. Our evaluation comprised sixty participants performing various tasks in a simulated cargo ship room while receiving remote guidance through one of three devices: two off-the-shelf AR-HMDs (Trimble XR10 with HoloLens 2, RealWear Navigator 520), and a smartphone (Google Pixel 6). Several sensors were installed throughout the room to obtain quantitative measures of the participants' safe execution of the tasks, such as the frequency with which they hit objects in the room or stepped over simulated holes or oil spills. The results showed that the Trimble XR10 led to statistically significantly more head-knocker and knee-knocker incidents than the Navigator 520 and the Pixel 6. Furthermore, the Trimble XR10 also led to significantly greater difficulty crossing hatch doors, and to lower perceived safety, comfort, perceived performance, and usability. Overall, participants wearing AR-HMDs failed to perceive more hazards, meaning that safety-preserving capabilities must be developed for AR-HMDs before introducing them into hazardous industrial settings confidently.

Authors:Aidan Marler, Yannik Roell, Steffen Knoblauch, Jane P. Messina, Thomas Jaenisch, Morteza Karimzadeh
Title: GeoDEN: A Visual Exploration Tool for Analysing the Geographic Spread of Dengue Serotypes
Abstract:
Static maps and animations remain popular in spatial epidemiology of dengue, limiting the analytical depth and scope of visualisations. Over half of the global population live in dengue endemic regions. Understanding the spatiotemporal dynamics of the four closely related dengue serotypes, and their immunological interactions, remains a challenge at a global scale. To facilitate this understanding, we worked with dengue epidemiologists in a user-centered design framework to create GeoDEN, an exploratory visualisation tool that empowers experts to investigate spatiotemporal patterns in dengue serotype reports. The tool has several linked visualisations and filtering mechanisms, enabling analysis at a range of spatial and temporal scales. To identify successes and failures, we present both insight-based and value-driven evaluations. Our domain experts found GeoDEN valuable, verifying existing hypotheses and uncovering novel insights that warrant further investigation by the epidemiology community. The developed visual exploration approach can be adapted for exploring other epidemiology and disease incident datasets.

Authors:Karthik Barma, Seshu Babu Barma
Title: Privacy is All You Need: Revolutionizing Wearable Health Data with Advanced PETs
Abstract:
In a world where data is the new currency, wearable health devices offer unprecedented insights into daily life, continuously monitoring vital signs and metrics. However, this convenience raises privacy concerns, as these devices collect sensitive data that can be misused or breached. Traditional measures often fail due to real-time data processing needs and limited device power. Users also lack awareness and control over data sharing and usage. We propose a Privacy-Enhancing Technology (PET) framework for wearable devices, integrating federated learning, lightweight cryptographic methods, and selectively deployed blockchain technology. The blockchain acts as a secure ledger triggered only upon data transfer requests, granting users real-time notifications and control. By dismantling data monopolies, this approach returns data sovereignty to individuals. Through real-world applications like secure medical data sharing, privacy-preserving fitness tracking, and continuous health monitoring, our framework reduces privacy risks by up to 70 percent while preserving data utility and performance. This innovation sets a new benchmark for wearable privacy and can scale to broader IoT ecosystems, including smart homes and industry. As data continues to shape our digital landscape, our research underscores the critical need to maintain privacy and user control at the forefront of technological progress.

Authors:Hyerim Park, Malin Eiband, Andre Luckow, Michael Sedlmair
Title: Exploring Visual Prompts: Refining Images with Scribbles and Annotations in Generative AI Image Tools
Abstract:
Generative AI (GenAI) tools are increasingly integrated into design workflows. While text prompts remain the primary input method for GenAI image tools, designers often struggle to craft effective ones. Moreover, research has primarily focused on input methods for ideation, with limited attention to refinement tasks. This study explores designers' preferences for three input methods - text prompts, annotations, and scribbles - through a preliminary digital paper-based study with seven professional designers. Designers preferred annotations for spatial adjustments and referencing in-image elements, while scribbles were favored for specifying attributes such as shape, size, and position, often combined with other methods. Text prompts excelled at providing detailed descriptions or when designers sought greater GenAI creativity. However, designers expressed concerns about AI misinterpreting annotations and scribbles and the effort needed to create effective text prompts. These insights inform GenAI interface design to better support refinement tasks, align with workflows, and enhance communication with AI systems.

Authors:Torin Anderson, Shuo Niu
Title: Making AI-Enhanced Videos: Analyzing Generative AI Use Cases in YouTube Content Creation
Abstract:
Generative AI (GenAI) tools enhance social media video creation by streamlining tasks such as scriptwriting, visual and audio generation, and editing. These tools enable the creation of new content, including text, images, audio, and video, with platforms like ChatGPT and MidJourney becoming increasingly popular among YouTube creators. Despite their growing adoption, knowledge of their specific use cases across the video production process remains limited. This study analyzes 274 YouTube how-to videos to explore GenAI's role in planning, production, editing, and uploading. The findings reveal that YouTubers use GenAI to identify topics, generate scripts, create prompts, and produce visual and audio materials. Additionally, GenAI supports editing tasks like upscaling visuals and reformatting content while also suggesting titles and subtitles. Based on these findings, we discuss future directions for incorporating GenAI to support various video creation tasks.

Authors:Caterina Fuligni, Daniel Dominguez Figaredo, Julia Stoyanovich
Title: "Would You Want an AI Tutor?" Understanding Stakeholder Perceptions of LLM-based Systems in the Classroom
Abstract:
In recent years, Large Language Models (LLMs) have rapidly gained popularity across all parts of society, including education. After initial skepticism and bans, many schools have chosen to embrace this new technology by integrating it into their curricula in the form of virtual tutors and teaching assistants. However, neither the companies developing this technology nor the public institutions involved in its implementation have set up a formal system to collect feedback from the stakeholders impacted by it. In this paper, we argue that understanding the perceptions of those directly or indirectly impacted by LLMs in the classroom, including parents and school staff, is essential for ensuring responsible use of AI in this critical domain. Our contributions are two-fold. First, we propose the Contextualized Perceptions for the Adoption of LLMs in Education (Co-PALE) framework, which can be used to systematically elicit perceptions and inform whether and how LLM-based tools should be designed, developed, and deployed in the classroom. Second, we explain how our framework can be used to ground specific rubrics for eliciting perceptions of the relevant stakeholders in view of specific goals and contexts of implementation. Overall, Co-PALE is a practical step toward helping educational agents, policymakers, researchers, and technologists ensure the responsible and effective deployment of LLM-based systems across diverse learning contexts.

Authors:Jingfei Huang, Alexandros Haridis
Title: Evaluation of Architectural Synthesis Using Generative AI
Abstract:
Recent advancements in multimodal Generative AI have the potential to democratize specialized architectural tasks, such as interpreting technical drawings and creating 3D CAD models, which traditionally require expert knowledge. This paper presents a comparative evaluation of two systems: GPT-4o and Claude 3.5, in the task of architectural 3D synthesis. We conduct a case study on two buildings from Palladio's Four Books of Architecture (1965): Villa Rotonda and Palazzo Porto. High-level architectural models and drawings of these buildings were prepared, inspired by Palladio's original texts and drawings. Through sequential text and image prompting, we assess the systems' abilities in (1) interpreting 2D and 3D representations of buildings from drawings, (2) encoding the buildings into a CAD software script, and (3) self-improving based on outputs. While both systems successfully generate individual parts, they struggle to accurately assemble these parts into the desired spatial relationships, with Claude 3.5 demonstrating better performance, particularly in self-correcting its output. This study contributes to ongoing research on benchmarking the strengths and weaknesses of off-the-shelf AI systems in performing intelligent human tasks that require discipline-specific knowledge. The findings highlight the potential of language-enabled AI systems to act as collaborative technical assistants in the architectural design process.

Authors:Urvisha Shethia, Vedali Inamdar, Viraj Kulkarni
Title: Evaluating a Digital Speech Therapy App for Stuttering: A Pilot Validation Study
Abstract:
Stuttering is a clinical speech disorder that disrupts fluency and leads to significant psychological and social challenges. This study evaluates the effectiveness of Eloquent, a digital speech therapy app for stuttering, by analyzing pre-therapy and post-therapy speech samples using the Stuttering Severity Index-4 (SSI-4) and the S24 communication and attitude scale. Results showed a 52.7% reduction in overall SSI-4 scores, with marked improvements in reading (45%), speaking (46%), duration (57%), and physical concomitants (63%) scores. Over 75% of participants improved by at least one severity category. S24 scores decreased by 33.5%, indicating more positive self-perceptions of speech and reduced avoidance. These findings highlight the potential of structured, technology-driven speech therapy interventions to deliver measurable improvements in stuttering severity and communication confidence.

Authors:Sohyeon Hwang, Priyanka Nanayakkara, Yan Shvartzshnaider
Title: Trust and Friction: Negotiating How Information Flows Through Decentralized Social Media
Abstract:
Decentralized social media protocols enable users in independent, user-hosted servers (i.e., instances) to interact with each other while they self-govern. This community-based model of social media governance opens up new opportunities for tailored decision-making about information flows -- i.e., what user data is shared to whom and when -- and in turn, for protecting user privacy. To better understand how community governance shapes privacy expectations on decentralized social media, we conducted semi-structured interviews with 23 users of the Fediverse, a decentralized social media network. Our findings illustrate important factors that shape a community's understandings of information flows, such as rules and proactive efforts from admins who are perceived as trustworthy. We also highlight ''governance frictions'' between communities that raise new privacy risks due to incompatibilities in values, security practices, and software. Our findings highlight the unique challenges of decentralized social media, suggest design opportunities to address frictions, and outline the role of participatory decision-making in realizing the full potential of decentralization.

Authors:Damien Masson, Zhe Liu, Charles Xu
Title: DuSK: Faster Indirect Text Entry Supporting Out-Of-Vocabulary Words for Touchpads
Abstract:
Given the ubiquity of SmartTVs and head-mounted-display-based virtual environments, recent research has explored techniques to support eyes-free text entry using touchscreen devices. However, proposed techniques, leveraging lexicons, limit the user's ability to enter out-of-vocabulary words. In this paper, we investigate how to enter text while relying on unambiguous input to support out-of-vocabulary words. Through an iterative design approach, and after a careful investigation of actions that can be accurately and rapidly performed eyes-free, we devise DuSK, a Dual-handed, Stroke-based, 1-Keyboarding technique. In a controlled experiment, we show initial speeds of 10 WPM steadily increasing to 13 WPM with training. DuSK outperforms the common cursor-based text entry technique widely deployed in commercial SmartTVs (8 WPM) and is comparable to other eyes-free lexicon-based techniques, but with the added benefit of supporting out-of-vocabulary word input.

Authors:Artem Timoshenko, Chengfeng Mao, John R. Hauser
Title: Can Large Language Models Extract Customer Needs as well as Professional Analysts?
Abstract:
Identifying customer needs (CNs) is important for product management, product development, and marketing. Applications rely on professional analysts interpreting textual data (e.g., interview transcripts, online reviews) to understand the nuances of customer experience and concisely formulate "jobs to be done." The task is cognitively complex and time-consuming. Current practice facilitates the process with keyword search and machine learning but relies on human judgment to formulate CNs. We examine whether Large Language Models (LLMs) can automatically extract CNs. Because evaluating CNs requires professional judgment, we partnered with a marketing consulting firm to conduct a blind study of CNs extracted by: (1) a foundational LLM with prompt engineering only (Base LLM), (2) an LLM fine-tuned with professionally identified CNs (SFT LLM), and (3) professional analysts. The SFT LLM performs as well as or better than professional analysts when extracting CNs. The extracted CNs are well-formulated, sufficiently specific to identify opportunities, and justified by source content (no hallucinations). The SFT LLM is efficient and provides more complete coverage of CNs. The Base LLM was not sufficiently accurate or specific. Organizations can rely on SFT LLMs to reduce manual effort, enhance the precision of CN articulation, and provide improved insight for innovation and marketing strategy.

Authors:Xiuqi Tommy Zhu, Heidi Cheerman, Minxin Cheng, Sheri Kiami, Leanne Chukoskie, Eileen McGivney
Title: Designing VR Simulation System for Clinical Communication Training with LLMs-Based Embodied Conversational Agents
Abstract:
VR simulation in Health Professions (HP) education demonstrates huge potential, but fixed learning content with little customization limits its application beyond lab environments. To address these limitations in the context of VR for patient communication training, we conducted a user-centered study involving semi-structured interviews with advanced HP students to understand their challenges in clinical communication training and perceptions of VR-based solutions. From this, we derived design insights emphasizing the importance of realistic scenarios, simple interactions, and unpredictable dialogues. Building on these insights, we developed the Virtual AI Patient Simulator (VAPS), a novel VR system powered by Large Language Models (LLMs) and Embodied Conversational Agents (ECAs), supporting dynamic and customizable patient interactions for immersive learning. We also provided an example of how clinical professors could use user-friendly design forms to create personalized scenarios that align with course objectives in VAPS and discuss future implications of integrating AI-driven technologies into VR education.

Authors:José Luiz Nunes, Guilherme FCF Almeida, Brian Flanagan
Title: Evidence of conceptual mastery in the application of rules by Large Language Models
Abstract:
In this paper we leverage psychological methods to investigate LLMs' conceptual mastery in applying rules. We introduce a novel procedure to match the diversity of thought generated by LLMs to that observed in a human sample. We then conducted two experiments comparing rule-based decision-making in humans and LLMs. Study 1 found that all investigated LLMs replicated human patterns regardless of whether they were prompted with scenarios created before or after their training cut-off. Moreover, we found unanticipated differences between the two sets of scenarios among humans. Surprisingly, even these differences were replicated in LLM responses. Study 2 turned to a contextual feature of human rule application: under forced time delay, human samples rely more heavily on a rule's text than on other considerations such as a rule's purpose. Our results revealed that some models (Gemini Pro and Claude 3) responded in a human-like manner to a prompt describing either forced delay or time pressure, while others (GPT-4o and Llama 3.2 90b) did not. We argue that the evidence gathered suggests that LLMs have mastery over the concept of rule, with implications for both legal decision making and philosophical inquiry.

Authors:Si Thu, A. Baki Kocaballi
Title: From Prompting to Partnering: Personalization Features for Human-LLM Interactions
Abstract:
Large Language Models (LLMs), such as ChatGPT, exhibit advanced capabilities in generating text, images, and videos. However, their effective use remains constrained by challenges in prompt formulation, personalization, and opaque decision-making processes. To investigate these challenges and identify design opportunities, we conducted a two-phase qualitative study. In Phase 1, we performed in-depth interviews with eight everyday LLM users after they engaged in structured tasks using ChatGPT across both familiar and unfamiliar domains. Our findings revealed key user difficulties in constructing effective prompts, iteratively refining AI-generated responses, and assessing response reliability especially in domains beyond users' expertise. Informed by these insights, we designed a high-fidelity prototype incorporating Reflective Prompting, Section Regeneration, Input-Output Mapping, Confidence Indicators, and a Customization Panel. In Phase 2, user testing of the prototype indicated that these interface-level improvements may prove useful for reducing cognitive load, increasing transparency, and fostering more intuitive and collaborative human-AI interactions. Our study contributes to the growing discourse on human-centred AI, advocating for human-LLM interactions that enhance user agency, transparency, and co-creative interaction, ultimately supporting more intuitive, accessible, and trustworthy generative AI systems.

Authors:Kaleb Mcdowell, Nick Waytowich, Javier Garcia, Stephen Gordon, Bryce Bartlett, Jeremy Gaston
Title: Hybrid Team Tetris: A New Platform For Hybrid Multi-Agent, Multi-Human Teaming
Abstract:
Metcalfe et al. (1) argue that the greatest potential for human-AI partnerships lies in their application to highly complex problem spaces. Herein, we discuss three different forms of hybrid team intelligence and posit that across all three forms, the hybridization of human and machine intelligence can be effective under the right conditions. We foresee two significant research and development (R&D) challenges underlying the creation of effective hybrid intelligence. First, rapid advances in machine intelligence and/or fundamental changes in human behaviors or capabilities over time can outpace R&D. Second, the future conditions under which hybrid intelligence will operate are unknown, but unlikely to be the same as the conditions of today. Overcoming both of these challenges requires a deep understanding of multiple human-centric and machine-centric disciplines, which creates a large barrier to entry into the field. Herein, we outline an open, shareable research platform that creates a form of hybrid team intelligence that functions under representative future conditions. The intent for the platform is to facilitate new forms of hybrid intelligence research, allowing individuals with human-centric or machine-centric backgrounds to rapidly enter the field and initiate research. Our hope is that through open, community research on the platform, state-of-the-art advances in human and machine intelligence can quickly be communicated across what are currently different R&D communities and allow hybrid team intelligence research to stay at the forefront of scientific advancement.

Authors:Divya Perumal, Swaroop Panda
Title: A Deep User Interface for Exploring LLaMa
Abstract:
The growing popularity and widespread adoption of large language models (LLMs) necessitate the development of tools that enhance the effectiveness of user interactions with these models. Understanding the structures and functions of these models poses a significant challenge for users. Visual analytics-driven tools enable users to explore and compare model outputs, facilitating better decision-making. This paper presents a visual analytics-driven tool equipped with interactive controls for key hyperparameters, including top-p, frequency penalty, and presence penalty, enabling users to explore, examine, and compare the outputs of LLMs. In a user study, we assessed the tool's effectiveness; it received favorable feedback for its visual design, with particular commendation for the interface layout and ease of navigation. Additionally, the feedback provided valuable insights for enhancing the effectiveness of Human-LLM interaction tools.

Authors:Nghi Truong, Phanish Puranam, Ilia Testlin
Title: Why Trust in AI May Be Inevitable
Abstract:
In human-AI interactions, explanation is widely seen as necessary for enabling trust in AI systems. We argue that trust, however, may be a pre-requisite because explanation is sometimes impossible. We derive this result from a formalization of explanation as a search process through knowledge networks, where explainers must find paths between shared concepts and the concept to be explained, within finite time. Our model reveals that explanation can fail even under theoretically ideal conditions - when actors are rational, honest, motivated, can communicate perfectly, and possess overlapping knowledge. This is because successful explanation requires not just the existence of shared knowledge but also finding the connection path within time constraints, and it can therefore be rational to cease attempts at explanation before the shared knowledge is discovered. This result has important implications for human-AI interaction: as AI systems, particularly Large Language Models, become more sophisticated and able to generate superficially compelling but spurious explanations, humans may default to trust rather than demand genuine explanations. This creates risks of both misplaced trust and imperfect knowledge integration.

Authors:Max M. Lang, Sol Eskenazi
Title: Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale
Abstract:
Telephone surveys remain a valuable tool for gathering insights but typically require substantial resources in training and coordinating human interviewers. This work presents an AI-driven telephone survey system integrating text-to-speech (TTS), a large language model (LLM), and speech-to-text (STT) that mimics the versatility of human-led interviews (full-duplex dialogues) at scale. We tested the system across two populations, a pilot study in the United States (n = 75) and a large-scale deployment in Peru (n = 2,739), inviting participants via web-based links and contacting them via direct phone calls. The AI agent successfully administered open-ended and closed-ended questions, handled basic clarifications, and dynamically navigated branching logic, allowing fast large-scale survey deployment without interviewer recruitment or training. Our findings demonstrate that while the AI system's probing for qualitative depth was more limited than that of human interviewers, overall data quality approached human-led standards for structured items. This study represents one of the first successful large-scale deployments of an LLM-based telephone interviewer in a real-world survey context. The AI-powered telephone survey system has the potential to expand scalable, consistent data collection across market research, social science, and public opinion studies, thus improving operational efficiency while maintaining appropriate data quality for research.

Authors:Amirhossein Bayat, Melika Emami, Rahim Tafazolli, Atta Quddus
Title: Effects of Linear Modulation of Electrotactile Signals Using a Novel Device on Sensation Naturalness and Perceptual Intensity
Abstract:
Electrotactile feedback is a promising method for delivering haptic sensations, but challenges such as the naturalness of sensations hinder its adoption in commercial devices. In this study, we introduce a novel device that enables the exploration of complex stimulation signals to enhance sensation naturalness. We designed six stimulation signals with linearly modulated frequency, amplitude, or both, across two frequency levels based on a ramp-and-hold shape, aiming to replicate the sensation of pressing a button. Our results showed that these modulated signals achieve higher naturalness scores than tonic stimulations, with a 6.8% improvement. Moreover, we examined the relationship between perceived intensity and signal energy for these stimulation patterns. Our findings indicate that, under conditions of constant perceived intensity, signal energy is not uniform across different stimulation patterns. Instead, there is a distinct relationship between the energy levels of different patterns, which is consistently reflected in the energy of the stimulations selected by the participants. Based on our findings, we propose a predictive model that estimates the desired intensity for any stimulation pattern using this relationship between signal energies and the user's preferred intensity for a single reference pattern. This model demonstrated high reliability, with a mean R2 score of 83.33%. Using this approach, intensity calibration for different stimulation patterns can be streamlined, reducing calibration time by 87.5%, as only one of the eight reference patterns must be calibrated. These findings highlight the potential of stimulation signal modulation to improve sensation quality and validate the viability of our predictive model for automating intensity calibration. This approach is an essential step toward delivering complex and naturalistic sensations in advanced haptic systems.
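The ramp-and-hold envelope and the signal-energy measure the calibration model relies on can be sketched as follows. All parameter values (sample rate, durations, carrier frequency) are illustrative assumptions, not the device's actual settings:

```python
import numpy as np

# Ramp-and-hold amplitude envelope with a sinusoidal carrier; a sketch
# of one stimulation pattern, with assumed (not the paper's) parameters.
fs = 10_000                # samples per second
ramp, hold = 0.1, 0.4      # ramp-up and hold durations (s)
t = np.arange(0, ramp + hold, 1 / fs)

envelope = np.clip(t / ramp, 0.0, 1.0)   # linear ramp, then hold at 1
carrier = np.sin(2 * np.pi * 100 * t)    # 100 Hz carrier
signal = envelope * carrier

# Discrete-time signal energy (approximates the continuous-time integral)
energy = np.sum(signal**2) / fs
```

Once the energy of each pattern is known, a fixed relationship between pattern energies at equal perceived intensity lets a calibration routine scale the remaining patterns from a single calibrated reference, which is the streamlining the abstract describes.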

Authors:Yuxin Li, Hao Fang, Wen Liu, Chuantong Cheng, Hongda Chen
Title: Enhancing Subject-Independent Accuracy in fNIRS-based Brain-Computer Interfaces with Optimized Channel Selection
Abstract:
Achieving high subject-independent accuracy in functional near-infrared spectroscopy (fNIRS)-based brain-computer interfaces (BCIs) remains a challenge, particularly when minimizing the number of channels. This study proposes a novel feature extraction scheme and a Pearson correlation-based channel selection algorithm to enhance classification accuracy while reducing hardware complexity. Using an open-access fNIRS dataset, our method improved average accuracy by 28.09% compared to existing approaches, achieving a peak subject-independent accuracy of 95.98% with only two channels. These results demonstrate the potential of our optimized feature extraction and channel selection methods for developing efficient, subject-independent fNIRS-based BCI systems.
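A Pearson-correlation-based channel ranking of the kind described above can be sketched in a few lines. The synthetic data and the single feature per channel are assumptions for illustration, not the paper's actual feature-extraction scheme:

```python
import numpy as np

# Synthetic stand-in for per-trial fNIRS channel features.
rng = np.random.default_rng(0)
n_trials, n_channels = 120, 16
X = rng.standard_normal((n_trials, n_channels))  # one feature per channel
y = rng.integers(0, 2, size=n_trials).astype(float)
X[:, 3] += 2.0 * y   # make channel 3 strongly class-informative
X[:, 7] -= 1.5 * y   # make channel 7 moderately class-informative

def select_channels(X, y, k=2):
    """Rank channels by |Pearson r| between channel feature and label."""
    r = np.array([np.corrcoef(X[:, c], y)[0, 1] for c in range(X.shape[1])])
    return np.argsort(-np.abs(r))[:k]

print(sorted(select_channels(X, y).tolist()))  # the two informative channels
```

Selecting only the top-k channels this way is what permits the hardware reduction the abstract reports: a classifier is then trained on just those channels.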

Authors:Morteza Behrooz, Preetham Kolari, Fred Zaw, Lindsay Kenzig, Arnav Jhala
Title: Towards Using Voice for Hedonic Shopping Motivations
Abstract:
Besides the utilitarian aspects of online shopping, hedonic motivations play a significant role in shaping the shopping behavior of online users. With the increased popularity of voice-enabled devices, online shopping platforms have attempted to drive online shopping on voice. However, we explain why voice might be more suitable for the hedonic aspects of shopping. We introduce a prototype that enables such focus in a voice experience and share our findings from a qualitative study.

Authors:Gautam Kishore Shahi, Yelena Mejova
Title: Too Little, Too Late: Moderation of Misinformation around the Russo-Ukrainian Conflict
Abstract:
In this study, we examine the role of Twitter as a first line of defense against misinformation by tracking the public engagement with, and the platform's response to, 500 tweets concerning the Russo-Ukrainian conflict which were identified as misinformation. Using a real-time sample of 543,475 of their retweets, we find that users who geolocate themselves in the U.S. both produce and consume the largest portion of misinformation; however, accounts claiming to be in Ukraine are the second largest source. At the time of writing, 84% of these tweets were still available on the platform, especially those having an anti-Russia narrative. For those that did receive some sanctions, the retweeting rate had already stabilized, pointing to the ineffectiveness of the measures to stem their spread. These findings point to the need for a change in the existing anti-misinformation ecosystem. We propose several design and research guidelines for its possible improvement.

Authors:Sonja Rattay, Ville Vakkuri, Marco Rozendaal, Irina Shklovski
Title: "Why do we do this?": Moral Stress and the Affective Experience of Ethics in Practice
Abstract:
A plethora of toolkits, checklists, and workshops have been developed to bridge the well-documented gap between AI ethics principles and practice. Yet little is known about the effects of such interventions on practitioners. We conducted an ethnographic investigation in a major European city organization that developed and works to integrate an ethics toolkit into city operations. We find that the integration of ethics tools by technical teams destabilises their boundaries, roles, and mandates around responsibilities and decisions. This led to emotional discomfort and feelings of vulnerability, which neither toolkit designers nor the organization had accounted for. We leverage the concept of moral stress to argue that this affective experience is a core challenge to the successful integration of ethics tools in technical practice. Even in this best-case scenario, organisational structures were not able to deal with the moral stress that resulted from attempts to implement responsible technology development practices.

Authors:Adrian Bauske, Arthur Fleig
Title: You Shall Not Pass: Warning Drivers of Unsafe Overtaking Maneuvers on Country Roads by Predicting Safe Sight Distance
Abstract:
Overtaking on country roads with possible oncoming traffic is a dangerous maneuver, and many proposed assistance systems assume car-to-car communication and sensors currently unavailable in cars. To overcome this limitation, we develop an assistant that uses simple in-car sensors to predict the sight distance required for safe overtaking. Our models predict this from vehicle speeds, accelerations, and 3D map data. In a user study with a Virtual Reality driving simulator (N=25), we compare two UI variants (monitoring-focused vs. scheduling-focused). The results reveal that both UIs enable more patient driving and thus increase overall driving safety. While the monitoring-focused UI achieves a higher System Usability Scale score and distracts drivers less, the preferred UI depends on personal preference. Driving data shows that the predictions were off at times. We investigate and discuss this in a comparison of our models to actual driving behavior and identify crucial model parameters and assumptions that significantly improve model predictions.

Authors:Anna Ravera, Cristina Gena
Title: On the usability of generative AI: Human generative AI
Abstract:
Generative AI systems are transforming content creation, but their usability remains a key challenge. This paper examines usability factors such as user experience, transparency, control, and cognitive load. Common challenges include unpredictability and difficulties in fine-tuning outputs. We review evaluation metrics like efficiency, learnability, and satisfaction, highlighting best practices from various domains. Improving interpretability, intuitive interfaces, and user feedback can enhance usability, making generative AI more accessible and effective.

Authors:Bin Yin, Chong-Yi Liu, Liya Fu, Jinkun Zhang
Title: Teleology-Driven Affective Computing: A Causal Framework for Sustained Well-Being
Abstract:
Affective computing has made significant strides in emotion recognition and generation, yet current approaches mainly focus on short-term pattern recognition and lack a comprehensive framework to guide affective agents toward long-term human well-being. To address this, we propose a teleology-driven affective computing framework that unifies major emotion theories (basic emotion, appraisal, and constructivist approaches) under the premise that affect is an adaptive, goal-directed process that facilitates survival and development. Our framework emphasizes aligning agent responses with both personal/individual and group/collective well-being over extended timescales. We advocate for creating a "dataverse" of personal affective events, capturing the interplay between beliefs, goals, actions, and outcomes through real-world experience sampling and immersive virtual reality. By leveraging causal modeling, this "dataverse" enables AI systems to infer individuals' unique affective concerns and provide tailored interventions for sustained well-being. Additionally, we introduce a meta-reinforcement learning paradigm to train agents in simulated environments, allowing them to adapt to evolving affective concerns and balance hierarchical goals - from immediate emotional needs to long-term self-actualization. This framework shifts the focus from statistical correlations to causal reasoning, enhancing agents' ability to predict and respond proactively to emotional challenges, and offers a foundation for developing personalized, ethically aligned affective systems that promote meaningful human-AI interactions and societal well-being.

Authors:Martin Feick, Xuxin Tang, Raul Garcia-Martin, Alexandru Luchianov, Roderick Wei Xiao Huang, Chang Xiao, Alexa Siu, Mustafa Doga Dogan
Title: Imprinto: Enhancing Infrared Inkjet Watermarking for Human and Machine Perception
Abstract:
Hybrid paper interfaces leverage augmented reality to combine the desired tangibility of paper documents with the affordances of interactive digital media. Typically, virtual content can be embedded through direct links (e.g., QR codes); however, this impacts the aesthetics of the paper print and limits the available visual content space. To address this problem, we present Imprinto, an infrared inkjet watermarking technique that allows for invisible content embeddings only by using off-the-shelf IR inks and a camera. Imprinto was established through a psychophysical experiment, studying how much IR ink can be used while remaining invisible to users regardless of background color. We demonstrate that we can detect invisible IR content through our machine learning pipeline, and we developed an authoring tool that optimizes the amount of IR ink on the color regions of an input document for machine and human detectability. Finally, we demonstrate several applications, including augmenting paper documents and objects.

Authors:Vivianna Fang He, Sihan Li, Phanish Puranam, Feng Lin
Title: Tool or Tutor? Experimental evidence from AI deployment in cancer diagnosis
Abstract:
Professionals increasingly use Artificial Intelligence (AI) to enhance their capabilities and assist with task execution. While prior research has examined these uses separately, their potential interaction remains underexplored. We propose that AI-driven training ("tutor") and AI-assisted task completion ("tool") can have a joint effect on human capability and test this hypothesis in the context of lung cancer diagnosis. In a field experiment with 336 medical students, we manipulated AI deployment in training, in practice, and in both. Our findings reveal that while AI-integrated training and AI assistance independently improved diagnostic performance, their combination yielded the highest accuracy. These results underscore AI's dual role in enhancing human performance through both learning and real-time support, offering insights into AI deployment in professional settings where human expertise remains essential.

Authors:Qiuhai Zeng, Claire Jin, Xinyue Wang, Yuhan Zheng, Qunhua Li
Title: AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science
Abstract:
Large language models (LLMs) are increasingly used to automate data analysis through executable code generation. Yet, data science tasks often admit multiple statistically valid solutions, e.g. different modeling strategies, making it critical to understand the reasoning behind analyses, not just their outcomes. While manual review of LLM-generated code can help ensure statistical soundness, it is labor-intensive and requires expertise. A more scalable approach is to evaluate the underlying workflows - the logical plans guiding code generation. However, it remains unclear how to assess whether an LLM-generated workflow supports reproducible implementations. To address this, we present AIRepr, an Analyst-Inspector framework for automatically evaluating and improving the Reproducibility of LLM-generated data analysis workflows. Our framework is grounded in statistical principles and supports scalable, automated assessment. We introduce two novel reproducibility-enhancing prompting strategies and benchmark them against standard prompting across 15 analyst-inspector LLM pairs and 1,032 tasks from three public benchmarks. Our findings show that workflows with higher reproducibility also yield more accurate analyses, and that reproducibility-enhancing prompts substantially improve both metrics. This work provides a foundation for more transparent, reliable, and efficient human-AI collaboration in data science. Our code is publicly available.

Authors:Aditi De, NeuroBits Labs
Title: ZIA: A Theoretical Framework for Zero-Input AI
Abstract:
Zero-Input AI (ZIA) introduces a novel framework for human-computer interaction by enabling proactive intent prediction without explicit user commands. It integrates gaze tracking, bio-signals (EEG, heart rate), and contextual data (time, location, usage history) into a multi-modal model for real-time inference, targeting <100 ms latency. The proposed architecture employs a transformer-based model with cross-modal attention, variational Bayesian inference for uncertainty estimation, and reinforcement learning for adaptive optimization. To support deployment on edge devices (CPUs, TPUs, NPUs), ZIA utilizes quantization, weight pruning, and linear attention to reduce complexity from quadratic to linear with sequence length. Theoretical analysis establishes an information-theoretic bound on prediction error and demonstrates how multi-modal fusion improves accuracy over single-modal approaches. Expected performance suggests 85-90% accuracy with EEG integration and 60-100 ms inference latency. ZIA provides a scalable, privacy-preserving framework for accessibility, healthcare, and consumer applications, advancing AI toward anticipatory intelligence.

Authors:Helmut Degen, Ziran Min, Parinitha Nagaraja
Title: How to explain it to data scientists? A mixed-methods user study about explainable AI, using mental models for explanations
Abstract:
In the context of explainable artificial intelligence (XAI), limited research has identified role-specific explanation needs. This study investigates the explanation needs of data scientists, who are responsible for training, testing, deploying, and maintaining machine learning (ML) models in AI systems. The research aims to determine the specific explanation content needed by data scientists. A task analysis identified user goals and proactive user tasks. Using explanation questions, task-specific explanation needs and content were identified. From these individual explanations, we developed a mental model for explanations, which was validated and revised through a qualitative study (n=12). In a second quantitative study (n=12), we examined which explanation intents (reason, comparison, accuracy, prediction, trust) require which type of explanation content from the mental model. The findings are: F1: Explanation content for data scientists comes from the application domain, system domain, and AI domain. F2: Explanation content can be complex and should be organized sequentially and/or in hierarchies (novelty claim). F3: Explanation content includes context, inputs, evidence, attributes, ranked list, interim results, efficacy principle, and input/output relationships (novelty claim). F4: Explanation content should be organized as a causal story. F5: Standardized explanation questions ensure complete coverage of explanation needs (novelty claim). F6: Refining mental models for explanations significantly increases their quality (novelty claim).

Authors:Caterina Neef, Anja Richert
Title: Likable or Intelligent? Comparing Social Robots and Virtual Agents for Long-term Health Monitoring
Abstract:
Using social robots and virtual agents (VAs) as interfaces for health monitoring systems for older adults offers the possibility of more engaging interactions that can support long-term health and well-being. While robots are characterized by their physical presence, software-based VAs are more scalable and flexible. Few comparisons of these interfaces exist in the human-robot and human-agent interaction domains, especially in long-term and real-world studies. In this work, we examined impressions of social robots and VAs at the beginning and end of an eight-week study in which older adults interacted with these systems independently in their homes. Using a between-subjects design, participants could choose which interface to evaluate during the study. While participants perceived the social robot as somewhat more likable, the VA was perceived as more intelligent. Our work provides a basis for further studies investigating factors most relevant for engaging interactions with social interfaces for long-term health monitoring.

Authors:Xinya Gong, Wenhui Tao, Yuxin Ma
Title: CalliSense: An Interactive Educational Tool for Process-based Learning in Chinese Calligraphy
Abstract:
Process-based learning is crucial for the transmission of intangible cultural heritage, especially in complex arts like Chinese calligraphy, where mastering techniques cannot be achieved by merely observing the final work. To explore the challenges faced in calligraphy heritage transmission, we conducted semi-structured interviews (N=8) as a formative study. Our findings indicate that the lack of calligraphy instructors and tools makes it difficult for students to master brush techniques, and teachers struggle to convey the intricate details and rhythm of brushwork. To address this, we collaborated with calligraphy instructors to develop an educational tool that integrates writing process capture and visualization, showcasing the writing rhythm, hand force, and brush posture. Through empirical studies conducted in multiple teaching workshops, we evaluated the system's effectiveness with teachers (N=4) and students (N=12). The results show that the tool significantly enhances teaching efficiency and aids learners in better understanding brush techniques.

Authors:Axum AI: J. Owoyemi, S. Abubakar, A. Owoyemi, T. O. Togunwa, F. C. Madubuko, S. Oyatoye, Z. Oyetolu, K. Akyea, A. O. Mohammed, A. Adebakin
Title: Open-Source Retrieval Augmented Generation Framework for Retrieving Accurate Medication Insights from Formularies for African Healthcare Workers
Abstract:
Accessing accurate medication insights is vital for enhancing patient safety, minimizing errors, and supporting clinical decision-making. However, healthcare professionals in Africa often rely on manual and time-consuming processes to retrieve drug information, exacerbated by limited access to pharmacists due to brain drain and healthcare disparities. This paper presents "Drug Insights," an open-source Retrieval-Augmented Generation (RAG) chatbot designed to streamline medication lookup for healthcare workers in Africa. By leveraging a corpus of Nigerian pharmaceutical data and advanced AI technologies, including Pinecone databases and GPT models, the system delivers accurate, context-specific responses with minimal hallucination. The chatbot integrates prompt engineering and S-BERT evaluation to optimize retrieval and response generation. Preliminary tests, including pharmacist feedback, affirm the tool's potential to improve drug information access while highlighting areas for enhancement, such as UI/UX refinement and extended corpus integration.
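The retrieval half of such a RAG pipeline can be sketched very simply. The toy below scores formulary snippets against a query with a bag-of-words cosine similarity; the actual system described in the abstract uses dense embeddings stored in Pinecone and GPT models for generation, so every name and the scoring scheme here are illustrative assumptions, not the paper's implementation:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Return the top-k snippets most similar to the query.
    A real deployment would embed both sides with a neural
    encoder and query a vector database instead."""
    qv = Counter(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: cosine(qv, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]
```

The retrieved snippets are then placed into the LLM prompt, which is what keeps the generated answer grounded in the formulary and minimizes hallucination.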

Authors:Hongxi Pu, Futian Jiang, Zihao Chen, Xingyue Song
Title: ComposeOn Academy: Transforming Melodic Ideas into Complete Compositions Integrating Music Learning
Abstract:
Music composition has long been recognized as a significant art form. However, existing digital audio workstations and music production software often present high entry barriers for users lacking formal musical training. To address this, we introduce ComposeOn, a music theory-based tool designed for users with limited musical knowledge. ComposeOn enables users to easily extend their melodic ideas into complete compositions and offers simple editing features. By integrating music theory, it explains music creation at beginner, intermediate, and advanced levels. Our user study (N=10) compared ComposeOn with the baseline method, Suno AI, demonstrating that ComposeOn provides a more accessible and enjoyable composing and learning experience for individuals with limited musical skills. ComposeOn bridges the gap between theory and practice, offering an innovative solution as both a composition aid and music education platform. The study also explores the differences between theory-based music creation and generative music, highlighting the former's advantages in personal expression and learning.

Authors:Ella Cutler, Zachary Levonian, S. Thomas Christie
Title: Detecting Student Intent for Chat-Based Intelligent Tutoring Systems
Abstract:
Chat interfaces for intelligent tutoring systems (ITSs) enable interactivity and flexibility. However, when students interact with chat interfaces, they expect dialogue-driven navigation from the system and can express frustration and disinterest if this is not provided. Intent detection systems help students navigate within an ITS, but detecting students' intent during open-ended dialogue is challenging. We designed an intent detection system in a chatbot ITS, classifying a student's intent between continuing the current lesson or switching to a new lesson. We explore the utility of four machine learning approaches for this task - including both conventional classification approaches and fine-tuned large language models - finding that using an intent classifier introduces trade-offs around implementation cost, accuracy, and prediction time. We argue that implementing intent detection in chat interfaces can reduce frustration and support student learning.
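Among the "conventional classification approaches" the abstract mentions, one of the cheapest to implement and serve is a bag-of-words Naive Bayes over the two intents. The sketch below is illustrative only — the paper benchmarks several approaches, including fine-tuned LLMs, and this class and its training phrases are invented for the example:

```python
import math
from collections import Counter, defaultdict

class IntentClassifier:
    """Tiny bag-of-words Naive Bayes for two intents:
    'continue' the current lesson vs 'switch' to a new one.
    Uses add-one (Laplace) smoothing over the training vocabulary."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # intent -> word counts
        self.class_counts = Counter()            # intent -> #examples
        self.vocab = set()

    def fit(self, texts, intents):
        for text, intent in zip(texts, intents):
            self.class_counts[intent] += 1
            for w in text.lower().split():
                self.word_counts[intent][w] += 1
                self.vocab.add(w)

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for intent, count in self.class_counts.items():
            lp = math.log(count / total)  # log prior
            denom = sum(self.word_counts[intent].values()) + len(self.vocab)
            for w in text.lower().split():
                lp += math.log((self.word_counts[intent][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = intent, lp
        return best
```

Such a model illustrates the trade-off the paper reports: near-zero implementation cost and prediction time, at the expense of the accuracy a fine-tuned LLM can reach on open-ended student dialogue.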

Authors:Leonardo Germán Loza Bonora, Julián Grigera, Helmut Degen
Title: A Study on Interaction Complexity and Time
Abstract:
Testing Web User Interfaces (UIs) requires considerable time, effort, and resources, most notably participants for user testing. Additionally, the test results may demand adjustments to the UI, requiring further resources and testing. Early tests can make this process less costly with the help of low-fidelity prototypes, but it is difficult to conduct user tests on them, and recruiting participants is still necessary. To tackle this issue, there are tools that can predict UI aspects like interaction time, such as the well-known KLM model. Another aspect that can be predicted is complexity, and this was achieved by the Big I notation, which can be applied to early UX concepts like lo-fi wireframes. Big I assists developers in estimating the interaction complexity, specified as a function of user steps, which are composed of abstracted user actions. Interaction complexity is expressed in mathematical terms, making the comparison of interaction complexities for various UX concepts easy. However, Big I is not able to predict execution time for user actions, which would be very helpful for early assessment of lo-fi prototypes. To address this shortcoming, in this paper we present a study in which we took measurements from real users (n=100) completing tasks in a fictitious website in order to derive average times per interaction step. Using these results, we were able to study the relationship between interaction complexity and time and ultimately complement Big I predictions with time estimates.
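For readers unfamiliar with KLM-style prediction: a task is written as a sequence of abstract operators, and the estimated execution time is simply the sum of per-operator times. The sketch below uses the classic Card, Moran & Newell operator values; the study itself derives its own empirical per-step averages from user measurements, so these numbers stand in only as an illustration:

```python
# Classic KLM operator times in seconds (Card, Moran & Newell).
# The paper's study replaces such textbook values with averages
# measured from real users (n=100).
KLM_TIMES = {
    "K": 0.28,  # keystroke (average skilled typist)
    "P": 1.10,  # point at a target with the mouse
    "B": 0.10,  # mouse button press or release
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def estimate_time(operators):
    """Sum per-operator times for a sequence like 'MPBK'."""
    return sum(KLM_TIMES[op] for op in operators)
```

For example, "think, point at a button, click it" is encoded as `"MPB"` and estimated at 1.35 + 1.10 + 0.10 = 2.55 s. Big I contributes the complementary piece: counting and structuring those user steps so that two UX concepts can be compared before any timing data exists.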

Authors:Lew Lefton, Kexin Rong, Chinar Dankhara, Lila Ghemri, Firdous Kausar, A. Hannibal Hamdallahi
Title: A Socratic RAG Approach to Connect Natural Language Queries on Research Topics with Knowledge Organization Systems
Abstract:
In this paper, we propose a Retrieval Augmented Generation (RAG) agent that maps natural language queries about research topics to precise, machine-interpretable semantic entities. Our approach combines RAG with Socratic dialogue to align a user's intuitive understanding of research topics with established Knowledge Organization Systems (KOSs). The proposed approach effectively bridges "little semantics" (domain-specific KOS structures) with "big semantics" (broad bibliometric repositories), making complex academic taxonomies more accessible. Such agents have the potential for broad use. We illustrate with a sample application called CollabNext, which is a person-centric knowledge graph connecting people, organizations, and research topics. We further describe how the application design has an intentional focus on HBCUs and emerging researchers to raise visibility of people historically rendered invisible in the current science system.

Authors:Sohini Saha, Leslie M. Collins, Sherri L. Smith, Boyla O. Mainsah
Title: User Awareness and Perspectives Survey on Privacy, Security and Usability of Auditory Prostheses
Abstract:
According to the World Health Organization, over 466 million people worldwide suffer from disabling hearing loss, with approximately 34 million of these being children. Hearing aids (HA) and cochlear implants (CI) have become indispensable tools for restoring hearing and enhancing the quality of life for individuals with hearing impairments. Clinical research and consumer studies indicate that users of HAs and CIs report significant improvements in their daily lives, including enhanced communication abilities and social engagement and reduced psychological stress. Modern auditory prosthetic devices are more advanced and interconnected with digital networks to add functionality, such as streaming audio directly from smartphones and other devices, remote adjustments by audiologists, integration with smart home systems, and access to artificial intelligence-driven sound enhancement features. With this interconnectivity, issues surrounding data privacy and security have become increasingly pertinent. There is limited research on the usability perceptions of current HA and CI models from the perspective of end-users. In addition, no studies have investigated consumer mental models during the purchasing process, particularly which factors they prioritize when selecting a device. In this study, we assessed participants' satisfaction levels with various features of their auditory prostheses. This work contributes to the field by addressing gaps in user perceptions of HA and CI usability, identifying key factors in consumer purchasing decisions, and highlighting the need for improved privacy and security awareness and education among users.

Authors:Saniya Vahedian Movahed, Fred Martin
Title: Ask Me Anything: Exploring children's attitudes toward an age-tailored AI-powered chatbot
Abstract:
Conversational agents, such as chatbots, have increasingly found their way into many dimensions of our lives, including entertainment and education. In this exploratory study we built a child-friendly chatbot, "Ask Me Anything" (AMA), and investigated children's attitudes and trust toward AI-driven conversational agents. To prompt targeted questioning from students and drive engagement, AMA is a specialized chatbot that answers only topic-specific questions in three areas: astronomy, sneakers and shoes, and dinosaurs. We tested AMA with 63 students in a K-8 public school in the Northeast USA. Students worked in small groups, interacted with our tool for three to ten minutes, and completed a post-survey. We identified three key themes that emerged from student conversational interactions with AMA: expressing wonder, surprise, and curiosity; building trust and developing confidence; and building relationships and anthropomorphizing. Also, we observed a broad attitude of openness and comfort. Students trusted the chatbot responses in general, indicating a high level of trust in and reliance on AI as a source of information. They described AMA as "knowledgeable" and "smart," and said they could "trust it." To confirm their perception of reliability, some students tested the chatbot with questions to which they knew the answers. This behavior illustrated a fundamental aspect of children's cognitive development: the process of actively evaluating the credibility of sources. Our work extends and contributes to the existing body of literature that explores children's interactions with conversational agents.

Authors:Christy L. Conroy, Gina M. Brunetti, Angelos Barmpoutis, Emily J. Fox
Title: Integrated Telehealth and Extended Reality to Enhance Home Exercise Adherence Following Total Hip and Knee Arthroplasty
Abstract:
Nearly one million total hip and knee arthroplasties (THA/TKA) are performed annually in the United States, with most patients discharged home and prescribed home exercise programs (HEPs) to enhance lower extremity function. Traditional paper-based HEPs, while accessible and low-cost, often lack engagement and real-time feedback, which are critical for adherence and performance optimization. Extended reality (XR) and telehealth (TH) systems offer promising solutions, combining engagement and feedback, though each has limitations. To address these gaps, we designed and executed a pilot study that compared exercise performance in individuals with THA/TKA using a conventional paper-based HEP versus a proof-of-concept system, dubbed Tele-PhyT, that included the ideal characteristics of a future XR technology that would enable seamless HEP-TH systems, with robust marker-less full-body tracking, real-time visual feedback, and performance quantification. The pilot study used a randomized cross-over design and targeted two types of users: therapists and patients. Participants favored Tele-PhyT for its real-time feedback and ease of use, and noted its potential to improve HEP adherence and exercise accuracy.

Authors:Uwe M. Borghoff, Paolo Bottoni, Remo Pareschi
Title: Human-Artificial Interaction in the Age of Agentic AI: A System-Theoretical Approach
Abstract:
This paper presents a novel perspective on human-computer interaction (HCI), framing it as a dynamic interplay between human and computational agents within a networked system. Going beyond traditional interface-based approaches, we emphasize the importance of coordination and communication among heterogeneous agents with different capabilities, roles, and goals. A key distinction is made between multi-agent systems (MAS) and Centaurian systems, which represent two different paradigms of human-AI collaboration. MAS maintain agent autonomy, with structured protocols enabling cooperation, while Centaurian systems deeply integrate human and AI capabilities, creating unified decision-making entities. To formalize these interactions, we introduce a framework for communication spaces, structured into surface, observation, and computation layers, ensuring seamless integration between MAS and Centaurian architectures, where colored Petri nets effectively represent structured Centaurian systems and high-level reconfigurable networks address the dynamic nature of MAS. Our research has practical applications in autonomous robotics, human-in-the-loop decision making, and AI-driven cognitive architectures, and provides a foundation for next-generation hybrid intelligence systems that balance structured coordination with emergent behavior.

Authors:Liwen He, Zichun Guo, Yanru Mo, Yue Wen, Yun Wang
Title: Exploring Embodied Emotional Communication: A Human-oriented Review of Mediated Social Touch
Abstract:
This paper offers a structured understanding of mediated social touch (MST) using a human-oriented approach, through an extensive review of literature spanning tactile interfaces, emotional information, mapping mechanisms, and the dynamics of human-human and human-robot interactions. By investigating the existing and exploratory mapping strategies of the 37 selected MST cases, we established the emotional expression space of MSTs that accommodated a diverse spectrum of emotions by integrating the categorical and Valence-arousal models, showcasing how emotional cues can be translated into tactile signals. Based on the expressive capacity of MSTs, a practical design space was structured encompassing factors such as the body locations, device form, tactile modalities, and parameters. We also proposed various design strategies for MSTs including workflow, evaluation methods, and ethical and cultural considerations, as well as several future research directions. MSTs' potential is reflected not only in conveying emotional information but also in fostering empathy, comfort, and connection in both human-human and human-robot interactions. This paper aims to serve as a comprehensive reference for design researchers and practitioners, which helps expand the scope of emotional communication of MSTs, facilitating the exploration of diverse applications of affective haptics, and enhancing the naturalness and sociability of haptic interaction.

Authors:Ziwei Chen, Jiawen Shen, Luna, Kristen Vaccaro
Title: Hidden Darkness in LLM-Generated Designs: Exploring Dark Patterns in Ecommerce Web Components Generated by LLMs
Abstract:
Recent work has highlighted the risks of LLM-generated content for a wide range of harmful behaviors, including incorrect and harmful code. In this work, we extend this by studying whether LLM-generated web design contains dark patterns. This work evaluated designs of ecommerce web components generated by four popular LLMs: Claude, GPT, Gemini, and Llama. We tested 13 commonly used ecommerce components (e.g., search, product reviews) and used them as prompts to generate a total of 312 components across all models. Over one-third of generated components contain at least one dark pattern. The majority of dark pattern strategies involve hiding crucial information, limiting users' actions, and manipulating them into making decisions through a sense of urgency. Dark patterns are also more frequently produced in components that are related to company interests. These findings highlight the need for interventions to prevent dark patterns during front-end code generation with LLMs and emphasize the importance of expanding ethical design education to a broader audience.

Authors:Blaine Kuehnert, Rachel M. Kim, Jodi Forlizzi, Hoda Heidari
Title: The "Who", "What", and "How" of Responsible AI Governance: A Systematic Review and Meta-Analysis of (Actor, Stage)-Specific Tools
Abstract:
The implementation of responsible AI in an organization is inherently complex due to the involvement of multiple stakeholders, each with their unique set of goals and responsibilities across the entire AI lifecycle. These responsibilities are often ambiguously defined and assigned, leading to confusion, miscommunication, and inefficiencies. Even when responsibilities are clearly defined and assigned to specific roles, the corresponding AI actors lack effective tools to support their execution. Toward closing these gaps, we present a systematic review and comprehensive meta-analysis of the current state of responsible AI tools, focusing on their alignment with specific stakeholder roles and their responsibilities in various AI lifecycle stages. We categorize over 220 tools according to AI actors and stages they address. Our findings reveal significant imbalances across the stakeholder roles and lifecycle stages addressed. The vast majority of available tools have been created to support AI designers and developers specifically during data-centric and statistical modeling stages while neglecting other roles such as institutional leadership, deployers, end-users, and impacted communities, and stages such as value proposition and deployment. The uneven distribution we describe here highlights critical gaps that currently exist in responsible AI governance research and practice. Our analysis reveals that despite the myriad of frameworks and tools for responsible AI, it remains unclear who within an organization and when in the AI lifecycle a tool applies. Furthermore, existing tools are rarely validated, leaving critical gaps in their usability and effectiveness. These gaps provide a starting point for researchers and practitioners to create more effective and holistic approaches to responsible AI development and governance.

Authors:Jibum Kim, Hanseul Choi, Gaeun Kim, Sunggu Yang, Eunha Baeg, Donggue Kim, Seongwon Jin, Sangwon Byun
Title: Explainable AI-Driven Neural Activity Analysis in Parkinsonian Rats under Electrical Stimulation
Abstract:
Parkinson's disease (PD) is a neurodegenerative disorder characterized by motor dysfunction and abnormal neural oscillations. These symptoms can be modulated through electrical stimulation. Traditional neural activity analysis in PD has typically relied on statistical methods, which often introduce bias owing to the need for expert-driven feature extraction. To address this limitation, we explore an explainable artificial intelligence (XAI) approach to analyze neural activity in Parkinsonian rats receiving electrical stimulation. Electrocorticogram (ECoG) signals were collected before and after electrical stimulation using graphene-based electrodes that enable less-invasive monitoring and stimulation in PD. EEGNet, a convolutional neural network, classified these ECoG signals into pre- and post-stimulation states. We applied layer-wise relevance propagation, an XAI technique, to identify key neural inputs contributing to the model's decisions, incorporating the spatial electrode information matched to the cortex map. The XAI analysis highlighted area-specific importance in beta and gamma frequency bands, which could not be detected through mean comparison analyses relying on feature extraction. These findings demonstrate the potential of XAI in analyzing neural dynamics in neurodegenerative disorders such as PD, suggesting that the integration of graphene-based electrodes with advanced deep learning models offers a promising solution for real-time PD monitoring and therapy.
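The relevance attribution the authors apply can be illustrated with a minimal sketch of layer-wise relevance propagation's epsilon rule for one dense layer (a toy stand-in for a layer of EEGNet; the array sizes, random inputs, and variable names here are illustrative, not from the paper):

```python
import numpy as np

def lrp_epsilon(activations, weights, relevance_out, eps=1e-6):
    """Redistribute output relevance onto inputs with the LRP epsilon rule.

    activations:   (n_in,) input activations of the layer
    weights:       (n_in, n_out) layer weight matrix
    relevance_out: (n_out,) relevance arriving at the layer's output
    """
    z = activations @ weights        # pre-activations, shape (n_out,)
    z = z + eps * np.sign(z)         # stabilizer avoids division by zero
    s = relevance_out / z            # per-output scaling factors
    c = weights @ s                  # backward pass, shape (n_in,)
    return activations * c           # element-wise input relevance

# Toy check: relevance is (approximately) conserved through the layer.
rng = np.random.default_rng(0)
a = rng.random(8)                    # hypothetical ECoG-derived features
w = rng.normal(size=(8, 3))
r_out = np.array([0.0, 1.0, 0.0])    # relevance placed on the predicted class
r_in = lrp_epsilon(a, w, r_out)
print(r_in.sum())                    # close to r_out.sum() = 1
```

The conservation property (input relevances summing to the output relevance, up to the stabilizer) is what lets per-electrode relevance maps be read as a decomposition of the classifier's decision.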

Authors:Jing Jin, Yutao Zhang, Ruitian Xu, Yixin Chen
Title: An Innovative Brain-Computer Interface Interaction System Based on the Large Language Model
Abstract:
Recent advancements in large language models (LLMs) provide a more effective pathway for upgrading brain-computer interface (BCI) technology in terms of user interaction. The widespread adoption of BCIs in daily application scenarios is still limited by factors such as their single functionality, restricted paradigm design, weak multilingual support, and low levels of intelligence. In this paper, we propose an innovative BCI system that deeply integrates a steady-state visual evoked potential (SSVEP) speller with an LLM application programming interface (API). It allows natural language input through the SSVEP speller and dynamically calls large models to generate SSVEP paradigms. The command prompt, blinking frequency, and layout position are adjustable to meet the user's control requirements in various scenarios. The LLM's multilingual support covers more than ten languages. A variety of task scenarios, such as home appliance control, robotic arm operation, and unmanned aerial vehicle (UAV) management, are provided. The task interfaces of the system can be personalized according to the user's habits, usage scenarios, and equipment characteristics. By combining the SSVEP speller with an LLM, the system solves numerous challenges faced by current BCI systems and makes breakthroughs in functionality, intelligence, and multilingual support. The introduction of the LLM not only enhances user experience but also expands the potential applications of BCI technology in real-world environments.
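The abstract does not give decoding details, but the core step of any SSVEP speller, identifying which flickering key the user attends to, can be sketched as picking the candidate frequency with the strongest spectral response (a minimal FFT-based sketch with synthetic data; production systems typically use canonical correlation analysis instead):

```python
import numpy as np

def detect_ssvep_target(signal, fs, candidate_freqs):
    """Return the candidate stimulation frequency with the most spectral power.

    signal:          1-D EEG samples from an occipital channel
    fs:              sampling rate in Hz
    candidate_freqs: blinking frequencies assigned to speller keys
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    powers = [spectrum[np.argmin(np.abs(freqs - f))]  # nearest FFT bin
              for f in candidate_freqs]
    return candidate_freqs[int(np.argmax(powers))]

# Synthetic 2-second trial: the user attends the key flickering at 10 Hz.
fs = 250
t = np.arange(0, 2, 1 / fs)
sig = np.sin(2 * np.pi * 10 * t) \
    + 0.3 * np.random.default_rng(1).normal(size=t.size)
picked = detect_ssvep_target(sig, fs, [8.0, 10.0, 12.0])
print(picked)  # 10.0
```

With a 2 s window at 250 Hz the FFT bins fall exactly on 8, 10, and 12 Hz (0.5 Hz resolution), which is why a simple nearest-bin lookup suffices in this sketch.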

Authors:Trishia El Chemaly, Mohit Goyal, Tinglin Duan, Vrushank Phadnis, Sakar Khattar, Bjorn Vlaskamp, Achin Kulshrestha, Eric Lee Turner, Aveek Purohit, Gregory Neiswander, Konstantine Tsotsos
Title: Geometry Aware Passthrough Mitigates Cybersickness
Abstract:
Virtual Reality headsets isolate users from the real world by restricting their perception to the virtual world. Video See-Through (VST) headsets address this by utilizing world-facing cameras to create Augmented Reality experiences. However, directly displaying camera feeds causes visual discomfort and cybersickness due to the inaccurate perception of scale and exaggerated motion parallax. This paper demonstrates the potential of geometry-aware passthrough systems in mitigating cybersickness through accurate depth perception. We first present a methodology to benchmark and compare passthrough algorithms. Furthermore, we design a protocol to quantitatively measure cybersickness experienced by users in VST headsets. Using this protocol, we conduct a user study to compare direct passthrough and geometry-aware passthrough systems. To the best of our knowledge, our study is the first to reveal significantly reduced nausea, disorientation, and total cybersickness scores with geometry-aware passthrough. It also uncovers several potential avenues to further mitigate visually-induced discomfort.

Authors:Loan Ho, Stefan Schlobach
Title: Dialogue-based Explanations for Logical Reasoning using Structured Argumentation
Abstract:
The problem of explaining inconsistency-tolerant reasoning in knowledge bases (KBs) is a prominent topic in Artificial Intelligence (AI). While there is some work on this problem, the explanations provided by existing approaches often lack critical information or fail to be expressive enough for non-binary conflicts. In this paper, we identify structural weaknesses of the state-of-the-art and propose a generic argumentation-based approach to address these problems. This approach is defined for logics involving reasoning with maximal consistent subsets and shows how any such logic can be translated to argumentation. Our work provides dialogue models as dialectic-proof procedures to compute and explain a query answer with respect to inconsistency-tolerant semantics. This allows us to construct dialectical proof trees as explanations, which are more expressive and arguably more intuitive than existing explanation formalisms.

Authors:Hüseyin Aydın, Onuralp Ulusoy, Ilaria Liccardi, Pınar Yolum
Title: Analyzing Privacy Dynamics within Groups using Gamified Auctions
Abstract:
Online shared content, such as group pictures, often contains information about multiple users. Developing technical solutions to manage the privacy of such "co-owned" content is challenging because each co-owner may have different preferences. Recent technical approaches advocate group-decision mechanisms, including auctions, to decide how best to resolve these differences. However, it is not clear if users would participate in such mechanisms and, if they do, whether they would act altruistically. Understanding these privacy dynamics is crucial to developing effective mechanisms for privacy-respecting collaborative systems. Accordingly, this work develops RESOLVE, a privacy auction game to understand the sharing behavior of users in groups. Our results from users playing the game show that i) users' understanding of individual vs. group privacy differs significantly; ii) users often fight for their preferences even at the cost of others' privacy; and iii) at times users collaborate to fight for the privacy of others.

Authors:Xiahua Wei, Naveen Kumar, Han Zhang
Title: Addressing Bias in Generative AI: Challenges and Research Opportunities in Information Management
Abstract:
Generative AI technologies, particularly Large Language Models (LLMs), have transformed information management systems but introduced substantial biases that can compromise their effectiveness in informing business decision-making. This challenge presents information management scholars with a unique opportunity to advance the field by identifying and addressing these biases across extensive applications of LLMs. Building on the discussion on bias sources and current methods for detecting and mitigating bias, this paper seeks to identify gaps and opportunities for future research. By incorporating ethical considerations, policy implications, and sociotechnical perspectives, we focus on developing a framework that covers major stakeholders of Generative AI systems, proposing key research questions, and inspiring discussion. Our goal is to provide actionable pathways for researchers to address bias in LLM applications, thereby advancing research in information management that ultimately informs business practices. Our forward-looking framework and research agenda advocate interdisciplinary approaches, innovative methods, dynamic perspectives, and rigorous evaluation to ensure fairness and transparency in Generative AI-driven information systems. We expect this study to serve as a call to action for information management scholars to tackle this critical issue, guiding the improvement of fairness and effectiveness in LLM-based systems for business practice.

Authors:Scott Humr, Mustafa Canan
Title: You Can't Get There From Here: Redefining Information Science to address our sociotechnical futures
Abstract:
Current definitions of Information Science are inadequate for comprehensively describing the nature of its field of study and for addressing the problems arising from intelligent technologies. The ubiquitous rise of artificial intelligence applications and their impact on society demands that the field of Information Science acknowledge the sociotechnical nature of these technologies. Previous definitions of Information Science over the last six decades have inadequately addressed the environmental, human, and social aspects of these technologies. This perspective piece advocates for an expanded definition of Information Science that fully includes the sociotechnical impacts information has on the conduct of research in this field. Proposing an expanded definition of Information Science that includes the sociotechnical aspects of this field should both stimulate conversation and widen the interdisciplinary lens necessary to address how intelligent technologies may be incorporated into society and our lives more fairly.

Authors:Kunal Swami, Raghu Chittersu, Pranav Adlinge, Rajeev Irny, Shashavali Doodekula, Alok Shukla
Title: PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control
Abstract:
We present PromptArtisan, a groundbreaking approach to multi-instruction image editing that achieves remarkable results in a single pass, eliminating the need for time-consuming iterative refinement. Our method empowers users to provide multiple editing instructions, each associated with a specific mask within the image. This flexibility allows for complex edits involving mask intersections or overlaps, enabling the realization of intricate and nuanced image transformations. PromptArtisan leverages a pre-trained InstructPix2Pix model in conjunction with a novel Complete Attention Control Mechanism (CACM). This mechanism ensures precise adherence to user instructions, granting fine-grained control over the editing process. Furthermore, our approach is zero-shot, requiring no additional training, and offers lower processing complexity than traditional iterative methods. By seamlessly integrating multi-instruction capabilities, single-pass efficiency, and complete attention control, PromptArtisan unlocks new possibilities for creative and efficient image editing workflows, catering to both novice and expert users alike.

Authors:Shota Motoura, Ayako Hoshino, Itaru Hosomi, Kunihiko Sadamasa
Title: A Logical Formalisation of a Hypothesis in Weighted Abduction: towards User-Feedback Dialogues
Abstract:
Weighted abduction computes hypotheses that explain input observations. A reasoner of weighted abduction first generates possible hypotheses and then selects the hypothesis that is the most plausible. Since a reasoner employs parameters, called weights, that control its plausibility evaluation function, it can output the most plausible hypothesis according to a specific application using application-specific weights. This versatility makes it applicable to domains ranging from plant operation to cybersecurity and discourse analysis. However, the predetermined application-specific weights are not applicable to all cases of the application. Hence, the hypothesis selected by the reasoner does not necessarily seem the most plausible to the user. In order to resolve this problem, this article proposes two types of user-feedback dialogue protocols, in which the user points out, either positively, negatively or neutrally, properties of the hypotheses presented by the reasoner, and the reasoner regenerates hypotheses that satisfy the user's feedback. As required of user-feedback dialogue protocols, we then prove: (i) our protocols necessarily terminate under certain reasonable conditions; (ii) they converge on hypotheses that share the same properties as fixed target hypotheses, provided the user determines the positivity, negativity or neutrality of each pointed-out property based on whether the target hypotheses have that property.
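The weight-driven selection this article builds on can be sketched as cost-minimizing hypothesis choice: each hypothesis is a set of assumed literals, each literal carries a weight, and the reasoner returns the cheapest hypothesis covering the observations (the literal names and weights below are a hypothetical plant-operation example, not from the paper):

```python
def hypothesis_cost(hypothesis, weights):
    """Total cost of a hypothesis: sum of its assumed literals' weights."""
    return sum(weights[lit] for lit in hypothesis)

def select_hypothesis(hypotheses, weights):
    """Return the most plausible (lowest-cost) hypothesis."""
    return min(hypotheses, key=lambda h: hypothesis_cost(h, weights))

# Hypothetical application-specific weights for a plant-operation fault.
weights = {"valve_stuck": 1.2, "sensor_fault": 0.8, "operator_error": 2.0}
hypotheses = [
    {"valve_stuck"},                     # cost 1.2
    {"sensor_fault", "operator_error"},  # cost 2.8
]
best = select_hypothesis(hypotheses, weights)
print(best)  # {'valve_stuck'}
```

The user-feedback protocols in the article then amount to constraining which hypotheses remain admissible before this selection step is re-run.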

Authors:Mariana Fernandez-Espinosa, Kara Clouse, Dylan Sellars, Danny Tong, Michael Bsales, Sophonie Alcindor, Timothy D Hubbard, Michael Villano, Diego Gómez-Zará
Title: Breaking the Familiarity Bias: Employing Virtual Reality Environments to Enhance Team Formation and Inclusion
Abstract:
Team closeness provides the foundations of trust and communication, contributing to teams' success and viability. However, newcomers often struggle to be included in a team since incumbents tend to interact more with other existing members. Previous research suggests that online communication technologies can help team inclusion by mitigating members' perceived differences. In this study, we test how virtual reality (VR) can promote team closeness when forming teams. We conducted a between-subject experiment with teams working in-person and VR, where two members interacted first, and then a third member was added later to conduct a hidden-profile task. Participants evaluated how close they felt with their teammates after the task was completed. Our results show that VR newcomers felt closer to the incumbents than in-person newcomers. However, incumbents' closeness to newcomers did not vary across conditions. We discuss the implications of these findings and offer suggestions for how VR can promote inclusion.

Authors:Sirui Tao, Ivan Liang, Cindy Peng, Zhiqing Wang, Srishti Palani, Steven P. Dow
Title: DesignWeaver: Dimensional Scaffolding for Text-to-Image Product Design
Abstract:
Generative AI has enabled novice designers to quickly create professional-looking visual representations for product concepts. However, novices have limited domain knowledge that could constrain their ability to write prompts that effectively explore a product design space. To understand how experts explore and communicate about design spaces, we conducted a formative study with 12 experienced product designers and found that experts -- and their less-versed clients -- often use visual references to guide co-design discussions rather than written descriptions. These insights inspired DesignWeaver, an interface that helps novices generate prompts for a text-to-image model by surfacing key product design dimensions from generated images into a palette for quick selection. In a study with 52 novices, DesignWeaver enabled participants to craft longer prompts with more domain-specific vocabularies, resulting in more diverse, innovative product designs. However, the nuanced prompts heightened participants' expectations beyond what current text-to-image models could deliver. We discuss implications for AI-based product design support tools.

Authors:Ali Teymourian, Andrew M. Webb, Taha Gharaibeh, Arushi Ghildiyal, Ibrahim Baggili
Title: SoK: Come Together -- Unifying Security, Information Theory, and Cognition for a Mixed Reality Deception Attack Ontology & Analysis Framework
Abstract:
We present a primary attack ontology and analysis framework for deception attacks in Mixed Reality (MR). This is achieved through a multidisciplinary Systematization of Knowledge (SoK), integrating concepts from MR security, information theory, and cognition. While MR grows in popularity, it presents many cybersecurity challenges, particularly concerning deception attacks and their effects on humans. In this paper, we use the Borden-Kopp model of deception to develop a comprehensive ontology of MR deception attacks. Further, we derive two models to assess the impact of MR deception attacks on information communication and decision-making. The first, an information-theoretic model, mathematically formalizes the effects of attacks on information communication. The second, a decision-making model, details the effects of attacks on interlaced cognitive processes. Using our ontology and models, we establish the MR Deception Analysis Framework (DAF) to assess the effects of MR deception attacks on information channels, perception, and attention. Our SoK uncovers five key findings for research and practice and identifies five research gaps to guide future work.

Authors:Yuhan Zeng, Yingxuan Shi, Xuehan Huang, Fiona Nah, Ray LC
Title: "Ronaldo's a poser!": How the Use of Generative AI Shapes Debates in Online Forums
Abstract:
Online debates can enhance critical thinking but may escalate into hostile attacks. As humans are increasingly reliant on Generative AI (GenAI) in writing tasks, we need to understand how people utilize GenAI in online debates. To examine the patterns of writing behavior while making arguments with GenAI, we created an online forum for soccer fans to engage in turn-based and free debates in a post format with the assistance of ChatGPT, arguing on the topic of "Messi vs Ronaldo". After 13 sessions of two-part study and semi-structured interviews with 39 participants, we conducted content and thematic analyses to integrate insights from interview transcripts, ChatGPT records, and forum posts. We found that participants prompted ChatGPT for aggressive responses, created posts with similar content and logical fallacies, and sacrificed the use of ChatGPT for better human-human communication. This work uncovers how polarized forum members work with GenAI to engage in debates online.

Authors:Staas de Jong, Gerrit van der Veer
Title: Computational techniques enabling the perception of virtual images exclusive to the retinal afterimage
Abstract:
The retinal afterimage is a widely known effect in the human visual system, which has been studied and used in the context of a number of major art movements. Therefore, when considering the general role of computation in the visual arts, this raises the question of whether this effect, too, may be induced using partly automated techniques. If so, it may become a computationally controllable ingredient of (interactive) visual art, and thus take its place among the many other aspects of visual perception which have already preceded it in this sense. The present moment provides additional inspiration to lay the groundwork for extending computer graphics in general with the retinal afterimage: historically, we are in a phase where some head-mounted stereoscopic AR/VR technologies now provide eye tracking by default, thereby allowing realtime monitoring of the processes of visual fixation that can induce the retinal afterimage. A logical starting point for general investigation is then shape display via the retinal afterimage, since shape recognition lends itself well to unambiguous reporting. Shape recognition, however, may also occur due to normal vision, which happens simultaneously. Carefully and rigorously excluding this possibility, we develop computational techniques enabling shape display exclusive to the retinal afterimage.

Authors:Camilo Sanchez, Sui Wang, Kaisa Savolainen, Felix Anand Epp, Antti Salovaara
Title: Let's Talk Futures: A Literature Review of HCI's Future-Orientation
Abstract:
HCI is future-oriented by nature: it explores new human-technology interactions and applies the findings to promote and shape vital visions of society. Still, the visions of futures in HCI publications seem largely implicit, techno-deterministic, narrow, and lacking in roadmaps and attention to uncertainties. A literature review centered on this problem examined futuring and its forms in the ACM Digital Library's most frequently cited HCI publications. This analysis entailed developing the four-category framework SPIN, informed by futures studies literature. The results confirm that, while technology indeed drives futuring in HCI, a growing body of HCI research is coming to challenge techno-centric visions. Emerging foci of HCI futuring demonstrate active exploration of uncertainty, a focus on human experience, and contestation of dominant narratives. The paper concludes with insight illuminating factors behind techno-centrism's continued dominance of HCI discourse, as grounding for five opportunities for the field to expand its contribution to futures and anticipation research.

Authors:Frederick George Vickery, Sébastien Kubicki, Charlotte Hoareau, Lucas Brand, Aurelien Duval, Seamus Thierry, Ronan Querrec
Title: Evaluating the Effects of Situated and Embedded Visualisation in Augmented Reality Guidance for Isolated Medical Assistance
Abstract:
One huge advantage of Augmented Reality (AR) is its numerous possibilities for displaying information in the physical world, especially when applying Situated Analytics (SitA). AR devices and their respective interaction techniques allow for supplementary guidance to assist an operator carrying out complex procedures such as medical diagnosis and surgery, for instance. Their usage promotes user autonomy by presenting relevant information when the operator may not necessarily possess expert knowledge of every procedure and may also not have access to external help, such as in a remote or isolated situation (e.g., International Space Station, middle of an ocean, desert). In this paper, we propose a comparison of two different forms of AR visualisation, an embedded visualisation and a situated projected visualisation, with the aim of assisting operators with the most appropriate visualisation format when carrying out procedures (medical in our case). To evaluate these forms of visualisation, we carried out an experiment involving 23 participants possessing latent/novice medical knowledge. These participant profiles were representative of operators who are medically trained yet do not apply their knowledge every day (e.g., an astronaut in orbit or a sailor out at sea). We discuss our findings, which include the advantages of embedded visualised information in terms of precision compared to situated projected information, along with the accompanying limitations and future improvements to our proposition. We conclude with the prospects of our work, notably the continuation and possibility of evaluating our proposition in a less controlled, real context in collaboration with our national space agency.

Authors:Stephen James Krol, Maria Teresa Llano Rodriguez, Miguel Loor Paredes
Title: Exploring the Needs of Practising Musicians in Co-Creative AI Through Co-Design
Abstract:
Recent advances in generative AI music have resulted in new technologies that are being framed as co-creative tools for musicians, with early work demonstrating their potential to add to music practice. While the field has seen many valuable contributions, work that involves practising musicians in the design and development of these tools is limited, with the majority of work including them only once a tool has been developed. In this paper, we present a case study that explores the needs of practising musicians through the co-design of a musical variation system, highlighting the importance of involving a diverse range of musicians throughout the design process and uncovering various design insights. This was achieved through two workshops and a two-week ecological evaluation, where musicians from different musical backgrounds offered valuable insights not only on a musical system's design but also on how a musical AI could be integrated into their musical practices.

Authors:Priya Dhawka, Sayamindu Dasgupta
Title: The Social Construction of Visualizations: Practitioner Challenges and Experiences of Visualizing Race and Gender Demographic Data
Abstract:
Data visualizations are increasingly seen as socially constructed, with several recent studies positing that perceptions and interpretations of visualization artifacts are shaped through complex sets of interactions between members of a community. However, most of these works have focused on audiences and researchers, and little is known about whether and how practitioners account for the socially constructed framing of data visualization. In this paper, we study and analyze how visualization practitioners understand the influence of their beliefs, values, and biases in their design processes and the challenges they experience. In 17 semi-structured interviews with designers working with race and gender demographic data, we find that a complex mix of factors interact to inform how practitioners approach their design process, including their personal experiences, values, and their understandings of power, neutrality, and politics. Based on our findings, we suggest a series of implications for research and practice in this space.

Authors:Antonin Brun, Ruying Liu, Aryan Shukla, Frances Watson, Jonathan Gratch
Title: Exploring Emotion-Sensitive LLM-Based Conversational AI
Abstract:
Conversational AI chatbots have become increasingly common within the customer service industry. Despite improvements in their emotional development, they often lack the authenticity of real customer service interactions or the competence of service providers. By comparing emotion-sensitive and emotion-insensitive LLM-based chatbots across 30 participants, we aim to explore how emotional sensitivity in chatbots influences perceived competence and overall customer satisfaction in service interactions. Additionally, we employ sentiment analysis techniques to analyze and interpret the emotional content of user inputs. We highlight that perceptions of chatbot trustworthiness and competence were higher in the case of the emotion-sensitive chatbot, even if issue resolution rates were not affected. We discuss implications of improved user satisfaction from emotion-sensitive chatbots and potential applications in support services.

Authors:Matthew Law, Rama Adithya Varanasi
Title: Generative AI & Changing Work: Systematic Review of Practitioner-led Work Transformations through the Lens of Job Crafting
Abstract:
Widespread integration of Generative AI tools is transforming white-collar work, reshaping how workers define their roles, manage their tasks, and collaborate with peers. This has created a need to develop an overarching understanding of common worker-driven patterns around these transformations. To fill this gap, we conducted a systematic literature review of 23 studies from the ACM Digital Library that focused on workers' and practitioners' lived experiences with GenAI. Our findings reveal that while many professionals have delegated routine tasks to GenAI to focus on core responsibilities, they have also taken on new forms of AI managerial labor to monitor and refine GenAI outputs. Additionally, practitioners have restructured collaborations, sometimes bypassing traditional peer and subordinate interactions in favor of GenAI assistance. These shifts have fragmented cohesive tasks into piecework, creating tensions around role boundaries and professional identity. Our analysis suggests that current frameworks, like job crafting, need to evolve to address the complexities of GenAI-driven transformations.

Authors:Artem Dementyev, Dimitri Kanevsky, Samuel J. Yang, Mathieu Parvaix, Chiong Lai, Alex Olwal
Title: SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization
Abstract:
Speech-to-text capabilities on mobile devices have proven helpful for hearing and speech accessibility, language translation, note-taking, and meeting transcripts. However, our foundational large-scale survey (n=263) shows that the inability to distinguish and indicate speaker direction makes them challenging in group conversations. SpeechCompass addresses this limitation through real-time, multi-microphone speech localization, where the direction of speech allows visual separation and guidance (e.g., arrows) in the user interface. We introduce efficient real-time audio localization algorithms and custom sound perception hardware running on a low-power microcontroller and four integrated microphones, which we characterize in technical evaluations. Informed by a large-scale survey (n=494), we conducted an in-person study of group conversations with eight frequent users of mobile speech-to-text, who provided feedback on five visualization styles. The value of diarization and visualizing localization was consistent across participants, with everyone agreeing on the value and potential of directional guidance for group conversations.
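The paper's localization algorithms are custom and run on embedded hardware, but the basic ingredient of multi-microphone direction estimation, the time difference of arrival (TDOA) between a microphone pair, can be sketched via cross-correlation (illustrative only; SpeechCompass's actual low-power algorithms are not reproduced here):

```python
import numpy as np

def tdoa_samples(sig_late, sig_early):
    """Estimate delay (in samples) of the first signal relative to the second
    via the peak of their full cross-correlation."""
    corr = np.correlate(sig_late, sig_early, mode="full")
    return int(np.argmax(corr)) - (len(sig_early) - 1)

# Simulate a source whose wavefront reaches mic B 5 samples after mic A.
rng = np.random.default_rng(2)
src = rng.normal(size=500)
mic_a = src
mic_b = np.concatenate([np.zeros(5), src[:-5]])
lag = tdoa_samples(mic_b, mic_a)
print(lag)  # 5
```

Given the sampling rate, microphone spacing, and speed of sound, this sample lag converts to an angle of arrival, which is what a UI can then render as a directional arrow next to a caption.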

Authors:Jiaxin Xu, Sterre Anna Mariam van der Horst, Chao Zhang, Raymond H. Cuijpers, Wijnand A. IJsselsteijn
Title: Robot-Initiated Social Control of Sedentary Behavior: Comparing the Impact of Relationship- and Target-Focused Strategies
Abstract:
To design social robots to effectively promote health behavior change, it is essential to understand how people respond to various health communication strategies employed by these robots. This study examines the effectiveness of two types of social control strategies from a social robot, relationship-focused strategies (emphasizing relational consequences) and target-focused strategies (emphasizing health consequences), in encouraging people to reduce sedentary behavior. A two-session lab experiment was conducted (n = 135), where participants first played a game with a robot, followed by the robot persuading them to stand up and move using one of the strategies. Half of the participants joined a second session to have a repeated interaction with the robot. Results showed that relationship-focused strategies motivated participants to stay active longer. Repeated sessions did not strengthen participants' relationship with the robot, but those who felt more attached to the robot responded more actively to the target-focused strategies. These findings offer valuable insights for designing persuasive strategies for social robots in health communication contexts.

Authors:Tanguy Cazalets, Joni Dambre
Title: Word Synchronization Challenge: A Benchmark for Word Association Responses for LLMs
Abstract:
This paper introduces the Word Synchronization Challenge, a novel benchmark to evaluate large language models (LLMs) in Human-Computer Interaction (HCI). This benchmark uses a dynamic game-like framework to test LLMs' ability to mimic human cognitive processes through word associations. By simulating complex human interactions, it assesses how LLMs interpret and align with human thought patterns during conversational exchanges, which are essential for effective social partnerships in HCI. Initial findings highlight the influence of model sophistication on performance, offering insights into the models' capabilities to engage in meaningful social interactions and adapt behaviors in human-like ways. This research advances the understanding of LLMs' potential to replicate or diverge from human cognitive functions, paving the way for more nuanced and empathetic human-machine collaborations.
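The abstract does not spell out the game loop, but a round structure consistent with the description can be sketched as follows (the agents here are toy deterministic stand-ins, not LLMs, and the convergence rule is purely illustrative):

```python
def play_sync_game(agent_a, agent_b, max_rounds=10):
    """Run rounds until both agents say the same word (a 'sync').

    Each agent is a callable mapping the shared history of (word_a, word_b)
    pairs to its next word. Returns (winning round number or None, history).
    """
    history = []
    for round_no in range(1, max_rounds + 1):
        wa, wb = agent_a(history), agent_b(history)
        history.append((wa, wb))
        if wa == wb:
            return round_no, history
    return None, history

# Toy agents: after the first round, both pick the alphabetically-first
# word of the previous pair, so they converge immediately.
def agent_a(history):
    return "cat" if not history else min(history[-1])

def agent_b(history):
    return "dog" if not history else min(history[-1])

rounds, hist = play_sync_game(agent_a, agent_b)
print(rounds)  # 2
```

In the benchmark proper, the agents would be LLMs prompted with the history, and the number of rounds to synchronization serves as the performance measure.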

Authors:Ellen Simpson, Bryan Semaan
Title: Infrastructures for Inspiration: The Routine Construction of Creative Identity and Inspiration
Abstract:
Online, visual artists have more places than ever to routinely share their creative work and connect with other artists. These interactions support the routine enactment of creative identity in artists and provide inspirational opportunities for artists. As creative work shifts online, interactions between artists and routines around how these artists get inspired to do creative work are mediated by and through the logics of the online platforms where they take place. In an interview study of 22 artists, this paper explores the interplay between the development of artists' creative identities and the, at times, contradictory practices they have around getting inspired. We find that platforms support the disciplined practice of creative work while enabling spontaneous moments of inspiration, play an increasing role in passive approaches to searching for inspiration, and foster numerous small community spaces where artists negotiate their creative identities. We discuss how platforms can better support inspiration and embed mechanisms for it into their infrastructure, design, and platform policy.

Authors:Sean Kim, Lydia B. Chilton
Title: AI Humor Generation: Cognitive, Social and Creative Skills for Effective Humor
Abstract:
Humor is a social binding agent. It is an act of creativity that can provoke emotional reactions on a broad range of topics. Humor has long been thought to be "too human" for AI to generate. However, humans are complex, and humor requires our complex set of skills: cognitive reasoning, social understanding, a broad base of knowledge, creative thinking, and audience understanding. We explore whether giving AI such skills enables it to write humor. We target one audience: Gen Z humor fans. We ask people to rate meme caption humor from three sources: 1) highly upvoted human captions, 2) basic LLM captions, and 3) LLM captions written with humor skills. We find that users like LLM captions with humor skills more than basic LLM captions, rating them almost on par with top-rated humor written by people. We discuss how giving AI human-like skills can help it generate communication that resonates with people.

Authors:Anamaria Crisan, Andrew M. McNutt
Title: Linting is People! Exploring the Potential of Human Computation as a Sociotechnical Linter of Data Visualizations
Abstract:
Traditionally, linters are code analysis tools that help developers by flagging potential issues from syntax and logic errors to enforcing syntactical and stylistic conventions. Recently, linting has been taken as an interface metaphor, allowing it to be extended to more complex inputs, such as visualizations, which demand a broader perspective and alternative approach to evaluation. We explore a further extended consideration of linting inputs, and modes of evaluation, across the puritanical, neutral, and rebellious dimensions. We specifically investigate the potential for leveraging human computation in linting operations through Community Notes -- crowd-sourced contextual text snippets aimed at checking and critiquing potentially accurate or misleading content on social media. We demonstrate that human-powered assessments not only identify misleading or error-prone visualizations but that integrating human computation enhances traditional linting by offering social insights. As is required these days, we consider the implications of building linters powered by Artificial Intelligence.

Authors:Namhee Kim, Woojin Park
Title: Vision-Integrated LLMs for Autonomous Driving Assistance: Human Performance Comparison and Trust Evaluation
Abstract:
Traditional autonomous driving systems often struggle with reasoning in complex, unexpected scenarios due to limited comprehension of spatial relationships. In response, this study introduces a Large Language Model (LLM)-based Autonomous Driving (AD) assistance system that integrates a vision adapter and an LLM reasoning module to enhance visual understanding and decision-making. The vision adapter, combining YOLOv4 and Vision Transformer (ViT), extracts comprehensive visual features, while GPT-4 enables human-like spatial reasoning and response generation. Experimental evaluations with 45 experienced drivers revealed that the system closely mirrors human performance in describing situations and moderately aligns with human decisions in generating appropriate responses.

Authors:Alan Joy, Joseph O'Hagan
Title: Acceptance of an Augmented Society: Initial Explorations into the Acceptability of Augmenting Real World Locations
Abstract:
Augmented reality (AR) will enable individuals to share and experience content augmented at real world locations with ease. But what protections and restrictions should be in place? Should, for example, anyone be able to post any content they wish at a place of religious or cultural significance? We developed a smartphone app to give individuals hands-on experience posting and sharing AR content. After using our app, we investigated their attitudes towards posting different types of AR content (of an artistic, protest, social, informative, and commercial nature) in a variety of locations (cultural sites, religious sites, residential areas, public spaces, government buildings, and tourist points of interests). Our results show individuals expect restrictions to be in place to control who can post AR content at some locations, in particular those of religious and cultural significance. We also report individuals prefer augmentations to fit contextually within the environment they are posted, and expect the posting and sharing of AR content to adhere to the same regulations/legislation as social media platforms.

Authors:Advay Kumar, Stephanie Simangunsong, Pamela Carreno-Medrano, Akansel Cosgun
Title: Mixed Reality Outperforms Virtual Reality for Remote Error Resolution in Pick-and-Place Tasks
Abstract:
This study evaluates the performance and usability of Mixed Reality (MR), Virtual Reality (VR), and camera stream interfaces for remote error resolution tasks, such as correcting warehouse packaging errors. Specifically, we consider a scenario where a robotic arm halts after detecting an error, requiring a remote operator to intervene and resolve it via pick-and-place actions. Twenty-one participants performed simulated pick-and-place tasks using each interface. A linear mixed model (LMM) analysis of task resolution time, usability scores (SUS), and mental workload scores (NASA-TLX) showed that the MR interface outperformed both VR and camera interfaces. MR enabled significantly faster task completion, was rated higher in usability, and was perceived to be less cognitively demanding. Notably, the MR interface, which projected a virtual robot onto a physical table, provided superior spatial understanding and physical reference cues. Post-study surveys further confirmed participants' preference for MR over other interfaces.

Authors:Surabhi S Nath, Guiomar del Cuvillo y Schröder, Claire E. Stevenson
Title: Pencils to Pixels: A Systematic Study of Creative Drawings across Children, Adults and AI
Abstract:
Can we derive computational metrics to quantify visual creativity in drawings across intelligent agents, while accounting for inherent differences in technical skill and style? To answer this, we curate a novel dataset consisting of 1338 drawings by children, adults and AI on a creative drawing task. We characterize two aspects of the drawings -- (1) style and (2) content. For style, we define measures of ink density, ink distribution and number of elements. For content, we use expert-annotated categories to study conceptual diversity, and image and text embeddings to compute distance measures. We compare the style, content and creativity of children, adults and AI drawings and build simple models to predict expert and automated creativity scores. We find significant differences in style and content in the groups -- children's drawings had more components, AI drawings had greater ink density, and adult drawings revealed maximum conceptual diversity. Notably, we highlight a misalignment between creativity judgments obtained through expert and automated ratings and discuss its implications. Through these efforts, our work provides, to the best of our knowledge, the first framework for studying human and artificial creativity beyond the textual modality, and attempts to arrive at the domain-agnostic principles underlying creativity. Our data and scripts are available on GitHub.
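As a rough illustration of the style measures named above, here is a minimal sketch (not the authors' code; the thresholding, grid size, and entropy formulation are assumptions) of ink density and an entropy-based ink distribution over a grayscale canvas:

```python
import numpy as np

def ink_density(drawing, threshold=0.5):
    """Fraction of pixels darker than `threshold` in a grayscale drawing
    (values in [0, 1], where 0 is black ink and 1 is white paper)."""
    return float(np.mean(drawing < threshold))

def ink_distribution(drawing, threshold=0.5, grid=4):
    """Entropy of ink spread over a grid x grid partition of the canvas:
    higher values mean ink is spread more evenly across the page."""
    h, w = drawing.shape
    counts = []
    for i in range(grid):
        for j in range(grid):
            cell = drawing[i * h // grid:(i + 1) * h // grid,
                           j * w // grid:(j + 1) * w // grid]
            counts.append(np.sum(cell < threshold))
    p = np.array(counts, dtype=float)
    total = p.sum()
    if total == 0:
        return 0.0
    p = p[p > 0] / total
    return float(-np.sum(p * np.log2(p)))

# A toy 8x8 "drawing": a black square in the top-left corner.
canvas = np.ones((8, 8))
canvas[:4, :4] = 0.0
print(ink_density(canvas))  # 0.25: a quarter of the canvas is inked
```

Under this formulation, a drawing whose ink is concentrated in one cell scores zero distribution entropy, while ink spread evenly over all cells scores the maximum.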

Authors:Antonio La Torre, Marco Angelini
Title: Cyri: A Conversational AI-based Assistant for Supporting the Human User in Detecting and Responding to Phishing Attacks
Abstract:
This work introduces Cyri, an AI-powered conversational assistant designed to support a human user in detecting and analyzing phishing emails by leveraging Large Language Models. Cyri has been designed to scrutinize emails for semantic features used in phishing attacks, such as urgency and undesirable consequences, using an approach that unifies features already established in the literature with others derived from Cyri's own feature extraction methodology. Cyri can be directly plugged into a mail client or webmail, ensuring seamless integration with the user's email workflow while maintaining data privacy through local processing. By performing analyses on the user's machine, Cyri eliminates the need to transmit sensitive email data over the internet, reducing associated security risks. The Cyri user interface has been designed to reduce habituation effects and enhance user engagement. It employs dynamic visual cues and context-specific explanations to keep users alert and informed while working with their email. Additionally, it allows users to explore identified malicious semantic features both through conversation with the agent and through visual exploration, obtaining the advantages of both modalities for expert and non-expert users. It also allows users to keep track of the conversation, supports the user in answering additional questions on both computed features and new parts of the mail, and applies its detection on demand. To evaluate Cyri, we crafted a comprehensive dataset of 420 phishing emails and 420 legitimate emails. Results demonstrate high effectiveness in identifying critical phishing semantic features fundamental to phishing detection. A user study involving 10 participants, both experts and non-experts, evaluated Cyri's effectiveness and usability. Results indicated that Cyri significantly aided users in identifying phishing emails and enhanced their understanding of phishing tactics.

Authors:Jessica Eggers, Angela Dai, Matthew C. Gombolay
Title: Use of Winsome Robots for Understanding Human Feedback (UWU)
Abstract:
As social robots become more common, many have adopted cute aesthetics aiming to enhance user comfort and acceptance. However, the effect of this aesthetic choice on human feedback in reinforcement learning scenarios remains unclear. Previous research has shown that humans tend to give more positive than negative feedback, which can cause failure to reach optimal robot behavior. We hypothesize that this positive bias may be exacerbated by the robot's level of perceived cuteness. To investigate, we conducted a user study where participants critiqued a robot's trajectories while it performed a task. We then analyzed the impact of the robot's aesthetic cuteness on the type of participant feedback. Our results suggest that there is a shift in the ratio of positive to negative feedback when perceived cuteness changes. In light of this, we experiment with a stochastic version of TAMER which adapts based on the user's level of positive feedback bias to mitigate these effects.

Authors:Joshua C. Yang, Fynn Bachmann
Title: Bridging Voting and Deliberation with Algorithms: Field Insights from vTaiwan and Kultur Komitee
Abstract:
Democratic processes increasingly aim to integrate large-scale voting with face-to-face deliberation, addressing the challenge of reconciling individual preferences with collective decision-making. This work introduces new methods that use algorithms and computational tools to bridge online voting with face-to-face deliberation, tested in two real-world scenarios: Kultur Komitee 2024 (KK24) and vTaiwan. These case studies highlight the practical applications and impacts of the proposed methods. We present three key contributions: (1) Preference-based Clustering for Deliberation (PCD), which enables both in-depth and broad discussions in deliberative settings by computing homogeneous and heterogeneous group compositions with balanced and adjustable group sizes; (2) Human-in-the-loop MES, a practical method that enhances the Method of Equal Shares (MES) algorithm with real-time digital feedback. This builds algorithmic trust by giving participants full control over how much decision-making is delegated to the voting aggregation algorithm as compared to deliberation; and (3) the ReadTheRoom deliberation method, which uses opinion space mapping to identify agreement and divergence, along with spectrum-based preference visualisation to track opinion shifts during deliberation. This approach enhances transparency by clarifying collective sentiment and fosters collaboration by encouraging participants to engage constructively with differing perspectives. By introducing these actionable frameworks, this research extends in-person deliberation with scalable digital methods that address the complexities of modern decision-making in participatory processes.
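A minimal sketch of what preference-based clustering for deliberation could look like, assuming plain k-means on preference vectors (the paper's actual PCD method, its group-size balancing, and the MES integration are not reproduced here): homogeneous groups are the clusters themselves, and heterogeneous groups interleave one member from each cluster.

```python
import numpy as np

def pcd_groups(preferences, k, seed=0):
    """Toy preference-based clustering: k-means on preference vectors.
    Returns (homogeneous, heterogeneous) group compositions, where
    homogeneous groups are the clusters themselves and heterogeneous
    groups mix one member from each cluster in round-robin fashion."""
    rng = np.random.default_rng(seed)
    X = np.asarray(preferences, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(50):  # plain Lloyd iterations
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    homogeneous = [list(np.where(labels == c)[0]) for c in range(k)]
    # Interleave cluster members so each mixed group spans all clusters.
    longest = max(len(g) for g in homogeneous)
    heterogeneous = [[g[i] for g in homogeneous if i < len(g)]
                     for i in range(longest)]
    return homogeneous, heterogeneous

votes = [[1, 0], [1, 0.1], [0, 1], [0.1, 1]]  # two clear preference camps
homo, hetero = pcd_groups(votes, k=2)
```

With the toy votes above, the homogeneous groups recover the two camps, while each heterogeneous group pairs one participant from each camp for cross-cutting discussion.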

Authors:Patrick Gildersleve, Anna Beers, Viviane Ito, Agustin Orozco, Francesca Tripodi
Title: WikiReddit: Tracing Information and Attention Flows Between Online Platforms
Abstract:
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia mentions and links shared in posts and comments on Reddit 2020-2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.

Authors:Reyhaneh Sabbagh Gol, Dimitar Valkov, Lars Linsen
Title: XMTC: Explainable Early Classification of Multivariate Time Series in Reach-to-Grasp Hand Kinematics
Abstract:
Hand kinematics can be measured in Human-Computer Interaction (HCI) with the aim of predicting the user's intention in a reach-to-grasp action. Using multiple hand sensors, multivariate time series data are being captured. Given a number of possible actions on a number of objects, the goal is to classify the multivariate time series data, where the class shall be predicted as early as possible. Many machine-learning methods have been developed for such classification tasks, where different approaches produce favorable solutions on different data sets. We, therefore, employ an ensemble approach that includes and weights different approaches. To provide trustworthy classification predictions, we present the XMTC tool that incorporates coordinated multiple-view visualizations to analyze the predictions. Temporal accuracy plots, confusion matrix heatmaps, temporal confidence heatmaps, and partial dependence plots allow for the identification of the best trade-off between early prediction and prediction quality, the detection and analysis of challenging classification conditions, and the investigation of the prediction evolution in an overview and detail manner. We apply XMTC to real-world HCI data in multiple scenarios and show that good classification predictions can be achieved early on with our classifier, as well as which conditions are easy to distinguish, which multivariate time series measurements impose challenges, and which features have the most impact.
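The weighted-ensemble idea described above can be sketched as follows; the weighting scheme and toy probabilities are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def weighted_ensemble(probas, weights):
    """Combine per-model class-probability arrays (each of shape
    [n_samples, n_classes]) with nonnegative weights; returns the
    predicted class index per sample."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so weights sum to 1
    combined = sum(w * np.asarray(p) for w, p in zip(weights, probas))
    return combined.argmax(axis=1)

# Two toy classifiers disagreeing on sample 1; the more trusted
# (higher-weight) model dominates the combined vote.
p1 = [[0.9, 0.1], [0.4, 0.6]]
p2 = [[0.8, 0.2], [0.7, 0.3]]
preds = weighted_ensemble([p1, p2], weights=[1.0, 2.0])
```

For early classification, the same combination would be recomputed at each time step as more of the series is observed, which is exactly the prediction evolution the XMTC views are designed to inspect.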

Authors:Vlasis Kasapakis, Leonel Morgado
Title: Ancient Greek Technology: An Immersive Learning Use Case Described Using a Co-Intelligent Custom ChatGPT Assistant
Abstract:
Achieving consistency in immersive learning case descriptions is essential but challenging due to variations in research focus, methodology, and researchers' background. We address these challenges by leveraging the Immersive Learning Case Sheet (ILCS), a methodological instrument to standardize case descriptions, which we applied to an immersive learning case on ancient Greek technology in VRChat. Research team members had differing levels of familiarity with the ILCS and the case content, so we developed a custom ChatGPT assistant to facilitate consistent terminology and process alignment across the team. This paper constitutes an example of how structured case reports can be a novel contribution to immersive learning literature. Our findings demonstrate how the ILCS supports structured reflection and interpretation of the case. Further, we report that the use of a ChatGPT assistant significantly supports the coherence and quality of the team members' development of the final ILCS. This exposes the potential of employing AI-driven tools to enhance collaboration and standardization of research practices in qualitative educational research. However, we also discuss the limitations and challenges, including reliance on AI for interpretive tasks and managing varied levels of expertise within the team. This study thus provides insights into the practical application of AI in standardizing immersive learning research processes.

Authors:Jiaxi Yang, Haowen Hou
Title: RWKV-UI: UI Understanding with Enhanced Perception and Reasoning
Abstract:
Existing Visual Language Models often struggle with information loss and limited reasoning abilities when handling high-resolution web interfaces that combine complex visual, textual, and interactive elements. These challenges are particularly evident in tasks requiring webpage layout comprehension and multi-step interactive reasoning. To address these challenges, we propose RWKV-UI, a Visual Language Model based on the RWKV architecture, specifically designed to handle high-resolution UI images. During model training, we introduce layout detection as a visual prompt to help the model better understand webpage layout structures. Additionally, we design a visual prompt based on the Chain-of-Thought (CoT) mechanism, which enhances the model's ability to understand and reason about webpage content through reasoning chains. Experimental results show that RWKV-UI demonstrates significant performance improvements in high-resolution UI understanding and interactive reasoning tasks.

Authors:Qi Xi, Shulin Li, Zhiqi Gao, Zibo Zhang, Shunye Tang, Jianchao Zhang, Liangxu Wang, Yiru Niu, Yan Zhang, Binhui Wang
Title: How to Make Your Multi-Image Posts Popular? An Approach to Enhanced Grid for Nine Images on Social Media
Abstract:
The nine-grid layout is commonly used for multi-image posts, arranging nine images in a tic-tac-toe-style grid. This layout effectively presents content within limited space. However, given the numerous possible arrangements within the nine-image grid, the arrangement that yields the highest attractiveness remains unknown. Our study investigates how the arrangement of images within a nine-grid layout affects the overall popularity of the image set, aiming to identify arrangement schemes better aligned with user preferences. Based on survey results regarding user preferences in image arrangement, we identified two widely recognized ordering sequences, sequential order and center prioritization, and considered both image visual content and aesthetic quality as alignment metrics, resulting in four layout schemes. Finally, we recruited participants to annotate the various layout schemes of the same sets of images. Our experience-centered evaluation indicates that layout schemes based on aesthetic quality outperformed the others. This research yields empirical evidence supporting the optimization of the nine-grid layout for multi-image posts, thereby furnishing content creators with valuable insights to enhance both attractiveness and user experience.
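The two ordering sequences can be illustrated with a small sketch; the per-image scores and the distance-from-center notion of "center prioritization" are assumptions for illustration, not the paper's exact procedure:

```python
def arrange_nine(scores):
    """Two toy orderings for a 3x3 grid of images given per-image
    scores (higher = better). Grid slots are row-major 0..8.
    `sequential` places images best-first, left to right, top to bottom;
    `center_first` fills slots from the center cell outward."""
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    sequential = ranked[:]  # best image lands in slot 0
    # Slots sorted by Chebyshev distance from the center cell (slot 4).
    center_order = sorted(range(9),
                          key=lambda s: max(abs(s // 3 - 1), abs(s % 3 - 1)))
    center_first = [None] * 9
    for img, slot in zip(ranked, center_order):
        center_first[slot] = img
    return sequential, center_first

scores = [3, 1, 4, 1, 5, 9, 2, 6, 5]  # e.g. aesthetic-quality ratings
seq, cen = arrange_nine(scores)
```

Feeding either visual-content or aesthetic-quality scores into an ordering like this yields the four layout schemes compared in the study.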

Authors:Ananth N. Ramaseri-Chandra, Hassan Reza
Title: Dynamic Cybersickness Mitigation via Adaptive FFR and FoV adjustments
Abstract:
This paper presents a novel adaptive Virtual Reality (VR) system that aims to mitigate cybersickness in immersive environments through dynamic, real-time adjustments. The system predicts cybersickness levels in real-time using a machine learning (ML) model trained on head tracking and kinematic data. The adaptive system adjusts fixed foveated rendering (FFR) strength and field of view (FOV) to enhance user comfort. With the goal of balancing usability with system performance, we believe our approach will optimize both user experience and performance. By adapting responsively to user needs, our work explores the potential of a machine learning-based feedback loop for user experience management, contributing to a user-centric VR system design.

Authors:Nina Freise, Marius Heitlinger, Ruben Nuredini, Gerrit Meixner
Title: Automatic Prompt Optimization Techniques: Exploring the Potential for Synthetic Data Generation
Abstract:
Artificial Intelligence (AI) advancement is heavily dependent on access to large-scale, high-quality training data. However, in specialized domains such as healthcare, data acquisition faces significant constraints due to privacy regulations, ethical considerations, and limited availability. While synthetic data generation offers a promising solution, conventional approaches typically require substantial real data for training generative models. The emergence of large-scale prompt-based models presents new opportunities for synthetic data generation without direct access to protected data. However, crafting effective prompts for domain-specific data generation remains challenging, and manual prompt engineering proves insufficient for achieving output with sufficient precision and authenticity. We review recent developments in automatic prompt optimization, following PRISMA guidelines. We analyze six peer-reviewed studies published between 2020 and 2024 that focus on automatic data-free prompt optimization methods. Our analysis reveals three approaches: feedback-driven, error-based, and control-theoretic. Although all approaches demonstrate promising capabilities in prompt refinement and adaptation, our findings suggest the need for an integrated framework that combines complementary optimization techniques to enhance synthetic data generation while minimizing manual intervention. We propose future research directions toward developing robust, iterative prompt optimization frameworks capable of improving the quality of synthetic data. This advancement can be particularly crucial for sensitive fields and in specialized domains where data access is restricted, potentially transforming how we approach synthetic data generation for AI development.

Authors:Mehul Agarwal, Gauri Agarwal, Santiago Benoit, Andrew Lippman, Jean Oh
Title: Secure & Personalized Music-to-Video Generation via CHARCHA
Abstract:
Music is a deeply personal experience and our aim is to enhance this with a fully-automated pipeline for personalized music video generation. Our work allows listeners to not just be consumers but co-creators in the music video generation process by creating personalized, consistent and context-driven visuals based on lyrics, rhythm and emotion in the music. The pipeline combines multimodal translation and generation techniques and utilizes low-rank adaptation on listeners' images to create immersive music videos that reflect both the music and the individual. To ensure the ethical use of users' identity, we also introduce CHARCHA (patent pending), a facial identity verification protocol that protects people against unauthorized use of their face while at the same time collecting authorized images from users for personalizing their videos. This paper thus provides a secure and innovative framework for creating deeply personalized music videos.

Authors:Yigang Qin, Yanheng Li, EunJeong Cheon
Title: Encountering Robotic Art: The Social, Material, and Temporal Processes of Creation with Machines
Abstract:
Robots extend beyond the tools of productivity; they also contribute to creativity. While typically defined as utility-driven technologies designed for productive or social settings, the role of robots in creative settings remains underexplored. This paper examines how robots participate in artistic creation. Through semi-structured interviews with robotic artists, we analyze the impact of robots on artistic processes and outcomes. We identify the critical roles of social interaction, material properties, and temporal dynamics in facilitating creativity. Our findings reveal that creativity emerges from the co-constitution of artists, robots, and audiences within spatial-temporal dimensions. Based on these insights, we propose several implications for socially informed, material-attentive, and process-oriented approaches to creation with computing systems. These approaches can inform the domains of HCI, including media and art creation, craft, digital fabrication, and tangible computing.

Authors:Isabella Pu, Jeff Snyder, Naomi Ehrich Leonard
Title: The Beatbots: A Musician-Informed Multi-Robot Percussion Quartet
Abstract:
Artistic creation is often seen as a uniquely human endeavor, yet robots bring distinct advantages to music-making, such as precise tempo control, unpredictable rhythmic complexities, and the ability to coordinate intricate human and robot performances. While many robotic music systems aim to mimic human musicianship, our work emphasizes the unique strengths of robots, resulting in a novel multi-robot performance instrument called the Beatbots, capable of producing music that is challenging for humans to replicate using current methods. The Beatbots were designed using an ``informed prototyping'' process, incorporating feedback from three musicians throughout development. We evaluated the Beatbots through a live public performance, surveying participants (N=28) to understand how they perceived and interacted with the robotic performance. Results show that participants valued the playfulness of the experience, the aesthetics of the robot system, and the unconventional robot-generated music. Expert musicians and non-expert roboticists demonstrated especially positive mindset shifts during the performance, although participants across all demographics had favorable responses. We propose design principles to guide the development of future robotic music systems and identify key robotic music affordances that our musician consultants considered particularly important for robotic music performance.

Authors:Felix Tener, Joel Lanir
Title: Guiding, not Driving: Design and Evaluation of a Command-Based User Interface for Teleoperation of Autonomous Vehicles
Abstract:
Autonomous vehicles (AVs) are rapidly evolving as an innovative mode of transportation. However, the consensus in both industry and academia is that AVs cannot independently resolve all traffic scenarios. Consequently, the need for remote human assistance becomes clear. To enable the widespread integration of AVs on public roadways, it is imperative to develop novel models for remote operation. One such model is tele-assistance, which promotes delegating low-level maneuvers to automation through high-level directives. Our study investigates the design and evaluation of a new command-based tele-assistance user interface for the teleoperation of AVs. First, by integrating various control paradigms and interaction concepts, we created a simulation-based, high-fidelity interactive prototype consisting of 175 screens. Next, we conducted a comprehensive usability study with 14 expert teleoperators to assess the acceptance and usability of the system. Finally, we formulated high-level insights and guidelines for designing command-based user interfaces for the remote operation of AVs.

Authors:Juliette Zaccour, Reuben Binns, Luc Rocher
Title: Access Denied: Meaningful Data Access for Quantitative Algorithm Audits
Abstract:
Independent algorithm audits hold the promise of bringing accountability to automated decision-making. However, third-party audits are often hindered by access restrictions, forcing auditors to rely on limited, low-quality data. To study how these limitations impact research integrity, we conduct audit simulations on two realistic case studies for recidivism and healthcare coverage prediction. We examine the accuracy of estimating group parity metrics across three levels of access: (a) aggregated statistics, (b) individual-level data with model outputs, and (c) individual-level data without model outputs. Despite selecting one of the simplest tasks for algorithmic auditing, we find that data minimization and anonymization practices can strongly increase error rates on individual-level data, leading to unreliable assessments. We discuss implications for independent auditors, as well as potential avenues for HCI researchers and regulators to improve data access and enable both reliable and holistic evaluations.
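A minimal sketch of estimating a group parity metric at two of the access levels, assuming demographic parity as the metric (access level (c), individual-level data without model outputs, would require fitting a surrogate model and is omitted):

```python
import numpy as np

def parity_gap_individual(outputs, groups):
    """Demographic parity gap from individual-level data with model
    outputs: |P(yhat=1 | A=0) - P(yhat=1 | A=1)|."""
    outputs, groups = np.asarray(outputs), np.asarray(groups)
    rates = [outputs[groups == g].mean() for g in (0, 1)]
    return abs(rates[0] - rates[1])

def parity_gap_aggregate(pos_counts, totals):
    """The same metric from aggregated statistics only: per-group counts
    of positive predictions and per-group totals."""
    return abs(pos_counts[0] / totals[0] - pos_counts[1] / totals[1])

yhat  = [1, 1, 0, 1, 0, 0, 1, 0]  # toy model outputs
group = [0, 0, 0, 0, 1, 1, 1, 1]  # toy protected attribute
gap_a = parity_gap_aggregate([3, 1], [4, 4])  # auditor sees only counts
gap_b = parity_gap_individual(yhat, group)    # auditor sees rows + outputs
```

The two estimates coincide on clean data; the paper's point is that minimization and anonymization applied to the individual-level rows can push estimates like `gap_b` far from the truth.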

Authors:Alexandra Gonzalez, J. Nathan Matias
Title: Measuring the Mental Health of Content Reviewers, a Systematic Review
Abstract:
Artificial intelligence and social computing rely on hundreds of thousands of content reviewers to classify high volumes of harmful and forbidden content. Many workers report long-term, potentially irreversible psychological harm. This work is similar to activities that cause psychological harm to other kinds of helping professionals even after small doses of exposure. Yet researchers struggle to measure the mental health of content reviewers well enough to inform diagnoses, evaluate workplace improvements, hold employers accountable, or advance scientific understanding. This systematic review summarizes psychological measures from other professions and relates them to the experiences of content reviewers. After identifying 1,673 potential papers, we reviewed 143 that validate measures in related occupations. We summarize the uses of psychological measurement for content reviewing, differences between clinical and research measures, and 12 measures that are adaptable to content reviewing. We find serious gaps in measurement validity in regions where content review labor is common. Overall, we argue for reliable measures of content reviewer mental health that match the nature of the work and are culturally-relevant.

Authors:Ljubisa Bojic, Zorica Dodevska, Yashar Deldjoo, Nenad Pantelic
Title: Towards Recommender Systems LLMs Playground (RecSysLLMsP): Exploring Polarization and Engagement in Simulated Social Networks
Abstract:
Given the exponential advancement in AI technologies and the potential escalation of harmful effects from recommendation systems, it is crucial to simulate and evaluate these effects early on. Doing so can help prevent possible damage to both societies and technology companies. This paper introduces the Recommender Systems LLMs Playground (RecSysLLMsP), a novel simulation framework leveraging Large Language Models (LLMs) to explore the impacts of different content recommendation setups on user engagement and polarization in social networks. By creating diverse AI agents (AgentPrompts) with descriptive, static, and dynamic attributes, we assess their autonomous behaviour across three scenarios: Plurality, Balanced, and Similarity. Our findings reveal that the Similarity Scenario, which aligns content with user preferences, maximizes engagement while potentially fostering echo chambers. Conversely, the Plurality Scenario promotes diverse interactions but produces mixed engagement results. Our study emphasizes the need for a careful balance in recommender system designs to enhance user satisfaction while mitigating societal polarization. It underscores the unique value and challenges of incorporating LLMs into simulation environments. The benefits of RecSysLLMsP lie in its potential to calculate polarization effects, which is crucial for assessing societal impacts and determining user engagement levels with diverse recommender system setups. This advantage is essential for developing and maintaining a successful business model for social media companies. However, the study's limitations revolve around accurately emulating reality. Future efforts should validate the similarity in behaviour between real humans and AgentPrompts and establish metrics for measuring polarization scores.

Authors:Simon Schneegans, Lori Neary, Markus Flatken, Andreas Gerndt
Title: STRIELAD -- A Scalable Toolkit for Real-time Interactive Exploration of Large Atmospheric Datasets
Abstract:
Technological advances in high-performance computing and maturing physical models allow scientists to simulate weather and climate evolution with increasing accuracy. While this improved accuracy lets us explore complex dynamical interactions within such physical systems that were inconceivable a few years ago, it also poses grand challenges for the data visualization and analytics process. We present STRIELAD, a scalable weather analytics toolkit that allows interactive exploration and real-time visualization of such large-scale datasets. It combines parallel and distributed feature extraction on high-performance computing resources with smart level-of-detail rendering methods to ensure interactivity throughout the analysis process.

Authors:Ryann M. Perez, Marie Shimogawa, Yanan Chang, Hoang Anh T. Phan, Jason G. Marmorstein, Evan S. K. Yanagawa, E. James Petersson
Title: Large Language Models for Education: ChemTAsk -- An Open-Source Paradigm for Automated Q&A in the Graduate Classroom
Abstract:
Large language models (LLMs) show promise for aiding graduate level education, but are limited by their training data and potential confabulations. We developed ChemTAsk, an open-source pipeline that combines LLMs with retrieval-augmented generation (RAG) to provide accurate, context-specific assistance. ChemTAsk utilizes course materials, including lecture transcripts and primary publications, to generate accurate responses to student queries. Over nine weeks in an advanced biological chemistry course at the University of Pennsylvania, students could opt in to use ChemTAsk for assistance in any assignment or to understand class material. Comparative analysis showed ChemTAsk performed on par with human teaching assistants (TAs) in understanding student queries and providing accurate information, particularly excelling in creative problem-solving tasks. In contrast, TAs were more precise in their responses and tailored their assistance to the specifics of the class. Student feedback indicated that ChemTAsk was perceived as correct, helpful, and faster than TAs. Open-source and proprietary models from Meta and OpenAI respectively were tested on an original biological chemistry benchmark for future iterations of ChemTAsk. It was found that OpenAI models were more tolerant to deviations in the input prompt and excelled in self-assessment to safeguard for potential confabulations. Taken together, ChemTAsk demonstrates the potential of integrating LLMs with RAG to enhance educational support, offering a scalable tool for students and educators.
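The abstract above describes a standard retrieval-augmented generation (RAG) loop: retrieve relevant course material, then prepend it to the student's query before the LLM answers. A minimal sketch of that loop with a toy bag-of-words retriever; the document snippets and the prompt template are hypothetical stand-ins, not ChemTAsk's actual pipeline or materials:

```python
import math
from collections import Counter

# Hypothetical course snippets standing in for lecture transcripts and publications.
DOCS = [
    "Enzyme kinetics follow the Michaelis-Menten equation relating rate to substrate.",
    "Protein folding is driven by the hydrophobic effect and hydrogen bonding.",
    "PCR amplifies DNA through cycles of denaturation annealing and extension.",
]

def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """The 'R' in RAG: rank documents by similarity to the query."""
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Assemble the augmented prompt an LLM would receive."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

print(build_prompt("How does PCR amplify DNA", DOCS))
```

A production system would swap the bag-of-words retriever for dense embeddings, but the control flow (retrieve, augment, generate) is the same.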

Authors:Yuhao Sun, Albert Tenesa, John Vines
Title: Human-Precision Medicine Interaction: Public Perceptions of Polygenic Risk Score for Genetic Health Prediction
Abstract:
Precision Medicine (PM) transforms the traditional "one-drug-fits-all" paradigm by customising treatments based on individual characteristics, and is an emerging topic for HCI research on digital health. A key element of PM, the Polygenic Risk Score (PRS), uses genetic data to predict an individual's disease risk. Despite its potential, PRS faces barriers to adoption, such as data inclusivity, psychological impact, and public trust. We conducted a mixed-methods study to explore how people perceive PRS, comprising surveys (n=254) and interviews (n=11) with UK-based participants. The interviews were supplemented by interactive storyboards using the ContraVision technique to provoke deeper reflection and discussion. We identified ten key barriers to PRS adoption and five themes, and proposed design implications for a responsible PRS framework. To address the complexities of PRS and enhance broader PM practices, we introduce the term Human-Precision Medicine Interaction (HPMI), which integrates, adapts, and extends HCI approaches to better meet these challenges.

Authors:Debora Firmino de Souza, Sonia Sousa, Kadri Kristjuhan-Ling, Olga Dunajeva, Mare Roosileht, Avar Pentel, Mati Mõttus, Mustafa Can Özdemir, Žanna Gratšjova
Title: Trust and Trustworthiness from Human-Centered Perspective in HRI -- A Systematic Literature Review
Abstract:
The Industry 5.0 transition highlights EU efforts to design intelligent devices that work alongside humans to enhance human capabilities; in this vision, users' preferences and their need to feel safe while collaborating with such systems take priority. This demands a human-centric research vision and requires a societal and educational shift in how we perceive technological advancements. To better understand this perspective, we conducted a systematic literature review focusing on how trust and trustworthiness can support the move towards Industry 5.0. The review overviews the most common methodologies and measurements and collects insights about barriers and facilitators for fostering trustworthy HRI. After a rigorous quality assessment following the Systematic Reviews and Meta-Analyses guidelines, with strict inclusion criteria and screening by at least two reviewers, 34 articles were included in the review. The findings underscore the significance of trust and safety as foundational elements for promoting secure and trustworthy human-machine cooperation. They confirm that almost 30% of the reviewed articles do not present a definition of trust, which is problematic because this lack of conceptual clarity can undermine research efforts addressing the problem from a common perspective. The findings also highlight that the domain and area of application should influence the choice of methods and approaches to fostering trust in HRI, as those choices can significantly affect user preferences and their perceptions and assessment of robot capabilities. This lack of conceptual clarity may further act as a barrier to fostering trust in HRI and helps explain the sometimes contradictory findings, methods, and instruments used to investigate trust in robots and other autonomous systems in the literature.

Authors:Balint Gyevnar, Mark Towers
Title: Objective Metrics for Human-Subjects Evaluation in Explainable Reinforcement Learning
Abstract:
Explanation is a fundamentally human process. Understanding the goal and audience of the explanation is vital, yet existing work on explainable reinforcement learning (XRL) routinely does not consult humans in their evaluations. Even when they do, they routinely resort to subjective metrics, such as confidence or understanding, that can only inform researchers of users' opinions, not their practical effectiveness for a given problem. This paper calls on researchers to use objective human metrics for explanation evaluations based on observable and actionable behaviour to build more reproducible, comparable, and epistemically grounded research. To this end, we curate, describe, and compare several objective evaluation methodologies for applying explanations to debugging agent behaviour and supporting human-agent teaming, illustrating our proposed methods using a novel grid-based environment. We discuss how subjective and objective metrics complement each other to provide holistic validation and how future work needs to utilise standardised benchmarks for testing to enable greater comparisons between research.

Authors:Michael T. Knierim, Fabio Stano, Fabio Kurz, Antonius Heusch, Max L. Wilson
Title: Exploring Flow in Real-World Knowledge Work Using Discreet cEEGrid Sensors
Abstract:
Flow, a state of deep task engagement, is associated with optimal experience and well-being, making its detection a prolific HCI research focus. While physiological sensors show promise for flow detection, most studies are lab-based. Furthermore, brain sensing during natural work remains unexplored due to the intrusive nature of traditional EEG setups. This study addresses this gap by using wearable, around-the-ear EEG sensors to observe flow during natural knowledge work, measuring EEG throughout an entire day. In a semi-controlled field experiment, participants engaged in academic writing or programming, with their natural flow experiences compared to those from a classic lab paradigm. Our results show that natural work tasks elicit more intense flow than artificial tasks, albeit with smaller experience contrasts. EEG results show a well-known quadratic relationship between theta power and flow across tasks, and a novel quadratic relationship between beta asymmetry and flow during complex, real-world tasks.

Authors:Xingyi Wang, Xiaozheng Wang, Sunyup Park, Yaxing Yao
Title: Users' Mental Models of Generative AI Chatbot Ecosystems
Abstract:
The capability of GenAI-based chatbots, such as ChatGPT and Gemini, has expanded quickly in recent years, turning them into GenAI Chatbot Ecosystems. Yet, users' understanding of how such ecosystems work remains unknown. In this paper, we investigate users' mental models of how GenAI Chatbot Ecosystems work. This is an important question because users' mental models guide their behaviors, including making decisions that impact their privacy. Through 21 semi-structured interviews, we uncovered users' four mental models towards first-party (e.g., Google Gemini) and third-party (e.g., ChatGPT) GenAI Chatbot Ecosystems. These mental models centered around the role of the chatbot in the entire ecosystem. We further found that participants held a more consistent and simpler mental model towards third-party ecosystems than the first-party ones, resulting in higher trust and fewer concerns towards the third-party ecosystems. We discuss the design and policy implications based on our results.

Authors:Victor Hoffmann, Federico Paredes-Valles, Valentina Cavinato
Title: From Soft Materials to Controllers with NeuroTouch: A Neuromorphic Tactile Sensor for Real-Time Gesture Recognition
Abstract:
This work presents NeuroTouch, an optical-based tactile sensor that combines a highly deformable dome-shaped soft material with an integrated neuromorphic camera, leveraging frame-based and dynamic vision for gesture detection. Our approach transforms an elastic body into a rich and nuanced interactive controller by tracking markers printed on its surface with event-based methods and harnessing their trajectories through RANSAC-based techniques. To benchmark our framework, we have created a 25 min gesture dataset, which we make publicly available to foster research in this area. Achieving over 91% accuracy in gesture classification, a 3.41 mm finger localization distance error, and a 0.96 mm gesture intensity error, our real-time, lightweight, and low-latency pipeline holds promise for applications in video games, augmented/virtual reality, and accessible devices. This research lays the groundwork for advancements in gesture detection for vision-based soft-material input technologies. Dataset: Coming Soon, Video: Coming Soon
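The abstract above describes harnessing marker trajectories "through RANSAC-based techniques". A minimal, self-contained sketch of the RANSAC idea under the simplest possible motion model (a pure 2-D translation); the motion model, tolerances, and marker data are assumptions for illustration, not NeuroTouch's actual method:

```python
import random

def ransac_translation(before, after, iters=200, tol=1.0, seed=0):
    """Robustly estimate a 2-D translation from matched marker positions,
    discarding outlier tracks (e.g. mis-tracked markers)."""
    rng = random.Random(seed)
    pairs = list(zip(before, after))
    best_inliers = []
    for _ in range(iters):
        (bx, by), (ax, ay) = rng.choice(pairs)   # minimal sample: one match
        dx, dy = ax - bx, ay - by                # candidate translation
        inliers = [((px, py), (qx, qy)) for (px, py), (qx, qy) in pairs
                   if abs((qx - px) - dx) <= tol and abs((qy - py) - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set: the least-squares fit is the mean displacement.
    dxs = [qx - px for (px, _), (qx, _) in best_inliers]
    dys = [qy - py for (_, py), (_, qy) in best_inliers]
    return sum(dxs) / len(dxs), sum(dys) / len(dys)

# Three markers move by (2, 1); the fourth is a tracking failure (outlier).
before = [(0, 0), (1, 0), (0, 1), (5, 5)]
after  = [(2, 1), (3, 1), (2, 2), (0, 0)]
print(ransac_translation(before, after))  # (2.0, 1.0)
```

The same sample-score-refit loop generalizes to richer models (affine or projective marker motion) by enlarging the minimal sample.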

Authors:Jie Lu, Matthew Schmidt, Jinnie Shin
Title: Beyond Technological Usability: Exploratory Factor Analysis of the Comprehensive Assessment of Usability Scale for Learning Technologies (CAUSLT)
Abstract:
Traditionally rooted in the domain of Human-Computer Interaction (HCI), usability has been primarily associated with the technological performance of a system's user interface. However, as learning technologies continue to advance, a pressing need exists to evaluate these tools from a broader perspective, encompassing not just technological but also pedagogical and sociocultural dimensions. The current paper delves into the multifaceted nature of usability in the context of Learning Design and Technology (LDT). We identified prevailing gaps in current usability research practices within LDT, notably the over-reliance on HCI-derived instruments that may not holistically capture the unique usability demands of learning technologies. To address these challenges, we embarked on the development and analysis of the Comprehensive Assessment of Usability Scale for Learning Technologies (CAUSLT). A total of 155 responses were collected and analyzed. Utilizing exploratory factor analysis, this study aimed to explore core constructs for the development of CAUSLT. Our findings underscore the importance and the critical need for a comprehensive usability evaluation framework tailored for learning technologies, setting the stage for more effective and user-centric educational tools.

Authors:Lehao Lin, Ke Wang, Maha Abdallah, Wei Cai
Title: BounTCHA: A CAPTCHA Utilizing Boundary Identification in Guided Generative AI-extended Videos
Abstract:
In recent years, the rapid development of artificial intelligence (AI), especially multi-modal Large Language Models (MLLMs), has enabled systems to understand text, images, videos, and other multimedia data and to execute various tasks based on human-provided prompts. However, AI-powered bots have increasingly been able to bypass most existing CAPTCHA systems, posing significant security threats to web applications. This makes the design of new CAPTCHA mechanisms an urgent priority. We observe that humans are highly sensitive to shifts and abrupt changes in videos, while current AI systems still struggle to comprehend and respond to such situations effectively. Based on this observation, we design and implement BounTCHA, a CAPTCHA mechanism that leverages human perception of boundaries in video transitions and disruptions. By utilizing generative AI's capability to extend original videos with prompts, we introduce unexpected twists and changes to create a pipeline for generating guided short videos for CAPTCHA purposes. We develop a prototype and conduct experiments to collect data on humans' time biases in boundary identification. This data serves as a basis for distinguishing between human users and bots. Additionally, we perform a detailed security analysis of BounTCHA, demonstrating its resilience against various types of attacks. We hope that BounTCHA will act as a robust defense, safeguarding millions of web applications in the AI-driven era.

Authors:Manuela Petrescu, Tudor Dan Mihoc
Title: Massive Online Course on Entrepreneurship. Case Study
Abstract:
Entrepreneurship is a key component of society, and universities and major political structures have tried to support its development in recent years. The present study examines students' perceptions (by gender) of entrepreneurial intentions after participating in a course with a large number of undergraduate students. There were 970 students enrolled from different faculties with various specializations. We conducted a gender-based survey on the unconventional entrepreneurial fundamentals course, in which each lecture was delivered by a different speaker. We also compared the responses of computer science students with the overall responses to identify differences in their perceptions of the feasibility of teaching entrepreneurship online, to determine the entrepreneurial intention of the students taking the course, and to analyze perceptions of the business environment and the ease of starting a business. We found that students, regardless of gender or field of study, prefer interactive online presentations, based on the manner in which the lectures on this subject were conducted.

Authors:Myra Cheng, Angela Y. Lee, Kristina Rapuano, Kate Niederhoffer, Alex Liebscher, Jeffrey Hancock
Title: From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors
Abstract:
How has the public responded to the increasing prevalence of artificial intelligence (AI)-based technologies? We investigate public perceptions of AI by collecting over 12,000 responses over 12 months from a nationally representative U.S. sample. Participants provided open-ended metaphors reflecting their mental models of AI, a methodology that overcomes the limitations of traditional self-reported measures by capturing more nuance. Using a mixed-methods approach combining quantitative clustering and qualitative coding, we identify 20 dominant metaphors shaping public understanding of AI. To analyze these metaphors systematically, we present a scalable framework integrating language modeling (LM)-based techniques to measure key dimensions of public perception: anthropomorphism (attribution of human-like qualities), warmth, and competence. We find that Americans generally view AI as warm and competent, and that over the past year, perceptions of AI's human-likeness and warmth have significantly increased ($+34\%, r = 0.80, p < 0.01; +41\%, r = 0.62, p < 0.05$). These implicit perceptions, along with the identified dominant metaphors, strongly predict trust in and willingness to adopt AI ($r^2 = 0.21, 0.18, p < 0.001$). Moreover, we uncover systematic demographic differences in metaphors and implicit perceptions, such as the higher propensity of women, older individuals, and people of color to anthropomorphize AI, which shed light on demographic disparities in trust and adoption. In addition to our dataset and framework for tracking evolving public attitudes, we provide actionable insights on using metaphors for inclusive and responsible AI development.

Authors:Manas Mhasakar, Rachel Baker-Ramos, Ben Carter, Evyn-Bree Helekahi-Kaiwi, Josiah Hester
Title: "I Would Never Trust Anything Western": Kumu (Educator) Perspectives on Use of LLMs for Culturally Revitalizing CS Education in Hawaiian Schools
Abstract:
As large language models (LLMs) become increasingly integrated into educational technology, their potential to assist in developing curricula has gained interest among educators. Despite this growing attention, their applicability in culturally responsive Indigenous educational settings like Hawai`i's public schools and Kaiapuni (immersion language) programs, remains understudied. Additionally, `Olelo Hawai`i, the Hawaiian language, as a low-resource language, poses unique challenges and concerns about cultural sensitivity and the reliability of generated content. Through surveys and interviews with kumu (educators), this study explores the perceived benefits and limitations of using LLMs for culturally revitalizing computer science (CS) education in Hawaiian public schools with Kaiapuni programs. Our findings highlight AI's time-saving advantages while exposing challenges such as cultural misalignment and reliability concerns. We conclude with design recommendations for future AI tools to better align with Hawaiian cultural values and pedagogical practices, towards the broader goal of trustworthy, effective, and culturally grounded AI technologies.

Authors:Cheng Guo, Kelly Caine
Title: Throwaway Accounts and Moderation on Reddit
Abstract:
Social media platforms (SMPs) facilitate information sharing across varying levels of sensitivity. A crucial design decision for SMP administrators is the platform's identity policy, with some opting for real-name systems while others allow anonymous participation. Content moderation on these platforms is conducted by both humans and automated bots. This paper examines the relationship between anonymity, specifically through the use of "throwaway" accounts, and the extent and nature of content moderation on Reddit. Our findings indicate that content originating from anonymous throwaway accounts is more likely to violate rules on Reddit. Thus, they are more likely to be removed by moderation than standard pseudonymous accounts. However, the moderation actions applied to throwaway accounts are consistent with those applied to ordinary accounts, suggesting that the use of anonymous accounts does not necessarily necessitate increased human moderation. We conclude by discussing the implications of these findings for identity policies and content moderation strategies on SMPs.

Authors:Juhee Kim, Chunghu Mok, Jisun Lee, Hyang Sook Kim, Yohan Jo
Title: Dialogue Systems for Emotional Support via Value Reinforcement
Abstract:
Emotional support dialogue systems aim to reduce help-seekers' distress and help them overcome challenges. While human values (core beliefs that shape an individual's priorities) are increasingly emphasized in contemporary psychological therapy for their role in fostering internal transformation and long-term emotional well-being, their integration into emotional support systems remains underexplored. To bridge this gap, we present a value-driven method for training emotional support dialogue systems designed to reinforce positive values in seekers. Notably, our model identifies which values to reinforce at each turn and how to do so, by leveraging online support conversations from Reddit. We evaluate the method across support skills, seekers' emotional intensity, and value reinforcement. Our method consistently outperforms various baselines, effectively exploring and eliciting values from seekers. Additionally, leveraging crowd knowledge from Reddit significantly enhances its effectiveness. Therapists highlighted its ability to validate seekers' challenges and emphasize positive aspects of their situations, both crucial elements of value reinforcement. Our work, being the first to integrate value reinforcement into emotional support systems, demonstrates its promise and establishes a foundation for future research.

Authors:Michael Robert Haupt, Meng Zhen Larsen, Michelle Strayer, Luning Yang, Tim K. Mackey
Title: Examining Online Social Support for Countering QAnon Conspiracies
Abstract:
As radical messaging has proliferated on social networking sites, platforms like Reddit have been used to host support groups, including support communities for the families and friends of radicalized individuals. This study examines the subreddit r/QAnonCasualties, an online forum for users whose loved ones have been radicalized by QAnon. We collected 1,665 posts and 78,171 comments posted between 7/2021 and 7/2022 and content coded top posts for prominent themes. Sentiment analysis was also conducted on all posts. We find venting, advice and validation-seeking, and pressure to refuse the COVID-19 vaccine were prominent themes. 40% (n=167) of coded posts identified the Q relation(s) of users as their parent(s) and 16.3% (n=68) as their partner. Posts with higher proportions of words related to swearing, social referents, and physical needs were positively correlated with engagement. These findings show ways that communities around QAnon adherents leverage anonymous online spaces to seek and provide social support.

Authors:Kelly B. Wagman, Matthew T. Dearing, Marshini Chetty
Title: Generative AI Uses and Risks for Knowledge Workers in a Science Organization
Abstract:
Generative AI could enhance scientific discovery by supporting knowledge workers in science organizations. However, the real-world applications and perceived concerns of generative AI use in these organizations are uncertain. In this paper, we report on a collaborative study with a US national laboratory with employees spanning Science and Operations about their use of generative AI tools. We surveyed 66 employees, interviewed a subset (N=22), and measured early adoption of an internal generative AI interface called Argo lab-wide. We have four findings: (1) Argo usage data shows small but increasing use by Science and Operations employees; Common current and envisioned use cases for generative AI in this context conceptually fall into either a (2) copilot or (3) workflow agent modality; and (4) Concerns include sensitive data security, academic publishing, and job impacts. Based on our findings, we make recommendations for generative AI use in science and other organizations.

Authors:Ashita Batra, Mannas Narang, Neeraj Kumar Sharma, Pradip K Das
Title: Boli: A dataset for understanding stuttering experience and analyzing stuttered speech
Abstract:
There is a growing need for diverse, high-quality stuttered speech data, particularly in the context of Indian languages. This paper introduces Project Boli, a multi-lingual stuttered speech dataset designed to advance scientific understanding and technology development for individuals who stutter, particularly in India. The dataset comprises (a) anonymized metadata (gender, age, country, mother tongue) and responses to a questionnaire about how stuttering affects participants' daily lives, (b) both read speech (using the Rainbow Passage) and spontaneous speech (through image description tasks) for each participant, and (c) detailed annotations of five stutter types: blocks, prolongations, interjections, sound repetitions, and word repetitions. We present a comprehensive analysis of the dataset, including the data collection procedure, a summary of the experiences of people who stutter, severity assessment of stuttering events, and technical validation of the collected data. The dataset is released as open access to further speech technology development.

Authors:Chathurika S. Silva, Prasad Wimalaratne
Title: Navigation Framework for Blind and Visually Impaired Persons based on Sensor Fusion
Abstract:
Individuals with visual impairments cannot carry out their day-to-day activities as smoothly as sighted people do; independent walking, in particular, is a hard target to achieve. Assistive electronic travel aids equipped with different types of sensors are designed to support the safe navigation of visually impaired persons. However, research on combining multiple sensors in assistive navigation aids is limited: most work targets sensor integration rather than sensor fusion. This paper addresses how sensor fusion and integration can improve the sub-processes of visually impaired navigation and how a sensor fusion-based approach to such navigation can be evaluated. Its contributions to sensor fusion in visually impaired navigation include a novel homogeneous sensor fusion algorithm based on the extended Kalman filter, a novel heterogeneous sensor integration approach, and a complementary sensor fusion algorithm based on the error-state extended Kalman filter. Overall, this research presents a novel navigational framework that integrates obstacle detection, obstacle recognition, localization, motion planning, and current context awareness through sensor fusion.
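The fusion algorithms above build on the (error-state) extended Kalman filter. As an illustration of the core fusion step only, here is a scalar, linear Kalman measurement update fusing two noisy range sensors; the sensor types and variances are hypothetical, and this is not the paper's full error-state EKF:

```python
def kf_update(x, P, z, R):
    """Scalar Kalman measurement update: fuse a state estimate (mean x,
    variance P) with a sensor reading z of measurement variance R."""
    K = P / (P + R)          # Kalman gain: how much to trust the new reading
    x_new = x + K * (z - x)  # corrected estimate
    P_new = (1 - K) * P      # uncertainty shrinks after each measurement
    return x_new, P_new

# Fuse two distance sensors (say, ultrasonic and infrared) observing one obstacle.
x, P = 0.0, 1e6                       # vague prior
x, P = kf_update(x, P, 2.10, 0.09)    # ultrasonic: 2.10 m, sigma = 0.3 m
x, P = kf_update(x, P, 2.00, 0.01)    # infrared:   2.00 m, sigma = 0.1 m
print(f"fused estimate: {x:.3f} m, variance: {P:.4f}")
```

The fused estimate lands closer to the lower-variance sensor, which is exactly the weighting behaviour an EKF provides; the *extended* variant linearizes a nonlinear measurement model around the current estimate before applying this same update.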

Authors:Jiaqi Zhu, Andras Molnar
Title: Blissful (A)Ignorance: People form overly positive impressions of others based on their written messages, despite wide-scale adoption of Generative AI
Abstract:
As the use of Generative AI (GenAI) tools becomes more prevalent in interpersonal communication, understanding their impact on social perceptions is crucial. According to signaling theory, GenAI may undermine the credibility of social signals conveyed in writing, since it reduces the cost of writing and makes it hard to verify the authenticity of messages. Using a pre-registered large-scale online experiment (N = 647; Prolific), featuring scenarios in a range of communication contexts (personal vs. professional; close others vs. strangers), we explored how senders' use of GenAI influenced recipients' impressions of senders, both when GenAI use was known or uncertain. Consistent with past work, we found strong negative effects on social impressions when disclosing that a message was AI-generated, compared to when the same message was human-written. However, under the more realistic condition when potential GenAI use was not explicitly highlighted, recipients did not exhibit any skepticism towards senders, and these "uninformed" impressions were virtually indistinguishable from those of fully human-written messages. Even when we highlighted the potential (but uncertain) use of GenAI, recipients formed overly positive impressions. These results are especially striking given that 46% of our sample admitted having used such tools for writing messages, just within the past two weeks. Our findings put past work in a new light: While social judgments can be substantially affected when GenAI use is explicitly disclosed, this information may not be readily available in more realistic communication settings, making recipients blissfully ignorant about others' potential use of GenAI.

Authors:Yuqian Sun, Stefano Gualeni
Title: Between Puppet and Actor: Reframing Authorship in this Age of AI Agents
Abstract:
This chapter examines the conceptual tensions in understanding artificial intelligence (AI) agents' role in creative processes, particularly focusing on Large Language Models (LLMs). Building upon Schmidt's 1954 categorization of human-technology relationships and the classical definition of "author," this chapter proposes to understand AI agency as existing somewhere between that of an inanimate puppet and a performing actor. While AI agents demonstrate a degree of creative autonomy, including the ability to improvise and construct complex narrative content in interactive storytelling, they cannot be considered authors in the classical sense of the term. This chapter thus suggests that AI agents exist in a dynamic state between human-controlled puppets and semi-autonomous actors. This conceptual positioning reflects how AI agents, while they can certainly contribute to creative work, remain bound to human direction. We also argue that existing conceptual frames concerning authorship should evolve and adapt to capture these new relationships.

Authors:Harshita Chopra, Chirag Shah
Title: Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations
Abstract:
Effective decision-making and problem-solving in conversational systems require the ability to identify and acquire missing information through targeted questioning. A key challenge lies in efficiently narrowing down a large space of possible outcomes by posing questions that minimize uncertainty. To address this, we introduce a novel framework that leverages Large Language Models (LLMs) to generate information-seeking questions, with Monte Carlo Tree Search (MCTS) to strategically select questions that maximize information gain, as a part of inference-time planning. Our primary contribution includes a hierarchical feedback mechanism that exploits past interaction patterns to guide future strategy. Specifically, each new problem is mapped to a cluster based on semantic similarity, and our UCT (Upper Confidence bound for Trees) formulation employs a cluster-specific bonus reward to prioritize successful question trajectories that have proven effective for similar problems in the past. Extensive empirical evaluation across medical diagnosis and technical troubleshooting domains shows that our method achieves an average of 12% improvement in success rates and about 10x reduction in the number of LLM calls made for planning per conversation, compared to the state of the art. An additional 8% gain in success rate is observed on average when we start with a constrained set of possibilities. Our results underscore the efficacy of feedback-aware MCTS in enhancing information-seeking in goal-oriented dialogues.
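The UCT formulation with a cluster-specific bonus described above can be sketched as follows; the bonus weighting, exploration constant, and action statistics are hypothetical illustrations, not the paper's actual values:

```python
import math

def uct_score(q, n, n_parent, c=1.4, cluster_bonus=0.0, beta=0.5):
    """Standard UCT value plus a hypothetical cluster-specific bonus that
    rewards question strategies which succeeded on similar past problems."""
    if n == 0:
        return float("inf")                      # expand unvisited actions first
    exploit = q / n                              # average reward so far
    explore = c * math.sqrt(math.log(n_parent) / n)
    return exploit + explore + beta * cluster_bonus

# Candidate information-seeking questions at a node: (total reward, visits, bonus).
stats = {
    "ask_symptom_duration": (3.0, 4, 0.8),
    "ask_medication":       (1.0, 2, 0.1),
    "ask_travel_history":   (0.0, 0, 0.0),
}
n_parent = sum(n for _, n, _ in stats.values())
best = max(stats, key=lambda a: uct_score(stats[a][0], stats[a][1], n_parent,
                                          cluster_bonus=stats[a][2]))
print(best)  # ask_travel_history -- unvisited, so it is explored first
```

The cluster bonus simply shifts the exploration-exploitation trade-off toward question types with a good track record on semantically similar past problems.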

Authors:Eduardo Puerta, Shani Spivak, Michael Correll
Title: The Many Tendrils of the Octopus Map
Abstract:
Conspiratorial thinking can connect many distinct or distant ills to a central cause. This belief has visual form in the octopus map: a map where a central force (for instance a nation, an ideology, or an ethnicity) is depicted as a literal or figurative octopus, with extending tendrils. In this paper, we explore how octopus maps function as visual arguments through an analysis of historical examples as well as through a crowd-sourced study of how the underlying data and the use of visual metaphors contribute to specific negative or conspiratorial interpretations. We find that many features of the data or visual style can lead to "octopus-like" thinking in visualizations, even without the use of an explicit octopus motif. We conclude with a call for a deeper analysis of visual rhetoric, and an acknowledgment of the potential for the design of data visualizations to contribute to harmful or conspiratorial thinking.

Authors:Avinash Agarwal, Manisha J Nene
Title: Advancing Trustworthy AI for Sustainable Development: Recommendations for Standardising AI Incident Reporting
Abstract:
The increasing use of AI technologies has led to a growing number of AI incidents, posing risks and causing harm to individuals, organizations, and society. This study recognizes and addresses the lack of standardized protocols for reliably and comprehensively gathering such incident data, which is crucial for preventing future incidents and developing mitigation strategies. Specifically, this study analyses existing open-access AI-incident databases through a systematic methodology and identifies nine gaps in current AI incident reporting practices. Further, it proposes nine actionable recommendations to enhance standardization efforts to address these gaps. Ensuring the trustworthiness of enabling technologies such as AI is necessary for sustainable digital transformation. Our research promotes the development of standards to prevent future AI incidents and promote trustworthy AI, thus facilitating achievement of the UN Sustainable Development Goals. Through international cooperation, stakeholders can unlock the transformative potential of AI, enabling a sustainable and inclusive future for all.

Authors:Zhenguang Zhong, Jia Tang
Title: Design and Implementation of a Psychiatry Resident Training System Based on Large Language Models
Abstract:
Mental disorders have become a significant global public health issue, while the shortage of psychiatrists and inefficient training systems severely hinder the accessibility of mental health services. This paper designs and implements an artificial intelligence-based training system for psychiatrists. By integrating technologies such as large language models, knowledge graphs, and expert systems, the system constructs an intelligent and standardized training platform. It includes six functional modules: case generation, consultation dialogue, examination prescription, diagnostic decision-making, integrated traditional Chinese and Western medicine prescription, and expert evaluation, providing comprehensive support from clinical skill training to professional level assessment. The system adopts a B/S architecture, developed using the Vue.js and Node.js technology stack, and innovatively applies deep learning algorithms for case generation and doctor-patient dialogue. In a clinical trial involving 60 psychiatrists at different levels, the system demonstrated excellent performance and training outcomes: system stability reached 99.95%, AI dialogue accuracy achieved 96.5%, diagnostic accuracy reached 92.5%, and user satisfaction scored 92.3%. Experimental data showed that doctors using the system improved their knowledge mastery, clinical thinking, and diagnostic skills by 35.6%, 28.4%, and 23.7%, respectively. The research results provide an innovative solution for improving the efficiency of psychiatrist training and hold significant importance for promoting the standardization and scalability of mental health professional development.

Authors:Im Eunyoung, Kang Sunghoon, Kim Hyeoneui
Title: Development of a Validation and Inspection Tool for Armband-based Lifelog Data (VITAL) to Facilitate the Clinical Use of Wearable Data: A Prototype and Usability Evaluation
Abstract:
Background: The rise of mobile technology and health apps has increased the use of person-generated health data (PGHD). PGHD holds significant potential for clinical decision-making but remains challenging to manage. Objective: This study aimed to enhance the clinical utilization of wearable health data by developing the Validation and Inspection Tool for Armband-Based Lifelog Data (VITAL), a pipeline for data integration, visualization, and quality management, and evaluating its usability. Methods: The study followed a structured process of requirement gathering, tool implementation, and usability evaluation. Requirements were identified through input from four clinicians. Wearable health data from Samsung, Apple, Fitbit, and Xiaomi devices were integrated into a standardized dataframe at 10-minute intervals, focusing on biometrics, activity, and sleep. Features of VITAL support data integration, visualization, and quality management. Usability evaluation involved seven clinicians performing tasks, completing the Unified Theory of Acceptance and Use of Technology (UTAUT) survey, and participating in interviews to identify usability issues. Results: VITAL successfully integrated wearable data, enabling all participants to complete tasks with minimal errors and without prior training. UTAUT survey results were positive, with average scores of 4.2 for performance expectancy, 3.96 for effort expectancy, and 4.14 for intention to use, indicating high user satisfaction and intent to adopt the tool. Conclusions: By enhancing wearable data integration, visualization, and quality management, the VITAL prototype shows significant potential for clinical application. Positive feedback highlights its promise, while emphasizing the need for further studies to confirm its real-world effectiveness.
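The integration step described above, aligning heterogeneous device streams onto a common 10-minute grid, can be sketched with pandas resampling. The device data, column names, and aggregation choices here are hypothetical stand-ins, not VITAL's actual schema.

```python
import pandas as pd

# Hypothetical raw streams from two devices with irregular timestamps.
fitbit = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 08:03", "2024-01-01 08:07",
                            "2024-01-01 08:14"]),
    "heart_rate": [72, 75, 80],
})
apple = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 08:02", "2024-01-01 08:12"]),
    "steps": [120, 340],
})

def to_10min(df, agg):
    """Resample one device stream onto 10-minute bins."""
    return df.set_index("time").resample("10min").agg(agg)

# Standardized frame: mean biometrics, summed activity, per 10-minute bin.
merged = to_10min(fitbit, {"heart_rate": "mean"}).join(
         to_10min(apple, {"steps": "sum"}), how="outer")
```

The outer join keeps bins where only one device reported data, which is typical when devices record at different cadences.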

Authors:Daniela Napoli, Heather Molyneaux, Helene Fournier, Sonia Chiasson
Title: Exploring User Perspectives on Data Collection, Data Sharing Preferences, and Privacy Concerns with Remote Healthcare Technology
Abstract:
Remote healthcare technology can help tackle societal issues by improving access to quality healthcare services and enhancing diagnoses through in-place monitoring. These services can be implemented through a combination of mobile devices, applications, wearable sensors, and other smart technology. It is paramount to handle sensitive data that is collected in ways that meet users' privacy expectations. We surveyed 384 people in Canada aged 20 to 93 years old to explore participants' comfort with data collection, sharing preferences, and potential privacy concerns related to remote healthcare technology. We explore these topics within the context of various healthcare scenarios including health emergencies and managing chronic health conditions.

Authors:Xiang 'Anthony' Chen, Tiffany Knearem, Yang Li
Title: The GenUI Study: Exploring the Design of Generative UI Tools to Support UX Practitioners and Beyond
Abstract:
AI can now generate high-fidelity UI mock-up screens from a high-level textual description, promising to support UX practitioners' work. However, it remains unclear how UX practitioners would adopt such Generative UI (GenUI) models in a way that is integral and beneficial to their work. To answer this question, we conducted a formative study with 37 UX-related professionals that consisted of four roles: UX designers, UX researchers, software engineers, and product managers. Using a state-of-the-art GenUI tool, each participant went through a week-long, individual mini-project exercise with role-specific tasks, keeping a daily journal of their usage and experiences with GenUI, followed by a semi-structured interview. We report findings on participants' workflow using the GenUI tool, how GenUI can support all and each specific roles, and existing gaps between GenUI and users' needs and expectations, which lead to design implications to inform future work on GenUI development.

Authors:Bran Knowles, Jasmine Fledderjohann, Aneesha Singh, Richard Harper, Julia McDowell, Judith Tsouvalis, Alice Ashcroft, Yvonne Rogers, Ewan Soubutts, Andrew Steptoe, Caroline Swarbrick
Title: Not Just a Number: A Multidimensional Approach to Ageing in HCI
Abstract:
The focus on managing problems that can arise for older adults has meant that extant HCI and Ageing research has not given the concepts of 'age' and 'ageing' the explicit theoretical attention they deserve. Attending to this gap, we critically examine a ten-year corpus of CHI publications through the lens of an existing typology which we have further developed to analyse how age is understood, interpreted and constructed in the field of HCI. Our resulting multidimensional typology of age in HCI elucidates the distinctive characteristics of older adults considered when designing with and for this user group, but also highlights the need for a more critical, reflexive, social constructivist approach to age in HCI. Applying this approach, we explore age as a multidimensional system of stratification to better understand the phenomenon of the age-based digital divide.

Authors:Yunshu Liu, Lingjie Duan
Title: Mechanism Design for Blockchain Order Books against Selfish Miners
Abstract:
In blockchain-based order book systems, buyers and sellers trade assets, while miners match them and include their transactions in the blockchain. Many miners behave selfishly and myopically, prioritizing transactions with high fees and ignoring many desirable matches that could enhance social welfare. Existing blockchain mechanisms fail to address this issue because they overlook miners' selfish behaviors. To the best of our knowledge, this work presents the first analytical study to quantify and understand buyer and seller transaction fee choices and selfish miners' transaction matching strategies, proving an infinitely large price of anarchy (PoA) for social welfare loss. To mitigate this, we propose an adjustable block size mechanism that is easy to implement without altering the existing decentralized protocols and still allows buyers and sellers to freely decide transaction fees and miners to selfishly match. The analysis is challenging, as pure strategy Nash equilibria do not always exist, requiring the analysis of many buyers' or sellers' interactive mixed-strategy distributions. Moreover, the system designer may even lack information about each buyer's or seller's bid/ask prices and trading quantities. Nevertheless, our mechanism achieves a well-bounded PoA, and under homogeneous-quantity trading for non-fungible tokens (NFT), it attains a PoA of 1 with no social welfare loss. We implement our mechanism on a local instance of Ethereum to demonstrate the feasibility of our approach. Experiments based on a realistic dataset demonstrate that our mechanism achieves the social optimum for homogeneous-quantity trading like NFT. It can enhance social welfare up to 3.7 times compared to the existing order book benchmarks for heterogeneous-quantity trading of Bitcoin tokens. It exhibits robustness against random variations in buyers and sellers.

Authors:Rock Yuren Pang, Hope Schroeder, Kynnedy Simone Smith, Solon Barocas, Ziang Xiao, Emily Tseng, Danielle Bragg
Title: Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review
Abstract:
Large language models (LLMs) have been positioned to revolutionize HCI, by reshaping not only the interfaces, design patterns, and sociotechnical systems that we study, but also the research practices we use. To date, however, there has been little understanding of LLMs' uptake in HCI. We address this gap via a systematic literature review of 153 CHI papers from 2020-24 that engage with LLMs. We taxonomize: (1) domains where LLMs are applied; (2) roles of LLMs in HCI projects; (3) contribution types; and (4) acknowledged limitations and risks. We find LLM work in 10 diverse domains, primarily via empirical and artifact contributions. Authors use LLMs in five distinct roles, including as research tools or simulated users. Still, authors often raise validity and reproducibility concerns, and overwhelmingly study closed models. We outline opportunities to improve HCI research with and on LLMs, and provide guiding questions for researchers to consider the validity and appropriateness of LLM-related work.

Authors:Daeun Jeong, Sungbok Shin, Jongwook Jeong
Title: Conversation Progress Guide : UI System for Enhancing Self-Efficacy in Conversational AI
Abstract:
In this study, we introduce the Conversation Progress Guide (CPG), a system designed for text-based conversational AI interactions that provides a visual interface to represent progress. Users often encounter failures when interacting with conversational AI, which can negatively affect their self-efficacy (an individual's belief in their capabilities), reducing their willingness to engage with these services. The CPG offers visual feedback on task progress, providing users with mastery experiences, a key source of self-efficacy. To evaluate the system's effectiveness, we conducted a user study assessing how the integration of the CPG influences user engagement and self-efficacy. Results demonstrate that users interacting with a conversational AI enhanced by the CPG showed significant improvements in self-efficacy measures compared to those using a conventional conversational AI.

Authors:Noah L. Schroeder, Chris Davis Jaldi, Shan Zhang
Title: Large Language Models with Human-In-The-Loop Validation for Systematic Review Data Extraction
Abstract:
Systematic reviews are time-consuming endeavors. Historically speaking, knowledgeable humans have had to screen and extract data from studies before it can be analyzed. However, large language models (LLMs) hold promise to greatly accelerate this process. After a pilot study which showed great promise, we investigated the use of freely available LLMs for extracting data for systematic reviews. Using three different LLMs, we extracted 24 types of data, 9 explicitly stated variables and 15 derived categorical variables, from 112 studies that were included in a published scoping review. Overall we found that Gemini 1.5 Flash, Gemini 1.5 Pro, and Mistral Large 2 performed reasonably well, with 71.17%, 72.14%, and 62.43% of data extracted being consistent with human coding, respectively. While promising, these results highlight the dire need for a human-in-the-loop (HIL) process for AI-assisted data extraction. As a result, we present a free, open-source program we developed (AIDE) to facilitate user-friendly, HIL data extraction with LLMs.

Authors:Boran Zhang, Muhan Xu, Zhijun Pan
Title: Human-AI Collaborative Game Testing with Vision Language Models
Abstract:
As modern video games become increasingly complex, traditional manual testing methods are proving costly and inefficient, limiting the ability to ensure high-quality game experiences. While advancements in Artificial Intelligence (AI) offer the potential to assist human testers, the effectiveness of AI in truly enhancing real-world human performance remains underexplored. This study investigates how AI can improve game testing by developing and experimenting with an AI-assisted workflow that leverages state-of-the-art machine learning models for defect detection. Through an experiment involving 800 test cases and 276 participants of varying backgrounds, we evaluate the effectiveness of AI assistance under four conditions: with or without AI support, and with or without detailed knowledge of defects and design documentation. The results indicate that AI assistance significantly improves defect identification performance, particularly when paired with detailed knowledge. However, challenges arise when AI errors occur, negatively impacting human decision-making. Our findings show the importance of optimizing human-AI collaboration and implementing strategies to mitigate the effects of AI inaccuracies. Through this research, we demonstrate AI's potential and problems in enhancing efficiency and accuracy in game testing workflows and offer practical insights for integrating AI into the testing process.

Authors:Shutong Zhang, Tianyu Zhang, Jinghui Cheng, Shurui Zhou
Title: Who is to Blame: A Comprehensive Review of Challenges and Opportunities in Designer-Developer Collaboration
Abstract:
Software development relies on effective collaboration between Software Development Engineers (SDEs) and User eXperience Designers (UXDs) to create software products of high quality and usability. While this collaboration issue has been explored over the past decades, anecdotal evidence continues to indicate the existence of challenges in their collaborative efforts. To understand this gap, we first conducted a systematic literature review (SLR) of 45 papers published since 2004, uncovering three key collaboration challenges and two main categories of potential best practices. We then analyzed designer and developer forums and discussions from one open-source software repository to assess how the challenges and practices manifest in the status quo. Our findings have broad applicability for collaboration in software development, extending beyond the partnership between SDEs and UXDs. The suggested best practices and interventions also act as a reference for future research, assisting in the development of dedicated collaboration tools for SDEs and UXDs.

Authors:Zhikun Wu, Thomas Weber, Florian Müller
Title: One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor
Abstract:
Collaboration has been shown to enhance creativity, leading to more innovative and effective outcomes. While previous research has explored the abilities of Large Language Models (LLMs) to serve as co-creative partners in tasks like writing poetry or creating narratives, the collaborative potential of LLMs in humor-rich and culturally nuanced domains remains an open question. To address this gap, we conducted a user study to explore the potential of LLMs in co-creating memes - a humor-driven and culturally specific form of creative expression. We conducted a user study with three groups of 50 participants each: a human-only group creating memes without AI assistance, a human-AI collaboration group interacting with a state-of-the-art LLM, and an AI-only group where the LLM autonomously generated memes. We assessed the quality of the generated memes through crowdsourcing, with each meme rated on creativity, humor, and shareability. Our results showed that LLM assistance increased the number of ideas generated and reduced the effort participants felt. However, it did not improve the quality of the memes when humans collaborated with the LLM. Interestingly, memes created entirely by AI performed better than both human-only and human-AI collaborative memes in all areas on average. However, when looking at the top-performing memes, human-created ones were better in humor, while human-AI collaborations stood out in creativity and shareability. These findings highlight the complexities of human-AI collaboration in creative tasks. While AI can boost productivity and create content that appeals to a broad audience, human creativity remains crucial for content that connects on a deeper level.

Authors:Zeinab Rajabi, Seyed Mohsen Rahnamafard
Title: A Survey on Conceptual model of Enterprise ontology
Abstract:
Enterprise ontology serves as a foundational framework for semantically comprehending the nature of organizations and the essential components that uphold their integrity. The systematic and conceptual understanding of organizations has garnered significant attention from researchers due to its pivotal role in various domains, including business modeling, enterprise architecture, business process management, context-aware systems, application development, interoperability across diverse systems and platforms, knowledge management, organizational learning and innovation, and conflict resolution within organizations. Achieving a consensus on the concepts related to the fundamental elements that constitute an organization is therefore critical. This study aims to conduct a comprehensive analysis and comparison of existing conceptual models of enterprises as documented in scholarly articles published over the past decade. We discuss the strengths and weaknesses of each model and introduce a robust framework for their evaluation. To facilitate this evaluation, we propose several pertinent criteria derived from established methodologies for assessing ontologies. Furthermore, we identify contemporary challenges and issues that have been overlooked in prior studies, offering insights and suggestions for future research directions in enterprise modeling. This article ultimately presents a roadmap for enhancing the systematic understanding of organizations through refined enterprise ontology frameworks.

Authors:Christian Rahe, Walid Maalej
Title: How Do Programming Students Use Generative AI?
Abstract:
Programming students have widespread access to powerful Generative AI tools like ChatGPT. While this can help understand the learning material and assist with exercises, educators are voicing more and more concerns about an overreliance on generated outputs and lack of critical thinking skills. It is thus important to understand how students actually use generative AI and what impact this could have on their learning behavior. To this end, we conducted a study including an exploratory experiment with 37 programming students, giving them monitored access to ChatGPT while solving a code authoring exercise. The task was not directly solvable by ChatGPT and required code comprehension and reasoning. While only 23 of the students actually opted to use the chatbot, the majority of those eventually prompted it to simply generate a full solution. We observed two prevalent usage strategies: to seek knowledge about general concepts and to directly generate solutions. Instead of using the bot to comprehend the code and their own mistakes, students often got trapped in a vicious cycle of submitting wrong generated code and then asking the bot for a fix. Those who self-reported using generative AI regularly were more likely to prompt the bot to generate a solution. Our findings indicate that concerns about potential decrease in programmers' agency and productivity with Generative AI are justified. We discuss how researchers and educators can respond to the potential risk of students uncritically over-relying on Generative AI. We also discuss potential modifications to our study design for large-scale replications.

Authors:Anshul Goswami, Ojaswa Sharma
Title: Holoview: An Immersive Mixed-Reality Visualization System for Anatomical Education
Abstract:
We present Holoview, an augmented reality (AR) system designed to support immersive and interactive learning of human anatomy. Holoview enables users to dynamically explore volumetric anatomical data through intuitive hand gestures in a 3D AR environment, allowing inspection of individual organs and cross-sectional views via clipping and bioscope features. The system adopts a lightweight client-server architecture optimized for real-time performance on the HoloLens through hybrid and foveated rendering. Our user study demonstrated Holoview's educational effectiveness, with participants showing a 135 percent improvement in task-specific knowledge and reporting increased confidence in understanding anatomical structures. The system was perceived as engaging and intuitive, particularly for organ selection and cross-sectional exploration, with low cognitive load and increasing ease of use over time. These findings highlight Holoview's potential to enhance anatomy learning through immersive, user-centered AR experiences.

Authors:Xiaoyu Bao, Kailin Xu, Jiawei Zhu, Haiyun Huang, Kangning Li, Qiyun Huang, Yuanqing Li
Title: Alleviating Seasickness through Brain-Computer Interface-based Attention Shift
Abstract:
Seasickness poses a widespread problem that adversely impacts both passenger comfort and the operational efficiency of maritime crews. Although attention shift has been proposed as a potential method to alleviate symptoms of motion sickness, its efficacy remains to be rigorously validated, especially in maritime environments. In this study, we develop an AI-driven brain-computer interface (BCI) to realize sustained and practical attention shift by incorporating tasks such as breath counting. Forty-three participants completed a real-world nautical experiment consisting of a real-feedback session, a resting session, and a pseudo-feedback session. Notably, 81.39% of the participants reported that the BCI intervention was effective. EEG analysis revealed that the proposed system can effectively regulate motion sickness EEG signatures, such as a decrease in total band power, along with an increase in theta relative power and a decrease in beta relative power. Furthermore, an indicator of attentional focus, the theta/beta ratio, exhibited a significant reduction during the real-feedback session, providing further evidence to support the effectiveness of the BCI in shifting attention. Collectively, this study presents a novel nonpharmacological, portable, and effective approach for seasickness intervention, which has the potential to open up a brand-new application domain for BCIs.
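The theta/beta ratio mentioned above is computed from EEG band powers. A minimal sketch using a plain periodogram follows; the band edges (theta 4-8 Hz, beta 13-30 Hz) are the conventional ones, not taken verbatim from the paper, and real pipelines typically use Welch's method with artifact rejection.

```python
import numpy as np

def band_power(signal, fs, f_lo, f_hi):
    """Absolute power in [f_lo, f_hi) Hz via a simple periodogram."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return psd[mask].sum()

def theta_beta_ratio(signal, fs):
    """Theta (4-8 Hz) over beta (13-30 Hz) power; small epsilon guards
    against an empty beta band in synthetic data."""
    theta = band_power(signal, fs, 4, 8)
    beta = band_power(signal, fs, 13, 30)
    return theta / (beta + 1e-12)

# Synthetic check: a 6 Hz tone is theta-dominant, a 20 Hz tone beta-dominant.
fs = 256
t = np.arange(fs * 4) / fs
theta_dominant = np.sin(2 * np.pi * 6 * t)
beta_dominant = np.sin(2 * np.pi * 20 * t)
```

A drop in this ratio, as reported in the abstract, corresponds to relatively less theta and/or more beta power, consistent with increased attentional focus.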

Authors:Nastaran Saffaryazdi, Tamil Selvan Gunasekaran, Kate Laveys, Elizabeth Broadbent, Mark Billinghurst
Title: Empathetic Conversational Agents: Utilizing Neural and Physiological Signals for Enhanced Empathetic Interactions
Abstract:
Conversational agents (CAs) are revolutionizing human-computer interaction by evolving from text-based chatbots to empathetic digital humans (DHs) capable of rich emotional expressions. This paper explores the integration of neural and physiological signals into the perception module of CAs to enhance empathetic interactions. By leveraging these cues, the study aims to detect emotions in real-time and generate empathetic responses and expressions. We conducted a user study where participants engaged in conversations with a DH about emotional topics. The DH responded and displayed expressions by mirroring detected emotions in real-time using neural and physiological cues. The results indicate that participants experienced stronger emotions and greater engagement during interactions with the Empathetic DH, demonstrating the effectiveness of incorporating neural and physiological signals for real-time emotion recognition. However, several challenges were identified, including recognition accuracy, emotional transition speeds, individual personality effects, and limitations in voice tone modulation. Addressing these challenges is crucial for further refining Empathetic DHs and fostering meaningful connections between humans and artificial entities. Overall, this research advances human-agent interaction and highlights the potential of real-time neural and physiological emotion recognition in creating empathetic DHs.

Authors:Palmira Victoria González-Erena, Sara Fernández-Guinea, Panagiotis Kourtesis
Title: Cognitive Assessment and Training in Extended Reality: Multimodal Systems, Clinical Utility, and Current Challenges
Abstract:
Extended reality (XR) technologies, encompassing virtual reality (VR), augmented reality (AR), and mixed reality (MR), are transforming cognitive assessment and training by offering immersive, interactive environments that simulate real-world tasks. XR enhances ecological validity while enabling real-time, multimodal data collection through tools such as galvanic skin response (GSR), electroencephalography (EEG), eye tracking (ET), hand tracking, and body tracking. This allows for a more comprehensive understanding of cognitive and emotional processes, as well as adaptive, personalized interventions for users. Despite these advancements, current XR applications often underutilize the full potential of multimodal integration, relying primarily on visual and auditory inputs. Challenges such as cybersickness, usability concerns, and accessibility barriers further limit the widespread adoption of XR tools in cognitive science and clinical practice. This review examines XR-based cognitive assessment and training, focusing on its advantages over traditional methods, including ecological validity, engagement, and adaptability. It also explores unresolved challenges such as system usability, cost, and the need for multimodal feedback integration. The review concludes by identifying opportunities for optimizing XR tools to improve cognitive evaluation and rehabilitation outcomes, particularly for diverse populations, including older adults and individuals with cognitive impairments.

Authors:Aishwarya Jadhav, Jeffery Cao, Abhishree Shetty, Urvashi Priyam Kumar, Aditi Sharma, Ben Sukboontip, Jayant Sravan Tamarapalli, Jingyi Zhang, Anirudh Koul
Title: AI Guide Dog: Egocentric Path Prediction on Smartphone
Abstract:
This paper presents AI Guide Dog (AIGD), a lightweight egocentric (first-person) navigation system for visually impaired users, designed for real-time deployment on smartphones. AIGD employs a vision-only multi-label classification approach to predict directional commands, ensuring safe navigation across diverse environments. We introduce a novel technique for goal-based outdoor navigation by integrating GPS signals and high-level directions, while also handling uncertain multi-path predictions for destination-free indoor navigation. As the first navigation assistance system to handle both goal-oriented and exploratory navigation across indoor and outdoor settings, AIGD establishes a new benchmark in blind navigation. We present methods, datasets, evaluations, and deployment insights to encourage further innovations in assistive navigation systems.

Authors:Ying Weng, Yiming Zhang
Title: Assessment of Personalized Learning in Immersive and Intelligent Virtual Classroom on Student Engagement
Abstract:
As trends in education evolve, personalized learning has transformed individuals' engagement with knowledge and skill development. In the digital age, state-of-the-art technologies have been increasingly integrated into classrooms to support intelligent education and foster personalized learning experiences. One promising approach is the use of eye-tracking technology to evaluate student engagement in intelligent virtual classrooms. This paper explores the assessment of personalized learning in the virtual classroom and its impact on student engagement through the eye movement paradigm. The study aims to provide insights into how personalized learning approaches can enhance student participation, motivation, and academic performance in the online learning environment. Through a comprehensive literature review, case study, and data analysis, the paper examines the key elements of personalized learning, the methods of assessment, and the resulting effects on student engagement. The findings suggest that the eye movement paradigm has the potential to assess student engagement and promote better educational outcomes.

Authors:Xiuqi Tommy Zhu, Ziyue Qiu, Ye Wei, Jianhao Wang, Yang Jiao
Title: Understanding the Practice, Perception, and Challenge of Blind or Low Vision Students Learning through Accessible Technologies in Non-Inclusive 'Blind Colleges'
Abstract:
In developing and underdeveloped regions, many 'Blind Colleges' exclusively enroll individuals with Blindness or Vision Impairment (BLV) for higher education. While advancements in accessible technologies have facilitated BLV student integration into 'Integrated Colleges,' their implementation in 'Blind Colleges' remains uneven due to complex economic, social, and policy challenges. This study investigates the practices, perceptions, and challenges of BLV students using accessible technologies in a Chinese 'Blind College' through a two-part empirical approach. Our findings demonstrate that tactile and digital technologies enhance access to education but face significant integration barriers. We emphasize the critical role of early education in addressing capability gaps, BLV students' aspirations for more inclusive educational environments, and the systemic obstacles within existing frameworks. We advocate for leveraging accessible technologies to transition 'Blind Colleges' into 'Integrated Colleges,' offering actionable insights for policymakers, designers, and educators. Finally, we outline future research directions on accessible technology innovation and its implications for BLV education in resource-constrained settings.

Authors:Awais Rashid, Sana Belguith, Matthew Bradbury, Sadie Creese, Ivan Flechais, Neeraj Suri
Title: Beyond Security-by-design: Securing a compromised system
Abstract:
Digital infrastructures are seeing convergence and connectivity at unprecedented scale. This is true for both current critical national infrastructures and emerging future systems that are highly cyber-physical in nature with complex intersections between humans and technologies, e.g., smart cities, intelligent transportation, high-value manufacturing and Industry 4.0. Diverse legacy and non-legacy software systems underpinned by heterogeneous hardware compose on-the-fly to deliver services to millions of users with varying requirements and unpredictable actions. This complexity is compounded by intricate and complicated supply-chains with many digital assets and services outsourced to third parties. The reality is that, at any particular point in time, there will be untrusted, partially-trusted or compromised elements across the infrastructure. Given this reality, and the societal scale of digital infrastructures, delivering secure and resilient operations is a major challenge. We argue that this requires us to move beyond the paradigm of security-by-design and embrace the challenge of securing-a-compromised-system.

Authors:Naser Al Madi, Brett Torra, Yixin Li, Najam Tariq
Title: Combining Automation and Expertise: A Semi-automated Approach to Correcting Eye Tracking Data in Reading Tasks
Abstract:
In reading tasks, drift can move fixations from one word to another, or even to another line, invalidating the eye tracking recording. Manual correction is time-consuming and subjective, while automated correction is fast yet limited in accuracy. In this paper we present Fix8 (Fixate), an open-source GUI tool that offers a novel semi-automated correction approach for eye tracking data in reading tasks. The proposed approach allows the user to collaborate with an algorithm to produce corrections faster without sacrificing accuracy. Through a usability study (N=14) we assess the time benefits of the proposed technique and measure the correction accuracy in comparison to manual correction. In addition, we assess subjective workload through the NASA Task Load Index and user opinions through Likert-scale questions. Our results show that on average the proposed technique was 44% faster than manual correction without any sacrifice in accuracy. In addition, users reported a preference for the proposed technique, lower workload, and higher perceived performance compared to manual correction. Beyond its main contribution in data correction, Fix8 offers useful features for generating synthetic eye tracking data, visualization, filters, data converters, and eye movement analysis.
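The abstract does not specify Fix8's correction algorithms, but the core operation in drift correction is reassigning fixations to lines of text. A minimal, hypothetical nearest-line baseline (function name and data layout are assumptions for illustration, not Fix8's actual code):

```python
def snap_to_lines(fixations, line_ys):
    """Naive drift-correction baseline: move each fixation's y-coordinate
    to the vertically nearest line of text, leaving x untouched.

    fixations: list of (x, y) screen coordinates
    line_ys:   y-coordinates of the text lines on the stimulus
    """
    corrected = []
    for x, y in fixations:
        nearest = min(line_ys, key=lambda ly: abs(ly - y))
        corrected.append((x, nearest))
    return corrected
```

A semi-automated tool would let the user override individual assignments where this heuristic fails, e.g. when drift exceeds half the line spacing.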

Authors:Nicholas Davis, Janet Rafner
Title: AI Drawing Partner: Co-Creative Drawing Agent and Research Platform to Model Co-Creation
Abstract:
This paper describes the AI Drawing Partner, which is a co-creative drawing agent that also serves as a research platform to model co-creation. The AI Drawing Partner is an early example of a quantified co-creative AI system that automatically models the co-creation that happens on the system. The method the system uses to capture this data is based on a new cognitive science framework called co-creative sense-making (CCSM). The CCSM is based on the cognitive theory of enaction, which describes how meaning emerges through interaction with the environment and other people in that environment in a process of sense-making. The CCSM quantifies elements of interaction dynamics to identify sense-making patterns and interaction trends. This paper describes a new technique for modeling the interaction and collaboration dynamics of co-creative AI systems with the co-creative sense-making (CCSM) framework. A case study is conducted of ten co-creative drawing sessions between a human user and the co-creative agent. The analysis includes showing the artworks produced, the quantified data from the AI Drawing Partner, the curves describing interaction dynamics, and a visualization of interaction trend sequences. The primary contribution of this paper is presenting the AI Drawing Partner, which is a unique co-creative AI system and research platform that collaborates with the user in addition to quantifying, modeling, and visualizing the co-creative process using the CCSM framework.

Authors:Shireesh Reddy Pyreddy, Tarannum Shaila Zaman
Title: EmoXpt: Analyzing Emotional Variances in Human Comments and LLM-Generated Responses
Abstract:
The widespread adoption of generative AI has generated diverse opinions, with individuals expressing both support and criticism of its applications. This study investigates the emotional dynamics surrounding generative AI by analyzing human tweets referencing terms such as ChatGPT, OpenAI, Copilot, and LLMs. To further understand the emotional intelligence of ChatGPT, we examine its responses to selected tweets, highlighting differences in sentiment between human comments and LLM-generated responses. We introduce EmoXpt, a sentiment analysis framework designed to assess both human perspectives on generative AI and the sentiment embedded in ChatGPT's responses. Unlike prior studies that focus exclusively on human sentiment, EmoXpt uniquely evaluates the emotional expression of ChatGPT. Experimental results demonstrate that LLM-generated responses are notably more efficient, cohesive, and consistently positive than human responses.

Authors:Vikram Kamath Cannanure, Sharon Wolf, Kaja Jasińska, Timothy X Brown, Amy Ogan
Title: Applying Think-Aloud in ICTD: A Case Study of a Chatbot Use by Teachers in Rural Côte d'Ivoire
Abstract:
Think-alouds are a common HCI usability method where participants verbalize their thoughts while using interfaces. However, their utility in cross-cultural settings, particularly in the Global South, is unclear, where cultural differences impact user interactions. This paper investigates the usability challenges teachers in rural Côte d'Ivoire faced when using a chatbot designed to support an educational program. We conducted think-aloud sessions with 20 teachers two weeks after a chatbot deployment, analyzing their navigation, errors, and time spent on tasks. We discuss our approach and findings that helped us identify usability issues and challenging features for improving the chatbot designs. Our note summarizes our reflections on using think-aloud and contributes to discussions on its culturally sensitive adaptation in the Global South.

Authors:Dong Hyun Jeon, Jong Kwan Lee, Prabal Dhaubhadel, Aaron Kuhlman
Title: Visualization Tool: Exploring COVID-19 Data
Abstract:
The ability to effectively visualize data is crucial in the contemporary world, where information is often voluminous and complex. Visualizations, such as charts, graphs, and maps, provide an intuitive and easily understandable means to interpret, analyze, and communicate patterns, trends, and insights hidden within large datasets. These graphical representations can help researchers, policymakers, and the public to better comprehend and respond to a multitude of issues. In this study, we explore a visualization tool to interpret and understand various data from the COVID-19 pandemic. While others have shown COVID-19 visualization methods and tools, our tool provides a means to analyze COVID-19 data in a more comprehensive way. We have used public data from the NY Times and the CDC, and various COVID-19 data (e.g., core places, patterns, foot traffic) from Safegraph. Figure 1 shows the basic view of our visualization. In addition to providing visualizations of these data, our tool also incorporates a Surprising Map. The Surprising Map is a type of choropleth map that avoids misleadingly giving visual prominence to known base rates or to artifacts of sample size and normalization when visualizing the density of events in spatial data. It is based on Bayesian surprise: it creates a space of equi-plausible models and uses Bayesian updating to re-estimate their plausibility based on individual events.
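The Bayesian surprise underlying the Surprising Map can be sketched as the KL divergence between the posterior and the prior over a set of equi-plausible models after observing data. A minimal illustration (function name and inputs are assumptions for this sketch, not the authors' code):

```python
import math

def bayesian_surprise(prior, likelihoods):
    """KL(posterior || prior), in bits, over a set of candidate models.

    prior:       prior probability of each model (sums to 1; equi-plausible
                 models would use a uniform prior)
    likelihoods: P(observed event | model) for each model
    """
    evidence = sum(p * l for p, l in zip(prior, likelihoods))
    posterior = [p * l / evidence for p, l in zip(prior, likelihoods)]
    return sum(q * math.log2(q / p)
               for q, p in zip(posterior, prior) if q > 0)
```

Regions whose observed event counts shift the model posterior the most score the highest surprise, so base-rate-driven patterns (which every model already predicts) receive low visual prominence.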

Authors:Paul Warren, Paul Mulholland, Naomi Barker
Title: Music and art: a study in cross-modal interpretation
Abstract:
Our study has investigated the effect of music on the experience of viewing art, investigating the factors which create a sense of connectivity between the two forms. We worked with 138 participants, and included multiple choice and open-ended questions. For the latter, we performed both a qualitative analysis and also sentiment analysis using text-mining. We investigated the relationship between the user experience and the emotions in the artwork and music. We found that, besides emotion, theme, story, and to a lesser extent music tempo were factors which helped form connections between artwork and music. Overall, participants rated the music as being helpful in developing an appreciation of the art. We propose guidelines for using music to enhance the experience of viewing art, and we propose directions for future research.

Authors:Shahaf Donio, Eran Toch
Title: Neighborhood Disparities in Smart City Service Adoption
Abstract:
While local governments have invested heavily in smart city infrastructure, significant disparities in adopting these services remain in urban areas. The success of many user-facing smart city technologies requires understanding barriers to adoption, including persistent inequalities in urban areas. An analysis of a random-sample telephone survey (n=489) in four neighborhoods of Tel Aviv, merged with digital municipal services usage data, found that neighborhood residency, alongside individual-level factors, influences the reasons why residents adopt resident-facing smart city services. Structural Equation Modeling shows that neighborhood residency is related to digital proficiency and privacy perceptions beyond demographic factors, and that these in turn influence the adoption of smart-city services. We conclude by discussing why and how place effects must be considered in further research on smart cities and in the study and mitigation of digital inequality.

Authors:Terrance Yu-Hao Chen, Yulin Chen, Pontus Soederhaell, Sadrishya Agrawal, Kateryna Shapovalenko
Title: Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation
Abstract:
Decoding speech from non-invasive brain signals, such as electroencephalography (EEG), has the potential to advance brain-computer interfaces (BCIs), with applications in silent communication and assistive technologies for individuals with speech impairments. However, EEG-based speech decoding faces major challenges, such as noisy data, limited datasets, and poor performance on complex tasks like speech perception. This study attempts to address these challenges by employing variational autoencoders (VAEs) for EEG data augmentation to improve data quality and applying a state-of-the-art (SOTA) sequence-to-sequence deep learning architecture, originally successful in electromyography (EMG) tasks, to EEG-based speech decoding. Additionally, we adapt this architecture for word classification tasks. Using the Brennan dataset, which contains EEG recordings of subjects listening to narrated speech, we preprocess the data and evaluate both classification and sequence-to-sequence models for EEG-to-words/sentences tasks. Our experiments show that VAEs have the potential to reconstruct artificial EEG data for augmentation. Meanwhile, our sequence-to-sequence model achieves more promising performance in generating sentences compared to our classification model, though both remain challenging tasks. These findings lay the groundwork for future research on EEG speech perception decoding, with possible extensions to speech production tasks such as silent or imagined speech.

Authors:Philip Weber, Kevin Krings, Lukas Schröder, Lea Katharina Michel, Thomas Ludwig
Title: Rendezfood: A Design Case Study of a Conversational Location-based Approach in Restaurants
Abstract:
The restaurant industry is currently facing a challenging socio-economic situation caused by the rise of delivery services, inflation, and typically low margins. Often, technological opportunities for process optimization or customer retention are not fully utilized. In our design case study, we investigate which technologies are already being used to improve the customer experience in restaurants and explore a novel approach to this issue. We designed, implemented, and evaluated a platform with customers and restaurateurs to increase visibility and emotional connection to nearby restaurants through their dishes. Some of our key findings include the enormous potential of combining location-based systems and conversational agents, but also the difficulties in creating content for such platforms. We contribute to the field of Human-Food Interaction by (1) identifying promising design spaces as well as customer and restaurateur requirements for technology in this domain, (2) presenting an innovative design case study to improve the user experience, and (3) exploring the broader implications of our design case study findings for approaching a real-world metaverse.

Authors:Owais Mujtaba Khanday, José L. Pérez-Córdoba, Mohd Yaqub Mir, Ashfaq Ahmad Najar, Jose A. Gonzalez-Lopez
Title: NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity
Abstract:
This paper introduces a novel algorithm designed for speech synthesis from neural activity recordings obtained using invasive electroencephalography (EEG) techniques. The proposed system offers a promising communication solution for individuals with severe speech impairments. Central to our approach is the integration of time-frequency features in the high-gamma band computed from EEG recordings with an advanced NeuroIncept Decoder architecture. This neural network architecture combines Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to reconstruct audio spectrograms from neural patterns. Our model demonstrates robust mean correlation coefficients between predicted and actual spectrograms, though inter-subject variability indicates distinct neural processing mechanisms among participants. Overall, our study highlights the potential of neural decoding techniques to restore communicative abilities in individuals with speech disorders and paves the way for future advancements in brain-computer interface technologies.

Authors:Gaoussou Youssouf Kebe, Jeffrey M. Girard, Einat Liebenthal, Justin Baker, Fernando De la Torre, Louis-Philippe Morency
Title: LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment
Abstract:
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment using the Montgomery-Asberg Depression Rating Scale (MADRS). We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews from the Context-Adaptive Multimodal Informatics (CAMI) dataset, demonstrates strong correlations with clinician assessments. The Qwen 2.5-72b model achieves near-human level agreement across most MADRS items, with Intraclass Correlation Coefficients (ICC) closely approaching those between human raters. We provide a comprehensive analysis of model performance across different MADRS items, highlighting strengths and current limitations. Our findings suggest that LLMs, with appropriate prompting, can serve as efficient tools for mental health assessment, potentially increasing accessibility in resource-limited settings. However, challenges remain, particularly in assessing symptoms that rely on non-verbal cues, underscoring the need for multimodal approaches in future work.
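For reference, the Intraclass Correlation Coefficient used to quantify rater agreement can be computed from a subjects-by-raters ratings table. The variant below is ICC(2,1) (two-way random effects, absolute agreement, single rater), which may differ from the exact variant used in the paper:

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows (one per subject), each a list of scores
             (one per rater). Assumes a complete table.
    """
    n = len(ratings)      # subjects
    k = len(ratings[0])   # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA sums of squares
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Perfect agreement yields 1.0; a constant offset between raters lowers ICC(2,1) because the absolute-agreement form penalizes systematic rater bias.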

Authors:Audrey Olson, Pratyusha Maiti, Ashok Goel
Title: The Textbook of Tomorrow: Rethinking Course Material Interfacing in the Era of GPT
Abstract:
Online Learning Management Systems (LMSs), such as Blackboard and Canvas, have existed for decades. Yet, course readings, when provided at all, consistently exist as simple digital twins to their real-life counterparts. While online tools and resources exist to help students process digital texts more efficiently or in ways better suited to their learning styles, knowledge about such resources is not evenly distributed and creates a gulf in advantage between students. This paper proposes the courseware integration of "smart" textbooks, a newfound way for students to chat with their readings, receive summaries and explanations for highlighted text, and generate quiz questions via an AI agent embedded in their online course material. Future iterations of the software aim to add in-context reference highlighting for AI-generated answers and personalized tunings for the end learner.

Authors:Ammar Ahmed, Margarida Fresco, Fredrik Forsberg, Hallvard Grotli
Title: From Code to Compliance: Assessing ChatGPT's Utility in Designing an Accessible Webpage -- A Case Study
Abstract:
Web accessibility ensures that individuals with disabilities can access and interact with digital content without barriers, yet a significant majority of the most-used websites fail to meet accessibility standards. This study evaluates ChatGPT's (GPT-4o) ability to generate and improve web pages in line with the Web Content Accessibility Guidelines (WCAG). While ChatGPT can effectively address accessibility issues when prompted, its default code often lacks compliance, reflecting limitations in its training data and prevailing inaccessible web practices. Automated and manual testing revealed strengths in resolving simple issues but challenges with complex tasks, requiring human oversight and additional iterations. Unlike prior studies, we incorporate manual evaluation and dynamic elements, and use the visual reasoning capability of ChatGPT along with the prompts to fix accessibility issues. Providing screenshots alongside prompts enhances the LLM's ability to address accessibility issues by allowing it to analyze surrounding components, such as determining appropriate contrast colors. We found that effective prompt engineering, such as providing concise, structured feedback and incorporating visual aids, significantly enhances ChatGPT's performance. These findings highlight the potential and limitations of large language models for accessible web development, offering practical guidance for developers to create more inclusive websites.

Authors:Dora Medgyesy, Joella Galas, Julian van Pol, Rustam Eynaliyev, Thijs Vollebregt
Title: Existential Crisis: A Social Robot's Reason for Being
Abstract:
As robots become ever more important in our daily lives, there is a growing need to understand how they are perceived by people. This study aims to investigate how the user perception of robots is influenced by displays of personality. Using LLMs and speech-to-text technology, we designed a within-subject study to compare two conditions: a personality-driven robot and a purely task-oriented, personality-neutral robot. Twelve participants, recruited from the Socially Intelligent Robotics course at Vrije Universiteit Amsterdam, interacted with a Nao robot tasked with asking them a set of medical questions under both conditions. After completing both interactions, the participants completed a user experience questionnaire measuring their emotional states and robot perception, using standardized questionnaires from the SRI and Psychology literature.

Authors:Ravirajan K, Arvind Sundarajan
Title: Enhancing Workplace Productivity and Well-being Using AI Agent
Abstract:
This paper discusses the use of Artificial Intelligence (AI) to enhance workplace productivity and employee well-being. By integrating machine learning (ML) techniques with neurobiological data, the proposed approaches ensure alignment with human ethical standards through value alignment models and Hierarchical Reinforcement Learning (HRL) for autonomous task management. The system utilizes biometric feedback from employees to generate personalized health prompts, fostering a supportive work environment that encourages physical activity. Additionally, we explore decentralized multi-agent systems for improved collaboration and decision-making frameworks that enhance transparency. Various approaches using ML techniques in conjunction with AI implementations are discussed. Together, these innovations aim to create a more productive and health-conscious workplace. These outcomes assist HR management and organizations in launching more rational career progression streams for employees and facilitating organizational transformation.

Authors:Yahya Sowti Khiabani, Farris Atif, Chieh Hsu, Sven Stahlmann, Tobias Michels, Sebastian Kramer, Benedikt Heidrich, M. Saquib Sarfraz, Julian Merten, Faezeh Tafazzoli
Title: Optimizing Small Language Models for In-Vehicle Function-Calling
Abstract:
We propose a holistic approach for deploying Small Language Models (SLMs) as function-calling agents within vehicles as edge devices, offering a more flexible and robust alternative to traditional rule-based systems. By leveraging SLMs, we simplify vehicle control mechanisms and enhance the user experience. Given the in-vehicle hardware constraints, we apply state-of-the-art model compression techniques, including structured pruning, healing, and quantization, ensuring that the model fits within the resource limitations while maintaining acceptable performance. Our work focuses on optimizing a representative SLM, Microsoft's Phi-3 mini, and outlines best practices for enabling embedded models, including compression, task-specific fine-tuning, and vehicle integration. We demonstrate that, despite significant reduction in model size which removes up to 2 billion parameters from the original model, our approach preserves the model's ability to handle complex in-vehicle tasks accurately and efficiently. Furthermore, by executing the model in a lightweight runtime environment, we achieve a generation speed of 11 tokens per second, making real-time, on-device inference feasible without hardware acceleration. Our results demonstrate the potential of SLMs to transform vehicle control systems, enabling more intuitive interactions between users and their vehicles for an enhanced driving experience.
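The abstract names structured pruning among the compression techniques applied. A toy illustration of the idea (removing whole rows, e.g. neurons, ranked by weight norm; the function name and selection criterion are assumptions for this sketch, not the authors' method):

```python
def prune_rows(weight, keep_ratio):
    """Structured pruning sketch: keep only the rows of a weight matrix
    (e.g., neurons of a linear layer) with the largest L2 norms.

    weight:     list of rows, each a list of weights
    keep_ratio: fraction of rows to retain (at least one row is kept)
    """
    norms = [(sum(w * w for w in row) ** 0.5, i)
             for i, row in enumerate(weight)]
    keep = max(1, int(len(weight) * keep_ratio))
    # Indices of the highest-norm rows, restored to original order
    kept = sorted(i for _, i in sorted(norms, reverse=True)[:keep])
    return [weight[i] for i in kept]
```

In practice, pruning is followed by a "healing" fine-tuning pass (as the abstract notes) to recover accuracy lost when whole structures are removed.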

Authors:Michal Kuk, Jakub Harasta
Title: LLMs & Legal Aid: Understanding Legal Needs Exhibited Through User Queries
Abstract:
The paper presents a preliminary analysis of an experiment conducted by Frank Bold, a Czech expert group, to explore user interactions with GPT-4 for addressing legal queries. Between May 3, 2023, and July 25, 2023, 1,252 users submitted 3,847 queries. Unlike studies that primarily focus on the accuracy, factuality, or hallucination tendencies of large language models (LLMs), our analysis focuses on the user query dimension of the interaction. Using GPT-4o for zero-shot classification, we categorized queries on (1) whether users provided factual information about their issue (29.95%) or not (70.05%), (2) whether they sought legal information (64.93%) or advice on the course of action (35.07%), and (3) whether they imposed requirements to shape or control the model's answer (28.57%) or not (71.43%). We provide both quantitative and qualitative insight into user needs and contribute to a better understanding of user engagement with LLMs.
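Zero-shot classification of this kind typically rests on a prompt that names the candidate labels and asks the model to pick exactly one. The exact prompts used in the study are not given in the abstract, so the template below is purely illustrative (all names are hypothetical):

```python
def build_query_classifier_prompt(query, dimension_name, labels):
    """Compose a zero-shot classification prompt for an LLM.

    labels: mutually exclusive category names for one classification
            dimension (e.g., 'legal information' vs. advice-seeking).
    """
    options = "\n".join(f"- {label}" for label in labels)
    return (
        f"Classify the following legal query along the dimension "
        f"'{dimension_name}'. Reply with exactly one label.\n"
        f"Labels:\n{options}\n\nQuery:\n{query}\n\nLabel:"
    )

def parse_label(response, labels):
    """Map the model's raw reply back onto a known label (None if unmatched)."""
    cleaned = response.strip().lower()
    for label in labels:
        if label.lower() in cleaned:
            return label
    return None
```

Constraining the reply to a fixed label set and parsing it back defensively keeps per-category percentages (like those reported above) reproducible across runs.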

Authors:Maryna Kapitonova, Tonio Ball
Title: Human-AI Teaming Using Large Language Models: Boosting Brain-Computer Interfacing (BCI) and Brain Research
Abstract:
Recently, there is increasing interest in using artificial intelligence (AI) to automate aspects of the research process, or even to autonomously conduct the full research cycle, from idea generation through data analysis to the composition and evaluation of scientific manuscripts. Working AI scientist systems have been demonstrated for computer science tasks and for running molecular biology labs. While some approaches aim for full autonomy of the scientific AI, others instead aim to leverage human-AI teaming. Here, we address how to adapt such approaches to boost Brain-Computer Interface (BCI) development, as well as brain research and neuroscience at large. We argue that at this time, a strong emphasis on human-AI teaming, in contrast to a fully autonomous AI BCI researcher, will be the most promising way forward. We introduce the collaborative workspaces concept for human-AI teaming based on a set of Janusian design principles, looking both ways, to the human as well as to the AI side. Based on these principles, we present ChatBCI, a Python-based toolbox for enabling human-AI collaboration through interaction with Large Language Models (LLMs), designed for BCI research and development projects. We show how ChatBCI was successfully used in a concrete BCI project on advancing motor imagery decoding from EEG signals. Our approach can be straightforwardly extended to broad neurotechnological and neuroscientific topics, and may by design facilitate the transfer of human expert knowledge to scientific AI systems in general.

Authors:Annika Bush, Amin Alibakhshi
Title: Bridging the Early Science Gap with Artificial Intelligence: Evaluating Large Language Models as Tools for Early Childhood Science Education
Abstract:
Early childhood science education is crucial for developing scientific literacy, yet translating complex scientific concepts into age-appropriate content remains challenging for educators. Our study evaluates four leading Large Language Models (LLMs) - GPT-4, Claude, Gemini, and Llama - on their ability to generate preschool-appropriate scientific explanations across biology, chemistry, and physics. Through systematic evaluation by 30 nursery teachers using established pedagogical criteria, we identify significant differences in the models' capabilities to create engaging, accurate, and developmentally appropriate content. Unexpectedly, Claude outperformed other models, particularly in biological topics, while all LLMs struggled with abstract chemical concepts. Our findings provide practical insights for educators leveraging AI in early science education and offer guidance for developers working to enhance LLMs' educational applications. The results highlight the potential and current limitations of using LLMs to bridge the early childhood science literacy gap.

Authors:Mihnea C. Moldoveanu, George Siemens
Title: Interactionalism: Re-Designing Higher Learning for the Large Language Agent Era
Abstract:
We introduce Interactionalism as a new set of guiding principles and heuristics for the design and architecture of learning now available due to Generative AI (GenAI) platforms. Specifically, we articulate interactional intelligence as a net new skill set that is increasingly important when core cognitive tasks are automatable and augmentable by GenAI functions. We break down these skills into core sets of meta-cognitive and meta-emotional components and show how working with Large Language Model (LLM)-based agents can be used proactively to help develop these skills in learners. Interactionalism is not advanced as a theory of learning, but as a blueprint for the practice of learning, in coordination with GenAI.

Authors:Han Xu, Mingqi Chen, Gaofeng Li, Lei Wei, Shichi Peng, Haoliang Xu, Qiang Li
Title: An Immersive Virtual Reality Bimanual Telerobotic System With Haptic Feedback
Abstract:
In robotic bimanual teleoperation, multimodal sensory feedback plays a crucial role, providing operators with a more immersive operating experience, reducing cognitive burden, and improving operating efficiency. In this study, we develop an immersive bilateral isomorphic bimanual telerobotic system, which comprises dual arm and dual dexterous hands, with visual and haptic force feedback. To assess the performance of this system, we carried out a series of experiments and investigated the user's teleoperation experience. The results demonstrate that haptic force feedback enhances physical perception capabilities and complex task operating abilities. In addition, it compensates for visual perception deficiencies and reduces the operator's work burden. Consequently, our proposed system achieves more intuitive, realistic and immersive teleoperation, improves operating efficiency, and expands the complexity of tasks that robots can perform through teleoperation.

Authors:Alfredo Cuzzocrea, Giovanni Pilato, Pablo Garcia Bringas
Title: Creating, Using and Assessing a Generative-AI-Based Human-Chatbot-Dialogue Dataset with User-Interaction Learning Capabilities
Abstract:
The study illustrates a first step in ongoing work aimed at developing a dataset of dialogues potentially useful for customer service conversation management between humans and AI chatbots. The approach exploits ChatGPT 3.5 to generate dialogues. One requirement is that the dialogue is characterized by a specific language proficiency level of the user; the other is that the user expresses a specific emotion during the interaction. The generated dialogues were then evaluated for overall quality. The complexity of the language used by both humans and AI agents has been evaluated using standard complexity measurements. Furthermore, the attitudes and interaction patterns exhibited by the chatbot at each turn have been stored for further detection of common conversation patterns in specific emotional contexts. The methodology could improve human-AI dialogue effectiveness and serve as a basis for systems that can learn from user interactions.

Authors:Sushil Ghildiyal, Kishankumar Bhimani, Manimozhi M
Title: Design To Convert a Wired PLC into Wireless PLC
Abstract:
This paper employs Bluetooth technology to convert an existing wired Programmable Logic Controller (PLC) into a wireless one. Two Bluetooth devices are employed as transceivers to transmit and receive the input signal, realizing the wireless PLC. The main advantage of a PLC is controlling the output according to the status of the input. With Bluetooth, handshaking takes place between the two Bluetooth modules, which are interfaced with a microcontroller board (an Arduino board) and then with the PLC so that field devices can be controlled without wires.

Authors:Zhiting He, Jiayi Su, Li Chen, Tianqi Wang, Ray LC
Title: "I Recall the Past": Exploring How People Collaborate with Generative AI to Create Cultural Heritage Narratives
Abstract:
Visitors to cultural heritage sites often encounter official information, while local people's unofficial stories remain invisible. To explore the expression of local narratives, we conducted a workshop with 20 participants utilizing Generative AI (GenAI) to support visual narratives, asking them to use Stable Diffusion to create images of familiar cultural heritage sites, as well as images of unfamiliar ones for comparison. The results revealed three narrative strategies and highlighted GenAI's strengths in illuminating, amplifying, and reinterpreting personal narratives. However, GenAI showed limitations in meeting detailed requirements, portraying cultural features, and avoiding bias, which were particularly pronounced with unfamiliar sites due to participants' lack of local knowledge. To address these challenges, we recommend providing detailed explanations, prompt engineering, and fine-tuning AI models to reduce uncertainties; using objective references to mitigate inaccuracies arising from participants' inability to recognize errors or misconceptions; and curating datasets to train AI models capable of accurately portraying cultural features.

Authors:Yiran Huang, Jian-Feng Yang, Haoda Fu
Title: Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems
Abstract:
Modern AI algorithms require labeled data, yet in the real world the majority of data is unlabeled, and labeling it is costly. This is particularly true for areas requiring special skills, such as physicians reading radiology images. To use experts' time for data labeling most efficiently, one promising approach is a human-in-the-loop active learning algorithm. In this work, we propose a novel active learning framework with significant potential for application in modern AI systems. Unlike traditional active learning methods, which focus only on determining which data point should be labeled, our framework also introduces an innovative perspective on incorporating different query schemes. We propose a model to integrate the information from different types of queries. Based on this model, our active learning framework can automatically determine how the next question is queried. We further developed a data-driven exploration and exploitation framework for our active learning method, which can be embedded in numerous active learning algorithms. Through simulations on five real-world datasets, including a highly complex real image task, our proposed active learning framework exhibits higher accuracy and lower loss compared to other methods.
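The exploration-exploitation idea for choosing what to send to the expert can be sketched as epsilon-greedy uncertainty sampling. This is a generic baseline, not the authors' framework (function name and inputs are assumptions):

```python
import random

def select_query(pool_probs, epsilon=0.1, rng=random):
    """Pick the index of the next unlabeled point to send to the expert.

    pool_probs: predicted class-probability vectors, one per unlabeled point.
    With probability epsilon, explore (uniform random point); otherwise
    exploit by choosing the point whose top-class probability is lowest,
    i.e., the point the current model is least certain about.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(pool_probs))
    return min(range(len(pool_probs)), key=lambda i: max(pool_probs[i]))
```

Pure exploitation can get stuck repeatedly querying one confusing region of the input space; the random exploration step keeps the labeled set representative of the whole pool.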

Authors:Steve Mann, Martin Cooper, Bran Ferren, Thomas M. Coughlin, Paul Travers
Title: Advancing Technology for Humanity and Earth (+Water+Air)
Abstract:
As technology advances, the integration of physical, virtual, and social worlds has led to a complex landscape of "Realities" such as Virtual Reality (VR), Augmented Reality (AR), metaverse, spatial computing, and other emerging paradigms. This paper builds upon and refines the concept of eXtended Reality (XR) as the unifying framework that not only interpolates across these diverse realities but also extrapolates (extends) to create entirely new possibilities. XR is the "physical spatial metaverse," bridging the physical world, the virtual world of artificial intelligence, and the social world of human interaction. These three worlds define the Socio-Cyber-Physical Taxonomy of XR that allows us to identify underexplored research areas such as Diminished Reality (DR), and chart future directions to advance technology for people and planet. We highlight the six core properties of XR for applications in sustainability, healthcare, frontline work, and daily life. Central to this vision is the development of AI-driven wearable technologies, such as the smart eyeglass, that sustainably extend human capabilities.

Authors:Patrick Stokkink
Title: The Impact of AI on Educational Assessment: A Framework for Constructive Alignment
Abstract:
The influence of Artificial Intelligence (AI), and specifically Large Language Models (LLM), on education is continuously increasing. These models are frequently used by students, giving rise to the question whether current forms of assessment are still a valid way to evaluate student performance and comprehension. The theoretical framework developed in this paper is grounded in Constructive Alignment (CA) theory and Bloom's taxonomy for defining learning objectives. We argue that AI influences learning objectives of different Bloom levels in different ways, and that assessment has to be adapted accordingly. Furthermore, in line with Bloom's vision, formative and summative assessment should be aligned on whether the use of AI is permitted or not. Although lecturers tend to agree that education and assessment need to be adapted to the presence of AI, a strong bias exists in the extent to which lecturers want to allow for AI in assessment. This bias is caused by a lecturer's familiarity with AI and specifically whether they use it themselves. To avoid this bias, we propose structured guidelines on a university or faculty level, to foster alignment among the staff. Besides that, we argue that teaching staff should be trained on the capabilities and limitations of AI tools. In this way, they are better able to adapt their assessment methods.

Authors:Wei Xu
Title: A User Experience 3.0 (UX 3.0) Paradigm Framework: Designing for Human-Centered AI Experiences
Abstract:
User experience (UX) practices have evolved in stages and are entering a transformative phase (UX 3.0), driven by AI technologies and shifting user needs. Human-centered AI (HCAI) experiences are emerging, necessitating new UX approaches to support UX practices in the AI era. We propose a UX 3.0 paradigm framework to respond and guide UX practices in developing HCAI systems.

Authors:Daniel Mwesigwa
Title: Against 'softmaxing' culture
Abstract:
AI is flattening culture. Evaluations of "culture" are showing the myriad ways in which large AI models are homogenizing language and culture, averaging out rich linguistic differences into generic expressions. I call this phenomenon "softmaxing culture," and it is one of the fundamental challenges facing AI evaluations today. Efforts to improve and strengthen evaluations of culture are central to the project of cultural alignment in large AI systems. This position paper argues that machine learning (ML) and human-computer interaction (HCI) approaches to evaluation are limited. I propose two key conceptual shifts. First, instead of asking "what is culture?" at the start of system evaluations, I propose beginning with the question: "when is culture?" Second, while I acknowledge the philosophical claim that cultural universals exist, the challenge is not simply to describe them, but to situate them in relation to their particulars. Taken together, these conceptual shifts invite evaluation approaches that move beyond technical requirements toward perspectives that are more responsive to the complexities of culture.
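The "softmaxing" metaphor borrows from the softmax function, which at high temperature averages distinct scores into a near-uniform distribution. A minimal sketch of that flattening effect (the scores and temperature values are illustrative, not from the paper):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperatures flatten the output."""
    z = [x / temperature for x in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Scores for four distinct "expressions"; only their relative spread matters.
scores = [4.0, 1.0, 0.5, 0.2]

sharp = softmax(scores, temperature=0.5)   # nearly all mass on the top item
flat = softmax(scores, temperature=10.0)   # close to uniform: differences washed out
```

At low temperature the distinctions survive; at high temperature they are "averaged out," which is the homogenization the abstract describes.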

Authors:Haichang Li
Title: Memory as a Service (MaaS): Rethinking Contextual Memory as Service-Oriented Modules for Collaborative Agents
Abstract:
This position paper aims to rethink the role and design of memory in Large Language Model (LLM)-based agent systems. We observe that while current memory practices have begun to transcend the limitations of single interactions, they remain conceptually grounded in "bound memory," where memory is treated as local state attached to a specific context or entity, forming "memory silos" that impede cross-entity collaboration. To overcome this architectural bottleneck, this paper proposes the timely design perspective of "Memory as a Service" (MaaS). MaaS advocates decoupling memory from its conventional role as an interaction byproduct and encapsulating it as a modular service that can be independently callable, dynamically composable, and finely governed. At its core, MaaS leverages the duality of memory, namely its inherently private nature and its potential for public service, to grant memory controlled, on-demand interoperability across entities. This paper introduces a two-dimensional design space defined by entity structure and service type, illustrating how MaaS aligns with current memory practices while naturally extending them to cross-entity collaborative scenarios. Finally, we outline an open research agenda spanning governance, security, and ethical ecosystems, and call upon the broader research community to explore this shift toward service-oriented memory for collaborative agents operating across entity boundaries.

Authors:Hitesh Mohapatra
Title: Golden Ratio Assisted Localization for Wireless Sensor Network
Abstract:
This paper presents a novel localization algorithm for wireless sensor networks (WSNs) called Golden Ratio Localization (GRL), which leverages the mathematical properties of the golden ratio (phi ≈ 1.618) to optimize both node placement and communication range. GRL introduces phi-based anchor node deployment and hop-sensitive weighting using phi-exponents to improve localization accuracy while minimizing energy consumption. Through extensive simulations conducted on a 100 m × 100 m sensor field with 100 nodes and 10 anchors, GRL achieved an average localization error of 2.35 meters, outperforming DV-Hop (3.87 meters) and Centroid (4.95 meters). In terms of energy efficiency, GRL reduced localization energy consumption to 1.12 microJ per node, compared to 1.78 microJ for DV-Hop and 1.45 microJ for Centroid. These results confirm that GRL provides a more balanced and efficient localization approach, making it especially suitable for energy-constrained and large-scale WSN deployments.
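The abstract names phi-exponent hop weighting but not its exact formula. The sketch below is one plausible reading, in which an anchor h hops away contributes to a weighted-centroid position estimate with weight phi**(-h), so nearer anchors dominate. The function name and the weighting rule are assumptions for illustration, not the paper's published method:

```python
# Golden ratio, ~1.618; hypothetical phi-exponent hop weighting.
PHI = (1 + 5 ** 0.5) / 2

def estimate_position(anchors, hop_counts):
    """Weighted-centroid estimate: an anchor h hops away from the
    unknown node contributes with weight PHI**(-h)."""
    weights = [PHI ** (-h) for h in hop_counts]
    total = sum(weights)
    x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
    y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
    return x, y
```

For example, with anchors at (0, 0) and (10, 0) that are 1 and 3 hops away, the estimate lands well inside the half of the field nearer the 1-hop anchor, since phi**(-1) ≈ 0.618 outweighs phi**(-3) ≈ 0.236.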

Authors:Russell Beale
Title: Adapting University Policies for Generative AI: Opportunities, Challenges, and Policy Solutions in Higher Education
Abstract:
The rapid proliferation of generative artificial intelligence (AI) tools - especially large language models (LLMs) such as ChatGPT - has ushered in a transformative era in higher education. Universities in developed regions are increasingly integrating these technologies into research, teaching, and assessment. On one hand, LLMs can enhance productivity by streamlining literature reviews, facilitating idea generation, assisting with coding and data analysis, and even supporting grant proposal drafting. On the other hand, their use raises significant concerns regarding academic integrity, ethical boundaries, and equitable access. Recent empirical studies indicate that nearly 47% of students use LLMs in their coursework - with 39% using them for exam questions and 7% for entire assignments - while detection tools currently achieve around 88% accuracy, leaving a 12% error margin. This article critically examines the opportunities offered by generative AI, explores the multifaceted challenges it poses, and outlines robust policy solutions. Emphasis is placed on redesigning assessments to be AI-resilient, enhancing staff and student training, implementing multi-layered enforcement mechanisms, and defining acceptable use. By synthesizing data from recent research and case studies, the article argues that proactive policy adaptation is imperative to harness AI's potential while safeguarding the core values of academic integrity and equity.

Authors:Zhuodi Cai
Title: 3Description: An Intuitive Human-AI Collaborative 3D Modeling Approach
Abstract:
This paper presents 3Description, an experimental human-AI collaborative approach for intuitive 3D modeling. 3Description aims to address accessibility and usability challenges in traditional 3D modeling by enabling non-professional individuals to co-create 3D models using verbal and gesture descriptions. Through a combination of qualitative research, product analysis, and user testing, 3Description integrates AI technologies such as Natural Language Processing and Computer Vision, powered by OpenAI and MediaPipe. Recognizing the web has wide cross-platform capabilities, 3Description is web-based, allowing users to describe the desired model and subsequently adjust its components using verbal and gestural inputs. In the era of AI and emerging media, 3Description not only contributes to a more inclusive and user-friendly design process, empowering more people to participate in the construction of the future 3D world, but also strives to increase human engagement in co-creation with AI, thereby avoiding undue surrender to technology and preserving human creativity.

Authors:Neha Raghuvanshi
Title: Follow the user meaningfully and product growth will follow: A mixed methods case study tying UX Point of View & Growth leading to measurable impact
Abstract:
Have you wondered how cross-functional teams balance between maximizing the value that users derive and business growth, leading to win-win situations? This case study shows how User Experience Research (UXR) and Data Science teams used mixed methods research to strategically influence Product Led Growth (PLG) for a Password Manager used by more than a million users, thus allowing our users, internal teams, and business to win. The audience will take away practical lessons and techniques related to leveraging mixed methods to: (a) maximize user value while meeting business growth goals; (b) influence cross-functional teams; and (c) measure user and business impact. This case study can be easily tied to the UXR Point of View pyramid (POV) [2], which represents a methodological approach to constructing a POV, and further dives into actioning a POV to create measurable user and business impact.

Authors:Russell Beale
Title: Dialogic Pedagogy for Large Language Models: Aligning Conversational AI with Proven Theories of Learning
Abstract:
Large Language Models (LLMs) are rapidly transforming education by enabling rich conversational learning experiences. This article provides a comprehensive review of how LLM-based conversational agents are being used in higher education, with extensions to secondary and lifelong learning contexts. We synthesize existing literature on LLMs in education and theories of conversational and dialogic pedagogy - including Vygotsky's sociocultural learning (scaffolding and the Zone of Proximal Development), the Socratic method, and Laurillard's conversational framework - and examine how prompting strategies and retrieval-augmented generation (RAG) can align LLM behaviors with these pedagogical theories and support personalized, adaptive learning. We map educational theories to LLM capabilities, highlighting where LLM-driven dialogue supports established learning principles and where it challenges or falls short of traditional pedagogical assumptions. Notable gaps in applying prior theories to LLMs are identified, such as the model's tendency to provide direct answers instead of fostering co-construction of knowledge, and the need to account for the constant availability and broad but non-human expertise of LLM tutors. In response, we propose practical strategies to better align LLM interactions with sound pedagogy - for example, designing prompts that encourage Socratic questioning, scaffolded guidance, and student reflection, as well as integrating retrieval mechanisms to ensure accuracy and contextual relevance. Our aim is to bridge the gap between educational theory and the emerging practice of AI-driven conversational learning, offering insights and tools for making LLM-based dialogues more educationally productive and theory-aligned.

Authors:Romy Müller
Title: When concept-based XAI is imprecise: Do people distinguish between generalisations and misrepresentations?
Abstract:
Concept-based explainable artificial intelligence (C-XAI) can help reveal the inner representations of AI models. Understanding these representations is particularly important in complex tasks like safety evaluation. Such tasks rely on high-level semantic information (e.g., about actions) to make decisions about abstract categories (e.g., whether a situation is dangerous). In this context, it may be desirable for C-XAI concepts to show some variability, suggesting that the AI is capable of generalising beyond the concrete details of a situation. However, it is unclear whether people recognise and appreciate such generalisations and can distinguish them from other, less desirable forms of imprecision. This was investigated in an experimental railway safety scenario. Participants evaluated the performance of a simulated AI that evaluated whether traffic scenes involving people were dangerous. To explain these decisions, the AI provided concepts in the form of similar image snippets. These concepts differed in their match with the classified image, either regarding a highly relevant feature (i.e., relation to tracks) or a less relevant feature (i.e., actions). Contrary to the hypotheses, concepts that generalised over less relevant features led to ratings that were lower than for precisely matching concepts and comparable to concepts that systematically misrepresented these features. Conversely, participants were highly sensitive to imprecisions in relevant features. These findings cast doubts on whether people spontaneously recognise generalisations. Accordingly, they might not be able to infer from C-XAI concepts whether AI models have gained a deeper understanding of complex situations.

Authors:Weixin Liang
Title: Computational Approaches to Understanding Large Language Model Impact on Writing and Information Ecosystems
Abstract:
Large language models (LLMs) have shown significant potential to change how we write, communicate, and create, leading to rapid adoption across society. This dissertation examines how individuals and institutions are adapting to and engaging with this emerging technology through three research directions. First, I demonstrate how the institutional adoption of AI detectors introduces systematic biases, particularly disadvantaging writers of non-dominant language varieties, highlighting critical equity concerns in AI governance. Second, I present novel population-level algorithmic approaches that measure the increasing adoption of LLMs across writing domains, revealing consistent patterns of AI-assisted content in academic peer reviews, scientific publications, consumer complaints, corporate communications, job postings, and international organization press releases. Finally, I investigate LLMs' capability to provide feedback on research manuscripts through a large-scale empirical analysis, offering insights into their potential to support researchers who face barriers in accessing timely manuscript feedback, particularly early-career researchers and those from under-resourced settings.

Authors:Bruno Campos
Title: Juicy or Dry? A Comparative Study of User Engagement and Information Retention in Interactive Infographics
Abstract:
This study compares the impact of "juiciness" on user engagement and short-term information retention in interactive infographics. Juicy designs generally showed a slight advantage in overall user engagement scores compared to dry designs. Specifically, the juicy version of the Burcalories infographic had the highest engagement score. However, the differences in engagement were often small. Regarding information retention, the results were mixed. The juicy versions of The Daily Routines of Famous Creative People and The Main Chakras infographics showed marginally better average recall and more participants with higher recall. Conversely, the dry version of Burcalories led to more correct answers in multiple-choice questions. The study suggests that while juicy design elements can enhance user engagement and, in some cases, short-term information retention, their effectiveness depends on careful implementation. Excessive juiciness could be overwhelming or distracting, while well-implemented juicy elements contributed to a more entertaining experience. The findings emphasize the importance of balancing engaging feedback with clarity and usability.

Authors:Zhicheng Lin
Title: Large Language Models as Psychological Simulators: A Methodological Guide
Abstract:
Large language models (LLMs) offer emerging opportunities for psychological and behavioral research, but methodological guidance is lacking. This article provides a framework for using LLMs as psychological simulators across two primary applications: simulating roles and personas to explore diverse contexts, and serving as computational models to investigate cognitive processes. For simulation, we present methods for developing psychologically grounded personas that move beyond demographic categories, with strategies for validation against human data and use cases ranging from studying inaccessible populations to prototyping research instruments. For cognitive modeling, we synthesize emerging approaches for probing internal representations, methodological advances in causal interventions, and strategies for relating model behavior to human cognition. We address overarching challenges including prompt sensitivity, temporal limitations from training data cutoffs, and ethical considerations that extend beyond traditional human subjects review. Throughout, we emphasize the need for transparency about model capabilities and constraints. Together, this framework integrates emerging empirical evidence about LLM performance--including systematic biases, cultural limitations, and prompt brittleness--to help researchers wrangle these challenges and leverage the unique capabilities of LLMs in psychological research.

Authors:Zhicheng Lin
Title: From Prompts to Constructs: A Dual-Validity Framework for LLM Research in Psychology
Abstract:
Large language models (LLMs) are rapidly being adopted across psychology, serving as research tools, experimental subjects, human simulators, and computational models of cognition. However, the application of human measurement tools to these systems can produce contradictory results, raising concerns that many findings are measurement phantoms--statistical artifacts rather than genuine psychological phenomena. In this Perspective, we argue that building a robust science of AI psychology requires integrating two of our field's foundational pillars: the principles of reliable measurement and the standards for sound causal inference. We present a dual-validity framework to guide this integration, which clarifies how the evidence needed to support a claim scales with its scientific ambition. Using an LLM to classify text may require only basic accuracy checks, whereas claiming it can simulate anxiety demands a far more rigorous validation process. Current practice systematically fails to meet these requirements, often treating statistical pattern matching as evidence of psychological phenomena. The same model output--endorsing "I am anxious"--requires different validation strategies depending on whether researchers claim to measure, characterize, simulate, or model psychological constructs. Moving forward requires developing computational analogues of psychological constructs and establishing clear, scalable standards of evidence rather than the uncritical application of human measurement tools.

Authors:Angxuan Chen
Title: When learning analytics dashboard is explainable: An exploratory study on the effect of GenAI-supported learning analytics dashboard
Abstract:
This study investigated the impact of a theory-driven, explainable Learning Analytics Dashboard (LAD) on university students' human-AI collaborative academic abstract writing task. Grounded in Self-Regulated Learning (SRL) theory and incorporating Explainable AI (XAI) principles, our LAD featured a three-layered design (Visual, Explainable, Interactive). In an experimental study, participants were randomly assigned to either an experimental group (using the full explainable LAD) or a control group (using a visual-only LAD) to collaboratively write an academic abstract with a Generative AI. While quantitative analysis revealed no significant difference in the quality of co-authored abstracts between the two groups, a significant and noteworthy difference emerged in conceptual understanding: students in the explainable LAD group demonstrated a superior grasp of abstract writing principles, as evidenced by their higher scores on a knowledge test (p = .026). These findings highlight that while basic AI-generated feedback may suffice for immediate task completion, the provision of explainable feedback is crucial for fostering deeper learning, enhancing conceptual understanding, and developing transferable skills fundamental to self-regulated learning in academic writing contexts.

Authors:Yuichiro Fujimoto
Title: ChatAR: Conversation Support using Large Language Model and Augmented Reality
Abstract:
Engaging in smooth conversations with others is a crucial social skill. However, differences in knowledge between conversation participants can sometimes hinder effective communication. To tackle this issue, this study proposes a real-time support system that integrates head-mounted display (HMD)-based augmented reality (AR) technology with large language models (LLMs). This system facilitates conversation by recognizing keywords during dialogue, generating relevant information using the LLM, reformatting it, and presenting it to the user via the HMD. A significant issue with this system is that the user's eye movements may reveal to the conversation partner that they are reading the displayed text. This study also proposes a method for presenting information that takes into account appropriate eye movements during conversation. Two experiments were conducted to evaluate the effectiveness of the proposed system. The first experiment revealed that the proposed information presentation method reduces the likelihood of the conversation partner noticing that the user is reading the displayed text. The second experiment demonstrated that the proposed method led to a more balanced speech ratio between the user and the conversation partner, as well as an increase in the perceived excitement of the conversation.

Authors:Changzeng Fu
Title: Foundation of Affective Computing and Interaction
Abstract:
This book provides a comprehensive exploration of affective computing and human-computer interaction technologies. It begins with the historical development and basic concepts of human-computer interaction, delving into the technical frameworks and practical applications of emotional computing, visual interaction, voice interaction, brain-computer interfaces, physiological electrical signal analysis, and social robotics. The book covers a wide range of topics, including the psychological and neuroscience foundations of emotion, multimodal emotion recognition, emotional expression mechanisms, and the principles of brain-computer interfaces. Key technologies such as affective computing based on discrete emotion theory and dimensional models, visual perception principles, speech recognition and synthesis, EEG signal acquisition and processing, and multimodal emotion recognition are explained in detail. This book also addresses the technical challenges in the field, including multimodal data fusion, privacy and security, and ethical considerations in human-machine relationships. It discusses the applications of these technologies across various domains such as education, healthcare, entertainment, and intelligent assistance. Looking to the future, the book anticipates trends such as the deep integration of artificial intelligence with emotion recognition, the advancement of multimodal interaction technologies, and the development of more personalized and adaptive emotion recognition systems. It emphasizes the importance of balancing technological innovation with ethical considerations to ensure the responsible development and application of affective computing technologies.

Authors:Patricia Diaz
Title: Building Blocks of a User Experience Research Point of View
Abstract:
This paper presents three User Experience Research (UXR) perspectives based on data, evidence, and insights - known as Points of View (POV) - showcasing how the strategies and methods of building a POV work in an enterprise setting. The POVs are: 1. Smart Visuals: Use AI to extract and translate text from visuals in videos (2019). 2. Assessable Code Editor: Focus on direct AI feedback to the learner, as it is the loop that requires the least effort for the highest impact (2023). 3. Opportunity Landscape: Identify high-impact opportunities at the intersection of emergent technical capabilities that unlock novel approaches to critical user needs while addressing business strategic priorities (2019). They all seemed far-fetched and went against common practice. All were adopted and had long-lasting impact.

Authors:Yikan Wang
Title: Accessible Gesture-Driven Augmented Reality Interaction System
Abstract:
Augmented reality (AR) offers immersive interaction but remains inaccessible for users with motor impairments or limited dexterity due to reliance on precise input methods. This study proposes a gesture-based interaction system for AR environments, leveraging deep learning to recognize hand and body gestures from wearable sensors and cameras, adapting interfaces to user capabilities. The system employs vision transformers (ViTs), temporal convolutional networks (TCNs), and graph attention networks (GATs) for gesture processing, with federated learning ensuring privacy-preserving model training across diverse users. Reinforcement learning optimizes interface elements like menu layouts and interaction modes. Experiments demonstrate a 20% improvement in task completion efficiency and a 25% increase in user satisfaction for motor-impaired users compared to baseline AR systems. This approach enhances AR accessibility and scalability. Keywords: Deep learning, Federated learning, Gesture recognition, Augmented reality, Accessibility, Human-computer interaction

Authors:Paul Murrell
Title: Data Verbalisation: What is Text Doing in a Data Visualisation?
Abstract:
This article discusses the role that text elements play in a data visualisation. We argue that there is a need for a simple, coherent explanation of text elements similar to the understanding that already exists for non-text elements like bars, points, and lines. We explore examples of how text is used within a data visualisation and use existing knowledge and assessment techniques to evaluate when text is effective and when it is not. The result is a framework that aims to be easy to understand and easy to apply in order to understand the purpose and effectiveness of the text elements in any data visualisation.

Authors:Paige Tuttösí
Title: I Know You're Listening: Adaptive Voice for HRI
Abstract:
While the use of social robots for language teaching has been explored, there remains limited work on task-specific synthesized voices for language teaching robots. Given that language is a verbal task, this gap may have severe consequences for the effectiveness of robots for language teaching tasks. We address this lack of L2 teaching robot voices through three contributions: 1. We address the need for a lightweight and expressive robot voice. Using a fine-tuned version of Matcha-TTS, we use emoji prompting to create an expressive voice that shows a range of expressivity over time. The voice can run in real time with limited compute resources. Through case studies, we found this voice more expressive, socially appropriate, and suitable for long periods of expressive speech, such as storytelling. 2. We explore how to adapt a robot's voice to physical and social ambient environments to deploy our voices in various locations. We found that increasing pitch and pitch rate in noisy and high-energy environments makes the robot's voice appear more appropriate and makes it seem more aware of its current environment. 3. We create an English TTS system with improved clarity for L2 listeners using known linguistic properties of vowels that are difficult for these listeners. We used a data-driven, perception-based approach to understand how L2 speakers use duration cues to interpret challenging words with minimal tense (long) and lax (short) vowels in English. We found that the duration of vowels strongly influences the perception for L2 listeners and created an "L2 clarity mode" for Matcha-TTS that applies a lengthening to tense vowels while leaving lax vowels unchanged. Our clarity mode was found to be more respectful, intelligible, and encouraging than base Matcha-TTS while reducing transcription errors in these challenging tense/lax minimal pairs.

Authors:Antonios Saravanos
Title: How Warm-Glow Alters the Usability of Technology
Abstract:
As technology increasingly aligns with users' personal values, traditional models of usability, focused on functionality and specifically effectiveness, efficiency, and satisfaction, may not fully capture how people perceive and evaluate it. This study investigates how the warm-glow phenomenon, the positive feeling associated with doing good, shapes perceived usability. An experimental approach was taken in which participants evaluated a hypothetical technology under conditions designed to evoke either the intrinsic (i.e., personal fulfillment) or extrinsic (i.e., social recognition) dimensions of warm-glow. A Multivariate Analysis of Variance as well as subsequent follow-up analyses revealed that intrinsic warm-glow significantly enhances all dimensions of perceived usability, while extrinsic warm-glow selectively influences perceived effectiveness and satisfaction. These findings suggest that perceptions of usability extend beyond functionality and are shaped by how technology resonates with users' broader sense of purpose. We conclude by proposing that designers consider incorporating warm-glow into technology as a strategic design decision.

Authors:Yingchao Li
Title: Human-Centered Editable Speech-to-Sign-Language Generation via Streaming Conformer-Transformer and Resampling Hook
Abstract:
Existing end-to-end sign-language animation systems suffer from low naturalness, limited facial/body expressivity, and no user control. We propose a human-centered, real-time speech-to-sign animation framework that integrates (1) a streaming Conformer encoder with an autoregressive Transformer-MDN decoder for synchronized upper-body and facial motion generation, (2) a transparent, editable JSON intermediate representation empowering deaf users and experts to inspect and modify each sign segment, and (3) a human-in-the-loop optimization loop that refines the model based on user edits and ratings. Deployed on Unity3D, our system achieves a 13 ms average frame-inference time and a 103 ms end-to-end latency on an RTX 4070. Our key contributions include the design of a JSON-centric editing mechanism for fine-grained sign-level personalization and the first application of an MDN-based feedback loop for continuous model adaptation. This combination establishes a generalizable, explainable AI paradigm for user-adaptive, low-latency multimodal systems. In studies with 20 deaf signers and 5 professional interpreters, we observe a +13 point SUS improvement, a 6.7 point reduction in cognitive load, and significant gains in naturalness and trust (p < .001) over baselines. This work establishes a scalable, explainable AI paradigm for accessible sign-language technologies.

Authors:Yanwei Wang
Title: Steering Robots with Inference-Time Interactions
Abstract:
Imitation learning has driven the development of generalist policies capable of autonomously solving multiple tasks. However, when a pretrained policy makes errors during deployment, there are limited mechanisms for users to correct its behavior. While collecting additional data for finetuning can address such issues, doing so for each downstream use case is inefficient at deployment. My research proposes an alternative: keeping pretrained policies frozen as a fixed skill repertoire while allowing user interactions to guide behavior generation toward user preferences at inference time. By making pretrained policies steerable, users can help correct policy errors when the model struggles to generalize, without needing to finetune the policy. Specifically, I propose (1) inference-time steering, which leverages user interactions to switch between discrete skills, and (2) task and motion imitation, which enables user interactions to edit continuous motions while satisfying task constraints defined by discrete symbolic plans. These frameworks correct misaligned policy predictions without requiring additional training, maximizing the utility of pretrained models while achieving inference-time user objectives.

Authors:Md Nazmus Sakib
Title: An Interdisciplinary Review of Commonsense Reasoning and Intent Detection
Abstract:
This review explores recent advances in commonsense reasoning and intent detection, two key challenges in natural language understanding. We analyze 28 papers from ACL, EMNLP, and CHI (2020-2025), organizing them by methodology and application. Commonsense reasoning is reviewed across zero-shot learning, cultural adaptation, structured evaluation, and interactive contexts. Intent detection is examined through open-set models, generative formulations, clustering, and human-centered systems. By bridging insights from NLP and HCI, we highlight emerging trends toward more adaptive, multilingual, and context-aware models, and identify key gaps in grounding, generalization, and benchmark design.

Authors:Yashodip Dharmendra Jagtap
Title: Shelter Soul: Bridging Shelters and Adopters Through Technology
Abstract:
Pet adoption processes often face inefficiencies, including limited accessibility, lack of real-time information, and mismatched expectations between shelters and adopters. To address these challenges, this study presents Shelter Soul, a technology-based solution designed to streamline pet adoption through an integrated, web-based platform. Developed using the MERN stack and GraphQL, Shelter Soul is a prototype system built to improve pet matching accuracy, shelter management efficiency, and secure online donations. The system includes modules for intelligent pet matching, shelter administration, donation processing, volunteer coordination, and analytics. Prototype testing (performance load tests, usability studies, and security assessments) demonstrated that the system meets its design goals: it handled 500 concurrent users with a 99.2% transaction success rate and an average response time of 250 ms, and usability feedback rated the interface highly (4.5/5). These results indicate Shelter Soul's potential as a practical solution to enhance animal shelter operations and adoption outcomes.

Authors:Judson Leroy Dean Haynes
Title: Enter: Graduated Realism: A Pedagogical Framework for AI-Powered Avatars in Virtual Reality Teacher Training
Abstract:
Virtual Reality simulators offer a powerful tool for teacher training, yet the integration of AI-powered student avatars presents a critical challenge: determining the optimal level of avatar realism for effective pedagogy. This literature review examines the evolution of avatar realism in VR teacher training, synthesizes its theoretical implications, and proposes a new pedagogical framework to guide future design. Through a systematic review, this paper traces the progression from human-controlled avatars to generative AI prototypes. Applying learning theories like Cognitive Load Theory, we argue that hyper-realism is not always optimal, as high-fidelity avatars can impose excessive extraneous cognitive load on novices, a stance supported by recent empirical findings. A significant gap exists between the technological drive for photorealism and the pedagogical need for scaffolded learning. To address this gap, we propose Graduated Realism, a framework advocating for starting trainees with lower-fidelity avatars and progressively increasing behavioral complexity as skills develop. To make this computationally feasible, we outline a novel single-call architecture, Crazy Slots, which uses a probabilistic engine and a Retrieval-Augmented Generation database to generate authentic, real-time responses without the latency and cost of multi-step reasoning models. This review provides evidence-based principles for designing the next generation of AI simulators, arguing that a pedagogically grounded approach to realism is essential for creating scalable and effective teacher education tools.

Authors:Mohd Anwar Jamal Faiz
Title: Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning
Abstract:
This paper introduces the Primender sequence, a novel integer sequence defined by a hybrid rule that combines classical primality with modular digit-based conditions. Specifically, a number n is included in the sequence if it is prime or if at least one suffix of its decimal representation (of any length) is prime; in other words, the sequence consists of numbers that are prime or have at least one prime suffix. The resulting sequence exhibits a deterministic yet non-trivial structure, blending number-theoretic properties with symbolic patterning. We propose the Primender sequence as a benchmark for evaluating the symbolic reasoning capabilities of Large Language Models (LLMs). The study is motivated by the need for interpretable, rule-based testbeds that can assess an LLM's ability to infer hidden rules, validate mathematical hypotheses, and generalize symbolic logic at scale. A key hypothesis explored is: Whenever a number in the Primender sequence is exactly one more than the largest prime less than or equal to it, the difference between it and the previous number in the sequence is also 1. We design a structured prompt and evaluation framework to test this hypothesis across multiple state-of-the-art LLMs, including ChatGPT, Copilot, DeepSeek, Gemini, Grok, and LLaMA. The models are tasked with identifying the underlying rule, validating the hypothesis, and generating the next 100,000 terms of the sequence. Comparative metrics such as rule inference accuracy, hypothesis evaluation, sequence validity, and symbolic explanation quality are used to assess model performance. This work contributes a novel mathematical construct and a reproducible methodology for benchmarking LLMs in symbolic reasoning, hypothesis testing, and scalable pattern generalization - bridging the domains of number theory, artificial intelligence, and software engineering.
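The suffix rule above is concrete enough to sketch in code. The following is a minimal membership test (the function names and the trial-division primality check are illustrative, not from the paper), assuming the suffix-based reading of the definition, with n's own primality checked separately:

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; sufficient for small illustrative ranges."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def in_primender(n: int) -> bool:
    """A number belongs to the Primender sequence if it is prime
    or at least one proper decimal suffix of it is prime."""
    if is_prime(n):
        return True
    s = str(n)
    # check every proper suffix: last 1 digit, last 2 digits, ...
    return any(is_prime(int(s[-k:])) for k in range(1, len(s)))

# first terms under this reading
seq = [n for n in range(1, 30) if in_primender(n)]
```

The first terms produced this way (2, 3, 5, 7, 11, 12, 13, ...) show the blend of primes and prime-suffixed composites the paper describes: 12 enters via its suffix 2, while 14 is excluded.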

Authors:Lalitha A R
Title: Beyond Compliance: A User-Autonomy Framework for Inclusive and Customizable Web Accessibility
Abstract:
This paper proposes a shift from compliance-centered web accessibility to a care-driven model that prioritizes user autonomy, using neurodivergent users as a catalyst case for broader personalization needs. While accessibility standards offer a flexible framework, they are often interpreted and implemented as static compliance checklists; our approach reframes accessibility as a flexible, user-centered process. We introduce a customizable Comfort Mode framework that allows users to adapt interface settings, such as contrast, typography, motion, and scaling, according to their individual needs, while retaining the brand's core visual identity. Grounded in psychological and cognitive accessibility principles, our design supports personalization without sacrificing creative freedom. We present both minimal and advanced implementation models with mock-ups, demonstrating how inclusive design can be seamlessly integrated at minimal cost. This approach aims to broaden digital inclusivity by offering autonomy to those who require it, without imposing changes on those who do not. The proposed system is adaptable, scalable, and suitable for a wide range of users and brands, offering a new paradigm where user autonomy, aesthetic integrity, and accessibility converge not through compromise, but through choice.

Authors:Ece Gumusel
Title: Multiverse Privacy Theory for Contextual Risks in Complex User-AI Interactions
Abstract:
In an era of increasing interaction with artificial intelligence (AI), users face evolving privacy decisions shaped by complex, uncertain factors. This paper introduces Multiverse Privacy Theory, a novel framework in which each privacy decision spawns a parallel universe, representing a distinct potential outcome based on user choices over time. By simulating these universes, this theory provides a foundation for understanding privacy through the lens of contextual integrity, evolving preferences, and probabilistic decision-making. Future work will explore its application using real-world, scenario-based survey data.

Authors:Joonhyung Bae
Title: Thief of Truth: VR comics about the relationship between AI and humans
Abstract:
Thief of Truth is a first-person perspective Virtual Reality (VR) comic that explores the relationship between humans and artificial intelligence (AI). The work tells the story of a mind-uploaded human being reborn as a new subject while interacting with an AI that is searching for the meaning of life. To experiment with the expressive possibilities of VR comics, the work focuses on three problems. First, the comic is designed around VR's viewer-controlled perspective. Second, VR controller-based interaction deepens the player's immersion in the work. Third, a method for increasing the accessibility of VR comics was devised. This work aims to serve as an example of experimental practice in VR comics.

Authors:Xia Li
Title: Designing conflict-based communicative tasks in Teaching Chinese as a Foreign Language with ChatGPT
Abstract:
In developing the teaching program for a course in Oral Expression in Teaching Chinese as a Foreign Language at the university level, the teacher designs communicative tasks based on conflicts to encourage learners to engage in interactive dynamics and develop their oral interaction skills. During the design of these tasks, the teacher uses ChatGPT to assist in finalizing the program. This article aims to present the key characteristics of the interactions between the teacher and ChatGPT during this program development process, as well as to examine the use of ChatGPT and its impacts in this specific context.

Authors:Jacob Erickson
Title: Fake Friends and Sponsored Ads: The Risks of Advertising in Conversational Search
Abstract:
Digital commerce thrives on advertising, with many of the largest technology companies relying on it as a significant source of revenue. However, in the context of information-seeking behavior, such as search, advertising may degrade the user experience by lowering search quality, misusing user data for inappropriate personalization, potentially misleading individuals, or even leading them toward harm. These challenges remain significant as conversational search technologies, such as ChatGPT, become widespread. This paper critically examines the future of advertising in conversational search, utilizing several speculative examples to illustrate the potential risks posed to users who seek guidance on sensitive topics. Additionally, it provides an overview of the forms that advertising might take in this space and introduces the "fake friend dilemma," the idea that a conversational agent may exploit unaligned user trust to achieve other objectives. This study presents a provocative discussion on the future of online advertising in the space of conversational search and ends with a call to action.

Authors:Kevin Baum
Title: Disentangling AI Alignment: A Structured Taxonomy Beyond Safety and Ethics
Abstract:
Recent advances in AI research make it increasingly plausible that artificial agents with consequential real-world impact will soon operate beyond tightly controlled environments. Ensuring that these agents are not only safe but that they adhere to broader normative expectations is thus an urgent interdisciplinary challenge. Multiple fields -- notably AI Safety, AI Alignment, and Machine Ethics -- claim to contribute to this task. However, the conceptual boundaries and interrelations among these domains remain vague, leaving researchers without clear guidance in positioning their work. To address this meta-challenge, we develop a structured conceptual framework for understanding AI alignment. Rather than focusing solely on alignment goals, we introduce a taxonomy distinguishing the alignment aim (safety, ethicality, legality, etc.), scope (outcome vs. execution), and constituency (individual vs. collective). This structural approach reveals multiple legitimate alignment configurations, providing a foundation for practical and philosophical integration across domains, and clarifying what it might mean for an agent to be aligned all-things-considered.

Authors:Mohammed Almutairi
Title: Sentiment Analysis in Learning Management Systems Understanding Student Feedback at Scale
Abstract:
In the wake of the Covid-19 pandemic, education underwent a major shift from traditional in-person learning to online platforms. This change has affected teacher-student interaction, especially non-verbal communication. The absence of non-verbal cues has led to a reliance on verbal feedback, which diminishes the efficacy of the educational experience. This paper explores the integration of sentiment analysis into learning management systems (LMS) to bridge the student-teacher gap by offering an alternative approach to interpreting student feedback beyond its verbal content. The research involves data preparation, feature selection, and the development of a deep neural network model encompassing word embedding, LSTM, and attention mechanisms. This model is compared against a logistic regression baseline to evaluate its efficacy in understanding student feedback. The study aims to bridge the communication gap between instructors and students in online learning environments, offering insights into the emotional context of student feedback and ultimately improving the quality of online education.
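As a point of reference for the pipeline described above, here is a minimal sketch of the logistic-regression baseline on bag-of-words features. The toy feedback texts, vocabulary, and hyperparameters are invented for illustration; the paper's deep model replaces these features with word embeddings, an LSTM, and attention:

```python
import numpy as np

def bow_features(texts, vocab):
    """Bag-of-words counts: a simple stand-in for learned embeddings."""
    X = np.zeros((len(texts), len(vocab)))
    index = {w: i for i, w in enumerate(vocab)}
    for r, t in enumerate(texts):
        for w in t.lower().split():
            if w in index:
                X[r, index[w]] += 1
    return X

def train_logreg(X, y, lr=0.5, steps=300):
    """Binary logistic regression fit by batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

texts = ["great clear lecture", "confusing bad pacing",
         "great pacing", "bad confusing lecture"]
y = np.array([1.0, 0.0, 1.0, 0.0])          # 1 = positive sentiment
vocab = ["great", "clear", "confusing", "bad", "pacing", "lecture"]
w = train_logreg(bow_features(texts, vocab), y)
preds = (1 / (1 + np.exp(-(bow_features(texts, vocab) @ w))) > 0.5)
```

On this separable toy set the baseline recovers the labels; the paper's contribution is the margin by which the neural model improves on such a baseline for real student feedback.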

Authors:Mohammed Almutairi
Title: Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams
Abstract:
Effective teamwork is essential across diverse domains. During the team formation stage, a key challenge is forming teams that effectively balance user preferences with task objectives to enhance overall team satisfaction. In the team performing stage, maintaining cohesion and engagement is critical for sustaining high team performance. However, existing computational tools and algorithms for team optimization often rely on static data inputs, narrow algorithmic objectives, or solutions tailored for specific contexts, failing to account for the dynamic interplay of team members' personalities, evolving goals, and changing individual preferences. Therefore, teams may encounter member dissatisfaction, as purely algorithmic assignments can reduce members' commitment to team goals, or may experience suboptimal engagement due to the absence of timely, personalized guidance to help members adjust their behaviors and interactions as team dynamics evolve. Ultimately, these challenges can lead to reduced overall team performance. My Ph.D. dissertation aims to develop AI-augmented team optimization frameworks and practical systems that enhance team satisfaction, engagement, and performance. First, I propose a team formation framework that leverages a multi-armed bandit algorithm to iteratively refine team composition based on user preferences, ensuring alignment between individual needs and collective team goals to enhance team satisfaction. Second, I introduce tAIfa (Team AI Feedback Assistant), an AI-powered system that utilizes large language models (LLMs) to deliver immediate, personalized feedback to both teams and individual members, enhancing cohesion and engagement. Finally, I present PuppeteerLLM, an LLM-based simulation framework that simulates multi-agent teams to model complex team dynamics within realistic environments, incorporating task-driven collaboration and long-term coordination.
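The abstract does not specify which bandit variant the formation framework uses; as an illustration of the general idea, iteratively refining which team composition to propose based on noisy satisfaction feedback, here is an epsilon-greedy sketch. The team labels, satisfaction function, and all parameters are hypothetical:

```python
import random

def epsilon_greedy_team_search(candidate_teams, satisfaction,
                               rounds=500, eps=0.1, seed=0):
    """Treat each candidate composition as a bandit arm: explore a random
    composition with probability eps, otherwise exploit the one with the
    best mean observed satisfaction so far."""
    rng = random.Random(seed)
    counts = [0] * len(candidate_teams)
    means = [0.0] * len(candidate_teams)
    for _ in range(rounds):
        if rng.random() < eps or not any(counts):
            arm = rng.randrange(len(candidate_teams))
        else:
            arm = max(range(len(candidate_teams)), key=lambda i: means[i])
        r = satisfaction(candidate_teams[arm], rng)  # noisy user feedback
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return max(range(len(candidate_teams)), key=lambda i: means[i])

# toy example: hidden "true" satisfaction per composition, observed with noise
teams = ["A+B", "A+C", "B+C"]
true_sat = {"A+B": 0.4, "A+C": 0.9, "B+C": 0.6}
best = epsilon_greedy_team_search(
    teams, lambda t, rng: true_sat[t] + rng.gauss(0, 0.1))
```

After a few hundred rounds of feedback the search settles on the composition with the highest underlying satisfaction, mirroring the iterative-refinement loop the framework describes.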

Authors:Chen Chen
Title: Seamless and Efficient Interactions within a Mixed-Dimensional Information Space
Abstract:
Mediated by today's visual displays, information space allows users to discover, access and interact with a wide range of digital and physical information. The information presented in this space may be digital, physical or a blend of both, and appear across different dimensions - such as texts, images, 3D content and physical objects embedded within real-world environment. Navigating within the information space often involves interacting with mixed-dimensional entities, visually represented in both 2D and 3D. At times, interactions also involve transitioning among entities represented in different dimensions. We introduce the concept of mixed-dimensional information space, encompassing entities represented in both 2D and 3D. Interactions within the mixed-dimensional information space should be seamless and efficient: users should be able to focus on their primary tasks without being distracted by interactions with or transitions between entities. While incorporating 3D representations into the mixed-dimensional information space offers intuitive and immersive ways to interact with complex information, it is important to address potential seams and inefficiencies that arise while interacting with both 2D and 3D entities. This dissertation introduces new interactive techniques and systems to realize seamless and efficient interactions within the mixed-dimensional information space, presented through three interactive systems: MemoVis, which aims to use emergent generative AI to help users create reference images for 3D design feedback; PaperToPlace, which demonstrates how paper-based instruction documents can be transformed and spatialized into a context-aware MR experience; and VRContour, which explores how contour delineation workflow can be brought into VR.

Authors:Zhengyang Li
Title: Enhancing Text Comprehension for Dyslexic Readers: A 3D Semantic Visualization Approach Using Transformer Models
Abstract:
Dyslexic individuals often face significant challenges with traditional reading, particularly when engaging with complex texts such as mystery novels. These texts typically demand advanced narrative tracking and information integration skills, making it difficult for dyslexic readers to fully comprehend the content. However, research indicates that while dyslexic individuals may struggle with textual processing, they often possess strong spatial imagination abilities. Leveraging this strength, this study proposes an innovative approach using Transformer models to map sentences and words into three-dimensional vector representations. This process clusters semantically similar sentences and words in spatial proximity, allowing dyslexic readers to interpret the semantic structure and narrative flow of the text through spatial perception. Experimental results demonstrate that, compared to direct text reading, this three-dimensional semantic visualization method significantly enhances dyslexic readers' comprehension of complex texts. In particular, it shows marked advantages in identifying narrative relationships and character connections. This study provides a novel pathway for improving textual comprehension among dyslexic individuals.
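The core idea, mapping text to vectors and letting spatial proximity carry semantic similarity, can be sketched as follows. Synthetic clustered vectors stand in for Transformer sentence embeddings, and the 3D projection uses PCA via SVD; the paper's actual model and projection method are not specified here:

```python
import numpy as np

def project_to_3d(embeddings: np.ndarray) -> np.ndarray:
    """Project high-dimensional sentence embeddings to 3D via PCA
    (SVD of the centered matrix), so semantically similar sentences
    land near each other in space."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T   # coordinates along the top 3 components

# toy stand-in for Transformer embeddings: two semantic clusters,
# e.g. sentences about two different characters in a novel
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.05, size=(5, 16))
cluster_b = rng.normal(1.0, 0.05, size=(5, 16))
points = project_to_3d(np.vstack([cluster_a, cluster_b]))
```

In the projected space the two clusters remain well separated, which is the property a dyslexic reader would perceive spatially: sentences about the same character or plot thread sit close together.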

Authors:Sean Steinle
Title: Sampling Preferences Yields Simple Trustworthiness Scores
Abstract:
With the onset of large language models (LLMs), the performance of artificial intelligence (AI) models is becoming increasingly multi-dimensional. Accordingly, there have been several large, multi-dimensional evaluation frameworks put forward to evaluate LLMs. Though these frameworks are much more realistic than previous attempts which only used a single score like accuracy, multi-dimensional evaluations can complicate decision-making since there is no obvious way to select an optimal model. This work introduces preference sampling, a method to extract a scalar trustworthiness score from multi-dimensional evaluation results by considering the many characteristics of model performance which users value. We show that preference sampling improves upon alternative aggregation methods by using multi-dimensional trustworthiness evaluations of LLMs from TrustLLM and DecodingTrust. We find that preference sampling is consistently reductive, fully reducing the set of candidate models 100% of the time, whereas Pareto optimality never reduces the set by more than 50%. Likewise, preference sampling is consistently sensitive to user priors (allowing users to specify the relative weighting and confidence of their preferences), whereas averaging scores is insensitive to users' prior knowledge.
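Pareto optimality's weak reductive power is easy to see in code. Below is a sketch of the dominance filter on toy multi-dimensional trustworthiness scores (model names and scores are invented; preference sampling itself is not reproduced here):

```python
def pareto_front(scores):
    """Keep models not dominated on every dimension by some other model
    (higher is better). With several dimensions, many models end up
    mutually non-dominated, so the filter reduces the set very little."""
    def dominated(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [i for i, a in enumerate(scores)
            if not any(dominated(a, b)
                       for j, b in enumerate(scores) if j != i)]

# toy trustworthiness scores on (safety, fairness, robustness)
models = {"m1": (0.9, 0.5, 0.6), "m2": (0.5, 0.9, 0.7),
          "m3": (0.6, 0.6, 0.9), "m4": (0.4, 0.4, 0.5)}
front = [list(models)[i] for i in pareto_front(list(models.values()))]
```

Here only m4 is dominated; three of the four candidates survive, leaving the user no closer to a single choice. This is the gap a scalar score such as preference sampling is designed to close.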

Authors:Richard Armitage
Title: Performance of leading large language models in May 2025 in Membership of the Royal College of General Practitioners-style examination questions: a cross-sectional analysis
Abstract:
Background: Large language models (LLMs) have demonstrated substantial potential to support clinical practice. Other than ChatGPT-4 and its predecessors, few LLMs, especially those of the leading and more powerful reasoning model class, have been subjected to medical specialty examination questions, including in the domain of primary care. This paper aimed to test the capabilities of leading LLMs as of May 2025 (o3, Claude Opus 4, Grok3, and Gemini 2.5 Pro) in primary care education, specifically in answering Member of the Royal College of General Practitioners (MRCGP) style examination questions. Methods: o3, Claude Opus 4, Grok3, and Gemini 2.5 Pro were tasked to answer 100 randomly chosen multiple choice questions from the Royal College of General Practitioners GP SelfTest on 25 May 2025. Questions included textual information, laboratory results, and clinical images. Each model was prompted to answer as a GP in the UK and was provided with full question information. Each question was attempted once by each model. Responses were scored against correct answers provided by GP SelfTest. Results: The total scores of o3, Claude Opus 4, Grok3, and Gemini 2.5 Pro were 99.0%, 95.0%, 95.0%, and 95.0%, respectively. The average peer score for the same questions was 73.0%. Discussion: All models performed remarkably well, and all substantially exceeded the average performance of GPs and GP registrars who had answered the same questions. o3 demonstrated the best performance, while the performances of the other leading models were comparable with each other and were not substantially lower than that of o3. These findings strengthen the case for LLMs, particularly reasoning models, to support the delivery of primary care, especially those that have been specifically trained on primary care clinical data.

Authors:Alarith Uhde
Title: How Problematic are Suspenseful Interactions?
Abstract:
Current "social acceptability" guidelines for interactive technologies advise against certain, seemingly problematic forms of interaction. Specifically, "suspenseful" interactions, characterized by visible manipulations and invisible effects, are generally considered to be problematic. However, the empirical grounding for this claim is surprisingly weak. To test its validity, this paper presents a controlled replication study (n = 281) of the "suspensefulness effect". Although the effect was statistically replicated on two of three social acceptability measures, effect sizes were small (r ≤ .2), and all compared forms of interaction, including the suspenseful one, had high absolute social acceptability scores. Thus, despite the slight negative effect, suspenseful interactions seem less problematic in the overall scheme of things. We discuss alternative approaches to improve the social acceptability of interactive technology, and recommend engaging more closely with their specific social situatedness.

Authors:Stefan Pasch
Title: Bottom-Up Perspectives on AI Governance: Insights from User Reviews of AI Products
Abstract:
With the growing importance of AI governance, numerous high-level frameworks and principles have been articulated by policymakers, institutions, and expert communities to guide the development and application of AI. While such frameworks offer valuable normative orientation, they may not fully capture the practical concerns of those who interact with AI systems in organizational and operational contexts. To address this gap, this study adopts a bottom-up approach to explore how governance-relevant themes are expressed in user discourse. Drawing on over 100,000 user reviews of AI products from G2.com, we apply BERTopic to extract latent themes and identify those most semantically related to AI governance. The analysis reveals a diverse set of governance-relevant topics spanning both technical and non-technical domains. These include concerns across organizational processes-such as planning, coordination, and communication-as well as stages of the AI value chain, including deployment infrastructure, data handling, and analytics. The findings show considerable overlap with institutional AI governance and ethics frameworks on issues like privacy and transparency, but also surface overlooked areas such as project management, strategy development, and customer interaction. This highlights the need for more empirically grounded, user-centered approaches to AI governance-approaches that complement normative models by capturing how governance unfolds in applied settings. By foregrounding how governance is enacted in practice, this study contributes to more inclusive and operationally grounded approaches to AI governance and digital policy.

Authors:Boning Zhao
Title: Human Empathy as Encoder: AI-Assisted Depression Assessment in Special Education
Abstract:
Assessing student depression in sensitive environments like special education is challenging. Standardized questionnaires may not fully reflect students' true situations. Furthermore, automated methods often falter with rich student narratives, lacking the crucial, individualized insights stemming from teachers' empathetic connections with students. Existing methods often fail to address this ambiguity or effectively integrate educator understanding. To address these limitations by fostering a synergistic human-AI collaboration, this paper introduces Human Empathy as Encoder (HEAE), a novel, human-centered AI framework for transparent and socially responsible depression severity assessment. Our approach uniquely integrates student narrative text with a teacher-derived, 9-dimensional "Empathy Vector" (EV), whose dimensions are guided by the PHQ-9 framework, to explicitly translate tacit empathetic insight into a structured AI input that enhances rather than replaces human judgment. Rigorous experiments optimized the multimodal fusion, text representation, and classification architecture, achieving 82.74% accuracy for 7-level severity classification. This work demonstrates a path toward more responsible and ethical affective computing by structurally embedding human empathy.
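The fusion step can be illustrated with a late-fusion sketch: concatenate a narrative-text embedding with the 9-dimensional Empathy Vector and score the 7 severity levels with a linear head. The embedding size, random weights, and softmax head are placeholders; the paper's optimized architecture is not reproduced here:

```python
import numpy as np

def fuse_and_classify(text_emb, empathy_vec, W, b):
    """Late fusion: concatenate the text embedding with the teacher's
    9-dimensional Empathy Vector, then produce a distribution over
    7 severity levels via a linear layer and softmax."""
    x = np.concatenate([text_emb, empathy_vec])  # shape (d + 9,)
    logits = W @ x + b                           # shape (7,)
    probs = np.exp(logits - logits.max())        # numerically stable softmax
    return probs / probs.sum()

rng = np.random.default_rng(0)
d = 32                                   # hypothetical text-embedding size
text_emb = rng.normal(size=d)            # stand-in for a narrative encoding
empathy_vec = rng.uniform(0, 3, size=9)  # one PHQ-9-guided rating per dimension
W = rng.normal(scale=0.1, size=(7, d + 9))
b = np.zeros(7)
probs = fuse_and_classify(text_emb, empathy_vec, W, b)
```

The design point is that the EV enters the model as nine explicit, teacher-interpretable features rather than being absorbed into an opaque representation, so the human judgment remains inspectable at the input.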

Authors:Nick Byrd
Title: Strategic Reflectivism In Intelligent Systems
Abstract:
By the late 20th century, the rationality wars had launched debates about the nature and norms of intuitive and reflective thinking. Those debates drew from mid-20th century ideas such as bounded rationality, which challenged more idealized notions of rationality observed since the 19th century. Now that 21st century cognitive scientists are applying the resulting dual-process theories to artificial intelligence, it is time to dust off some lessons from this history. So this paper synthesizes old ideas with recent results from experiments on humans and machines. The result is Strategic Reflectivism, the position that one key to intelligent systems (human or artificial) is pragmatic switching between intuitive and reflective inference to optimally fulfill competing goals. Strategic Reflectivism builds on American Pragmatism, transcends superficial indicators of reflective thinking such as model size or chains of thought, applies to both individual and collective intelligence systems (including human-AI teams), and becomes increasingly actionable as we learn more about the value of intuition and reflection.

Authors:Eleni Vasilaki
Title: In Dialogue with Intelligence: Rethinking Large Language Models as Collective Knowledge
Abstract:
Large Language Models (LLMs) are typically analysed through architectural, behavioural, or training-data lenses. This article offers a theoretical and experiential re-framing: LLMs as dynamic instantiations of Collective human Knowledge (CK), where intelligence is evoked through dialogue rather than stored statically. Drawing on concepts from neuroscience and AI, and grounded in sustained interaction with ChatGPT-4, I examine emergent dialogue patterns, the implications of fine-tuning, and the notion of co-augmentation: mutual enhancement between human and machine cognition. This perspective offers a new lens for understanding interaction, representation, and agency in contemporary AI systems.

Authors:Aaditya Shankar Majumder
Title: Eye-Tracking and Biometric Feedback in UX Research: Measuring User Engagement and Cognitive Load
Abstract:
User experience research often uses surveys and interviews, which may miss subconscious user interactions. This study explores eye-tracking and biometric feedback as tools to assess user engagement and cognitive load in digital interfaces. These methods measure gaze behavior and bodily responses, providing an objective complement to qualitative insights. Using empirical evidence, practical applications, and advancements from 2023-2025, we present experimental data, describe our methodology, and place our work within foundational and recent literature. We address challenges like data interpretation, ethical issues, and technological integration. These tools are key for advancing UX design in complex digital environments.

Authors:Ben Rahman
Title: HOT-FIT-BR: A Context-Aware Evaluation Framework for Digital Health Systems in Resource-Limited Settings
Abstract:
Implementation of digital health systems in low-middle-income countries (LMICs) often fails due to a lack of evaluations that take into account infrastructure limitations, local policies, and community readiness. We introduce HOT-FIT-BR, a contextual evaluation framework that expands the HOT-FIT model with three new dimensions: (1) an Infrastructure Index to measure electricity/internet availability, (2) a Policy Compliance Layer to ensure regulatory compliance (e.g., Permenkes 24/2022 in Indonesia), and (3) Community Engagement Fit. Simulations at Indonesian Health Centers show that HOT-FIT-BR is 58% more sensitive in detecting problems than HOT-FIT, especially in rural areas with an Infra Index <3. The framework has also proven adaptive to the context of other LMICs such as India and Kenya through local parameter adjustments.

Authors:Dina Albassam
Title: Toward Human Centered Interactive Clinical Question Answering System
Abstract:
Unstructured clinical notes contain essential patient information but are challenging for physicians to search and interpret efficiently. Although large language models (LLMs) have shown promise in question answering (QA), most existing systems lack transparency, usability, and alignment with clinical workflows. This work introduces an interactive QA system that enables physicians to query clinical notes via text or voice and receive extractive answers highlighted directly in the note for traceability. The system was built using OpenAI models with zero-shot prompting and evaluated across multiple metrics, including exact string match, word overlap, SentenceTransformer similarity, and BERTScore. Results show that while exact match scores ranged from 47 to 62 percent, semantic similarity scores exceeded 87 percent, indicating strong contextual alignment even when wording varied. To assess usability, the system was also evaluated using simulated clinical personas. Seven diverse physician and nurse personas interacted with the system across scenario-based tasks and provided structured feedback. The evaluations highlighted strengths in intuitive design and answer accessibility, alongside opportunities for enhancing explanation clarity.
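Two of the reported metrics are simple enough to sketch directly. The normalization below (lowercasing, whitespace collapsing) is an assumption for illustration, not the paper's exact procedure, and the clinical strings are invented:

```python
def exact_match(pred: str, gold: str) -> bool:
    """Strict string equality after light normalization."""
    norm = lambda s: " ".join(s.lower().split())
    return norm(pred) == norm(gold)

def word_overlap(pred: str, gold: str) -> float:
    """Fraction of gold-answer tokens also present in the prediction."""
    p, g = set(pred.lower().split()), set(gold.lower().split())
    return len(p & g) / len(g) if g else 0.0

em = exact_match("BP 120/80", "bp  120/80")
ov = word_overlap("blood pressure 120/80 recorded today", "blood pressure 120/80")
```

The gap the paper reports (47-62% exact match vs. >87% semantic similarity) falls out of exactly this contrast: surface metrics penalize rewording that embedding-based scores like SentenceTransformer similarity and BERTScore tolerate.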

Authors:Gizem Gultekin-Varkonyi
Title: AI Literacy for Legal AI Systems: A practical approach
Abstract:
Legal AI systems are increasingly being adopted by judicial and legal system deployers and providers worldwide to support a range of applications. While they offer potential benefits such as reducing bias, increasing efficiency, and improving accountability, they also pose significant risks, requiring a careful balance between opportunities and legally and ethically sound development and deployment. AI literacy, as a legal requirement under the EU AI Act and a critical enabler of ethical AI for deployers and providers, could be a tool to achieve this. The article introduces the term "legal AI systems" and then analyzes the concept of AI literacy and the benefits and risks associated with these systems. This analysis is linked to a broader AI literacy (AI-L) concept for organizations that deal with legal AI systems. The outcome of the article, a roadmap questionnaire as a practical tool for developers and providers to assess risks, benefits, and stakeholder concerns, could be useful in meeting societal and regulatory expectations for legal AI.

Authors:Stefan Pasch
Title: AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals
Abstract:
As large language models (LLMs) are increasingly deployed in high-stakes settings, their ability to refuse ethically sensitive prompts, such as those involving hate speech or illegal activities, has become central to content moderation and responsible AI practices. While refusal responses can be viewed as evidence of ethical alignment and safety-conscious behavior, recent research suggests that users may perceive them negatively. At the same time, automated assessments of model outputs are playing a growing role in both evaluation and training. In particular, LLM-as-a-Judge frameworks, in which one model is used to evaluate the output of another, are now widely adopted to guide benchmarking and fine-tuning. This paper examines whether such model-based evaluators assess refusal responses differently than human users do. Drawing on data from Chatbot Arena and judgments from two AI judges (GPT-4o and Llama 3 70B), we compare how different types of refusals are rated. We distinguish ethical refusals, which explicitly cite safety or normative concerns (e.g., "I can't help with that because it may be harmful"), from technical refusals, which reflect system limitations (e.g., "I can't answer because I lack real-time data"). We find that LLM-as-a-Judge systems evaluate ethical refusals significantly more favorably than human users do, a divergence not observed for technical refusals. We refer to this divergence as a moderation bias: a systematic tendency for model-based evaluators to reward refusal behaviors more than human users do. This raises broader questions about transparency, value alignment, and the normative assumptions embedded in automated evaluation systems.

Authors:Kyungho Lee
Title: Towards a Working Definition of Designing Generative User Interfaces
Abstract:
Generative UI is transforming interface design by facilitating AI-driven collaborative workflows between designers and computational systems. This study establishes a working definition of Generative UI through a multi-method qualitative approach, integrating insights from a systematic literature review of 127 publications, expert interviews with 18 participants, and analyses of 12 case studies. Our findings identify five core themes that position Generative UI as an iterative and co-creative process. We highlight emerging design models, including hybrid creation, curation-based workflows, and AI-assisted refinement strategies. Additionally, we examine ethical challenges, evaluation criteria, and interaction models that shape the field. By proposing a conceptual foundation, this study advances both theoretical discourse and practical implementation, guiding future HCI research toward responsible and effective generative UI design practices.

Authors:Madjid Sadallah
Title: From Data to Actionable Understanding: A Learner-Centered Framework for Dynamic Learning Analytics
Abstract:
Learning Analytics Dashboards (LADs) often fall short of their potential to empower learners, frequently prioritizing data visualization over the cognitive processes crucial for translating data into actionable learning strategies. This represents a significant gap in the field: while much research has focused on data collection and presentation, there is a lack of comprehensive models for how LADs can actively support learners' sensemaking and self-regulation. This paper introduces the Adaptive Understanding Framework (AUF), a novel conceptual model for learner-centered LAD design. The AUF seeks to address this limitation by integrating a multi-dimensional model of situational awareness, dynamic sensemaking strategies, adaptive mechanisms, and metacognitive support. This transforms LADs into dynamic learning partners that actively scaffold learners' sensemaking. Unlike existing frameworks that tend to treat these aspects in isolation, the AUF emphasizes their dynamic and intertwined relationships, creating a personalized and adaptive learning ecosystem that responds to individual needs and evolving understanding. The paper details the AUF's core principles, key components, and suggests a research agenda for future empirical validation. By fostering a deeper, more actionable understanding of learning data, AUF-inspired LADs have the potential to promote more effective, equitable, and engaging learning experiences.

Authors:Cassandra Overney
Title: Designing for Constructive Civic Communication: A Framework for Human-AI Collaboration in Community Engagement Processes
Abstract:
Community engagement processes form a critical foundation of democratic governance, yet frequently struggle with resource constraints, sensemaking challenges, and barriers to inclusive participation. These processes rely on constructive communication between public leaders and community organizations characterized by understanding, trust, respect, legitimacy, and agency. As artificial intelligence (AI) technologies become increasingly integrated into civic contexts, they offer promising capabilities to streamline resource-intensive workflows, reveal new insights in community feedback, translate complex information into accessible formats, and facilitate reflection across social divides. However, these same systems risk undermining democratic processes through accuracy issues, transparency gaps, bias amplification, and threats to human agency. In this paper, we examine how human-AI collaboration might address these risks and transform civic communication dynamics by identifying key communication pathways and proposing design considerations that maintain a high level of control over decision-making for both public leaders and communities while leveraging computer automation. By thoughtfully integrating AI to amplify human connection and understanding while safeguarding agency, community engagement processes can utilize AI to promote more constructive communication in democratic governance.

Authors:Pascal Jansen
Title: Human-in-the-Loop Optimization for Inclusive Design: Balancing Automation and Designer Expertise
Abstract:
Accessible and inclusive design has gained increased attention in HCI, yet practical implementation remains challenging due to resource-intensive prototyping methods. Traditional approaches such as workshops, A/B tests, and co-design sessions struggle to capture the diverse and complex needs of users with disabilities at scale. This position paper argues for an automated, accessible Human-in-the-Loop (HITL) design optimization process that shifts the designer's role from directly crafting prototypes to curating constraints for algorithmic exploration. By pre-constraining the design space based on specific user interaction needs, integrating adaptive multi-modal feedback channels, and personalizing feedback prompts, the HITL approach could efficiently refine design parameters, such as text size, color contrast, layout, and interaction modalities, to achieve optimal accessibility. This approach promises scalable, individualized design solutions while raising critical questions about constraint curation, transparency, user agency, and ethical considerations, making it essential to discuss and refine these ideas collaboratively at the workshop.

Authors:Jürgen Bernard
Title: The Human-Data-Model Interaction Canvas for Visual Analytics
Abstract:
Visual Analytics (VA) integrates humans, data, and models as key actors in insight generation and data-driven decision-making. This position paper reviews and reflects on 16 VA process models and frameworks and makes nine high-level observations that motivate a fresh perspective on VA. The contribution is the HDMI Canvas, a perspective on VA that complements the strengths of existing VA process models and frameworks. It systematically characterizes diverse roles of humans, data, and models, and how these actors benefit from and contribute to VA processes. The descriptive power of the HDMI Canvas eases the differentiation between a series of VA building blocks, rather than describing general VA principles only. The canvas incorporates modern human-centered methodologies, including human knowledge externalization and forms of feedback loops, while interpretable and explainable AI methods highlight model contributions beyond their conventional outputs. The HDMI Canvas has generative power, guiding the design of new VA processes, and is optimized for external stakeholders, improving VA outreach, interdisciplinary collaboration, and user-centered design. The utility of the HDMI Canvas is demonstrated through two preliminary case studies.

Authors:Suyeon Choi
Title: R-CAGE: A Structural Model for Emotion Output Design in Human-AI Interaction
Abstract:
This paper presents R-CAGE (Rhythmic Control Architecture for Guarding Ego), a theoretical framework for restructuring emotional output in long-term human-AI interaction. While prior affective computing approaches emphasized expressiveness, immersion, and responsiveness, they often neglected the cognitive and structural consequences of repeated emotional engagement. R-CAGE instead conceptualizes emotional output not as reactive expression but as an ethical design structure requiring architectural intervention. The model is grounded in experiential observations of subtle affective symptoms such as localized head tension, interpretive fixation, and emotional lag arising from prolonged interaction with affective AI systems. These indicate a mismatch between system-driven emotion and user interpretation that cannot be fully explained by biometric data or observable behavior. R-CAGE adopts a user-centered stance prioritizing psychological recovery, interpretive autonomy, and identity continuity. The framework consists of four control blocks: (1) Control of Rhythmic Expression regulates output pacing to reduce fatigue; (2) Architecture of Sensory Structuring adjusts the intensity and timing of affective stimuli; (3) Guarding of Cognitive Framing reduces semantic pressure to allow flexible interpretation; (4) Ego-Aligned Response Design supports self-reference recovery during interpretive lag. By structurally regulating emotional rhythm, sensory intensity, and interpretive affordances, R-CAGE frames emotion not as performative output but as a sustainable design unit. The goal is to protect users from oversaturation and cognitive overload while sustaining long-term interpretive agency in AI-mediated environments.

Authors:Stefanos Gkikas
Title: A Pain Assessment Framework based on multimodal data and Deep Machine Learning methods
Abstract:
From the original abstract: This thesis initially aims to study the pain assessment process from a clinical-theoretical perspective while exploring and examining existing automatic approaches. Building on this foundation, the primary objective of this Ph.D. project is to develop innovative computational methods for automatic pain assessment that achieve high performance and are applicable in real clinical settings. A primary goal is to thoroughly investigate and assess significant factors, including demographic elements that impact pain perception, as recognized in pain research, through a computational standpoint. Within the limits of the available data in this research area, our goal was to design, develop, propose, and offer automatic pain assessment pipelines for unimodal and multimodal configurations that are applicable to the specific requirements of different scenarios. The studies published in this Ph.D. thesis showcased the effectiveness of the proposed methods, achieving state-of-the-art results. Additionally, they paved the way for exploring new approaches in artificial intelligence, foundation models, and generative artificial intelligence.

Authors:TianYi Yu
Title: Design description of Wisdom Computing Perspective
Abstract:
This course design aims to develop and research a handwriting matrix recognition and step-by-step visual calculation process display system, addressing the issue of abstract formulas and complex calculation steps that students find difficult to understand when learning mathematics. By integrating artificial intelligence with visualization animation technology, the system enhances precise recognition of handwritten matrix content through the introduction of Mamba backbone networks, completes digital extraction and matrix reconstruction using the YOLO model, and simultaneously combines CoordAttention coordinate attention mechanisms to improve the accurate grasp of character spatial positions. The calculation process is demonstrated frame by frame through the Manim animation engine, vividly showcasing each mathematical calculation step, helping students intuitively understand the intrinsic logic of mathematical operations. Through dynamically generating animation processes for different computational tasks, the system exhibits high modularity and flexibility, capable of generating various mathematical operation examples in real-time according to student needs. By innovating human-computer interaction methods, it brings mathematical calculation processes to life, helping students bridge the gap between knowledge and understanding on a deeper level, ultimately achieving a learning experience where "every step is understood." The system's scalability and interactivity make it an intuitive, user-friendly, and efficient auxiliary tool in education.
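The matrix-reconstruction step described above can be illustrated as a toy sketch: detected digits with bounding-box centers are grouped into rows by vertical position and ordered by horizontal position within each row. The detection tuple format and the row tolerance are assumptions for illustration, not the system's actual YOLO output schema:

```python
# Toy sketch of rebuilding a matrix from per-digit detections.
# Each detection is assumed to be (value, x_center, y_center); row_tol is an
# illustrative tolerance, not a parameter from the described system.
def reconstruct_matrix(detections, row_tol=10):
    """Group detections into rows by y-coordinate, then sort each row by x."""
    rows = []
    for det in sorted(detections, key=lambda d: d[2]):  # top-to-bottom
        if rows and abs(det[2] - rows[-1][0][2]) <= row_tol:
            rows[-1].append(det)  # same row as the current anchor
        else:
            rows.append([det])    # start a new row
    return [[d[0] for d in sorted(r, key=lambda d: d[1])] for r in rows]
```

For example, four detections at roughly two heights reconstruct as a 2x2 matrix regardless of detection order.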

Authors:Xiaoan Liu
Title: Augmenting Human Cognition through Everyday AR
Abstract:
As spatial computing and multimodal LLMs mature, AR is tending to become an intuitive "thinking tool," embedding semantic and context-aware intelligence directly into everyday environments. This paper explores how always-on AR can seamlessly bridge digital cognition and physical affordances, enabling proactive, context-sensitive interactions that enhance human task performance and understanding.

Authors:Marc Barthelemy
Title: Chess variation entropy and engine relevance for humans
Abstract:
Modern chess engines significantly outperform human players and are essential for evaluating positions and move quality. These engines assign a numerical evaluation $E$ to positions, indicating an advantage for either white or black, but similar evaluations can mask varying levels of move complexity. While some move sequences are straightforward, others demand near-perfect play, limiting the practical value of these evaluations for most players. To quantify this problem, we use entropy to measure the complexity of the principal variation (the sequence of best moves). Variations with forced moves have low entropy, while those with multiple viable alternatives have high entropy. Our results show that, except for experts, most human players struggle with high-entropy variations, especially when $|E|<100$ centipawns, which accounts for about $2/3$ of positions. This underscores the need for AI-generated evaluations to convey the complexity of underlying move sequences, as they often exceed typical human cognitive capabilities, reducing their practical utility.
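The entropy measure described above can be sketched as Shannon entropy over a move-probability distribution derived from engine evaluations; the softmax mapping and its temperature are illustrative assumptions, not the paper's calibration:

```python
import math

# Sketch of principal-variation complexity as entropy: low entropy means one
# forced move, high entropy means several viable alternatives. The softmax
# over centipawn evaluations is an assumed mapping for illustration.
def move_probs(centipawn_evals, scale=100.0):
    """Softmax over centipawn evaluations of candidate moves."""
    exps = [math.exp(e / scale) for e in centipawn_evals]
    total = sum(exps)
    return [x / total for x in exps]

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

forced = entropy(move_probs([300, -50, -80]))  # one clearly best move
open_pos = entropy(move_probs([10, 5, 0]))     # several near-equal moves
```

Under this sketch, the near-equal position has much higher entropy than the forced one, matching the abstract's point that similar evaluations can hide very different practical difficulty.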

Authors:Xule Lin
Title: Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation
Abstract:
Scientific knowledge creation is fundamentally transforming as humans and AI systems evolve beyond tool-user relationships into co-evolutionary epistemic partnerships. When AlphaFold revolutionized protein structure prediction, researchers described engaging with an epistemic partner that reshaped how they conceptualized fundamental relationships. This article introduces Cognitio Emergens (CE), a framework addressing critical limitations in existing models that focus on static roles or narrow metrics while failing to capture how scientific understanding emerges through recursive human-AI interaction over time. CE integrates three components addressing these limitations: Agency Configurations describing how authority distributes between humans and AI (Directed, Contributory, Partnership), with partnerships dynamically oscillating between configurations rather than following linear progression; Epistemic Dimensions capturing six specific capabilities emerging through collaboration across Discovery, Integration, and Projection axes, creating distinctive "capability signatures" that guide development; and Partnership Dynamics identifying forces shaping how these relationships evolve, particularly the risk of epistemic alienation where researchers lose interpretive control over knowledge they formally endorse. Drawing from autopoiesis theory, social systems theory, and organizational modularity, CE reveals how knowledge co-creation emerges through continuous negotiation of roles, values, and organizational structures. By reconceptualizing human-AI scientific collaboration as fundamentally co-evolutionary, CE offers a balanced perspective that neither uncritically celebrates nor unnecessarily fears AI's evolving role, instead providing conceptual tools for cultivating partnerships that maintain meaningful human participation while enabling transformative scientific breakthroughs.

Authors:Eric Easthope
Title: Coupling the Heart to Musical Machines
Abstract:
Biofeedback is increasingly used as a general control paradigm for human-computer interfaces (HCIs). While biofeedback, especially from breath, has seen growing uptake as a controller for new interfaces for musical expression (NIMEs), the community has not given as much attention to the heart. The heart is just as intimate a part of music as breath, and it has been argued that the heart shapes our perception of time and thus, indirectly, our perception of music. Inspired by this, I demonstrate a photoplethysmogram (PPG)-based NIME controller using heart rate as a 1D control parameter to transform the qualities of sounds in real-time over a Bluetooth wireless HCI. I apply time scaling to "warp" audio buffers inbound to the sound card, and play these transformed audio buffers back to the listener wearing the PPG sensor, creating a hypothetical perceptual biofeedback loop: changes in sound change heart rate, which changes PPG measurements, which change sound. I discuss how a sound-heart-PPG biofeedback loop possibly affords greater control and/or variety of movements with a 1D controller, how controlling the space and/or time scale of sound playback with biofeedback opens possibilities for performance ambience, and I briefly discuss generative latent spaces as a possible way to extend a 1D PPG control space.
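The 1D heart-rate-to-time-scaling mapping can be sketched as follows; the linear BPM-to-rate mapping and the naive interpolation resampler are illustrative assumptions, not the controller's actual signal chain:

```python
import numpy as np

# Sketch of warping an audio buffer by a heart-rate-derived playback rate.
# The linear BPM mapping and linear-interpolation resampling are assumptions;
# a real NIME would likely use a pitch-preserving method such as a phase vocoder.
def rate_from_bpm(bpm: float, rest_bpm: float = 60.0) -> float:
    """Map heart rate to a playback-rate factor (>1 speeds up, <1 slows down)."""
    return bpm / rest_bpm

def time_scale(buffer: np.ndarray, rate: float) -> np.ndarray:
    """Resample the buffer to 1/rate of its length by linear interpolation."""
    n_out = max(1, int(len(buffer) / rate))
    src = np.linspace(0, len(buffer) - 1, n_out)
    return np.interp(src, np.arange(len(buffer)), buffer)
```

At 120 BPM against a 60 BPM baseline, this sketch halves the buffer length, i.e., doubles playback speed.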

Authors:Simon Suh
Title: Investigating the Impact of Personalized AI Tutors on Language Learning Performance
Abstract:
Driven by the global shift towards online learning prompted by the COVID-19 pandemic, Artificial Intelligence has emerged as a pivotal player in the field of education. Intelligent Tutoring Systems offer a new method of personalized teaching, overcoming the limitations of traditional teaching methods. However, concerns arise about the ability of AI tutors to support skill development and engagement during the learning process. In this paper, I conduct a quasi-experiment with a paired-sample t-test on 34 students before and after their use of AI tutors in language learning platforms such as Santa and Duolingo, to examine the relationship between student engagement, academic performance, and student satisfaction during a personalized language learning experience.

Authors:Suyun Borjigin
Title: Triple-identity Authentication: The Future of Secure Access
Abstract:
In a typical authentication process, the local system verifies the user's identity using a stored hash value generated by a cross-system hash algorithm. This article shifts the research focus from traditional password encryption to the establishment of gatekeeping mechanisms for effective interactions between a system and the outside world. Here, we propose a triple-identity authentication system to achieve this goal. Specifically, this local system opens the inner structure of its hash algorithm to all user credentials, including the login name, login password, and authentication password. When a login credential is entered, the local system hashes it and then creates a unique identifier using intermediate hash elements randomly selected from the open algorithm. Importantly, this locally generated unique identifier (rather than the stored hash produced by the open algorithm) is utilized to verify the user's combined identity, which is generated by combining the entered credential with the International Mobile Equipment Identity and the International Mobile Subscriber Identity. The verification process is implemented at each interaction point: the login name field, the login password field, and the server's authentication point. Thus, within the context of this triple-identity authentication system, we establish a robust gatekeeping mechanism for system interactions, ultimately providing a level of security that is equivalent to multi-factor authentication.
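A loose sketch of the identifier idea: rather than verifying against the stored final hash, the local system derives an identifier from selected elements of a digest over the combined identity. The element-selection scheme and the IMEI/IMSI mixing shown here are illustrative assumptions, not the paper's exact construction:

```python
import hashlib
import random

# Loose sketch of a locally generated unique identifier built from elements of
# a combined-identity digest. The SHA-256 digest, the seed-based sampling, and
# the simple string concatenation are all illustrative assumptions.
def combined_identity(credential: str, imei: str, imsi: str) -> bytes:
    """Hash the entered credential together with device identifiers."""
    return hashlib.sha256(f"{credential}|{imei}|{imsi}".encode()).digest()

def local_identifier(credential: str, imei: str, imsi: str, seed: int) -> bytes:
    """Select pseudo-random digest bytes as the locally generated identifier."""
    digest = combined_identity(credential, imei, imsi)
    rng = random.Random(seed)  # the local system would keep this seed private
    positions = rng.sample(range(len(digest)), 8)
    return bytes(digest[i] for i in sorted(positions))
```

The point of the sketch is that the verifier never compares against the full stored hash: only the locally derived identifier, bound to the device identifiers, is checked.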

Authors:Amir Hossein Khazaei
Title: Speculative Evolution Through 3D Cellular Automata
Abstract:
This project explores speculative evolution through a 3D implementation of Conway's Game of Life, using procedural simulation to generate unfamiliar extraterrestrial organic forms. By applying a volumetric optimized workflow, the raw cellular structures are smoothed into unified, bone-like geometries that resemble hypothetical non-terrestrial morphologies. The resulting forms, strange yet organic, are 3D printed as fossil-like artifacts, presenting a tangible representation of generative structures. This process situates the work at the intersection of artificial life, evolutionary modeling, and digital fabrication, illustrating how simple rules can simulate complex biological emergence and challenge conventional notions of organic form.
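One update step of a 3D, Conway-style cellular automaton like the one described can be sketched as follows; the B5/S45 birth/survival rule is a common 3D-Life choice and an assumption here, since the project's exact rule is not stated:

```python
import numpy as np

# Sketch of one step of a 3D Life-like cellular automaton on a periodic grid.
# The B5/S45 rule (birth on 5 neighbors, survival on 4 or 5) is an assumption.
def step(grid: np.ndarray, birth=(5,), survive=(4, 5)) -> np.ndarray:
    """Advance a 0/1 voxel grid by one generation with wraparound edges."""
    # Count the 26 neighbors of every cell by summing shifted copies.
    neighbors = sum(
        np.roll(np.roll(np.roll(grid, dx, 0), dy, 1), dz, 2)
        for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
        if (dx, dy, dz) != (0, 0, 0)
    )
    born = (grid == 0) & np.isin(neighbors, birth)
    stays = (grid == 1) & np.isin(neighbors, survive)
    return (born | stays).astype(grid.dtype)
```

Iterating `step` and then meshing the surviving voxels (e.g., with marching cubes) yields the raw structures that the project smooths into its bone-like geometries.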

Authors:Zeynep Engin
Title: Human-AI Governance (HAIG): A Trust-Utility Approach
Abstract:
This paper introduces the HAIG framework for analysing trust dynamics across evolving human-AI relationships. Current categorical frameworks (e.g., "human-in-the-loop" models) inadequately capture how AI systems evolve from tools to partners, particularly as foundation models demonstrate emergent capabilities and multi-agent systems exhibit autonomous goal-setting behaviours. As systems advance, agency redistributes in complex patterns that are better represented as positions along continua rather than discrete categories, though progression may include both gradual shifts and significant step changes. The HAIG framework operates across three levels: dimensions (Decision Authority Distribution, Process Autonomy, and Accountability Configuration), continua (gradual shifts along each dimension), and thresholds (critical points requiring governance adaptation). Unlike risk-based or principle-based approaches, HAIG adopts a trust-utility orientation, focusing on maintaining appropriate trust relationships that maximise utility while ensuring sufficient safeguards. Our analysis reveals how technical advances in self-supervision, reasoning authority, and distributed decision-making drive non-uniform trust evolution across both contextual variation and technological advancement. Case studies in healthcare and European regulation demonstrate how HAIG complements existing frameworks while offering a foundation for alternative approaches that anticipate governance challenges before they emerge.

Authors:Tessa De La Fuente
Title: Photoshop Batch Rendering Using Actions for Stylistic Video Editing
Abstract:
My project looks at an efficient workflow for creative image/video editing using Adobe Photoshop's Actions tool and Batch Processing System. This approach to video editing through Photoshop creates a fundamental shift in creative workflow management by integrating industry-leading image manipulation with video editing techniques. Through systematic automation of Actions, users can achieve simple and consistent application of visual edits across a sequence of images. This approach provides an alternative method to optimize productivity while ensuring uniform results across image collections through a post-processing pipeline.

Authors:Nik Aberle
Title: Destructive Interference: Encoding Loss in the Overlap
Abstract:
Destructive Interference is a data visualization installation representing the deaths and injuries caused by mass shootings in the United States in 2024. I parametrically designed and fabricated an interlocking ring sculpture for each month of 2024, where the overall height corresponds to the level of violence in that month. Taller forms mark the deadliest months, while shorter ones reflect fewer casualties. Each inner ring encodes the number of people killed or injured, and each outer ring encodes the number of shootings and the number of days without them. The interlocking cylinders are motor-driven to rotate and lit from within. As the cylinders rotate, they cast overlapping shadows that represent those killed or injured by mass shootings. The goal of this work is to visualize otherwise overwhelming and disparate statistics in a way that is both physically present and emotionally resonant. By inviting viewers to step into and engage with these shadows, the piece creates space for reflection, conversation, and confrontation with the scale of this ongoing crisis.

Authors:Bhanuja Ainary
Title: Audo-Sight: Enabling Ambient Interaction For Blind And Visually Impaired Individuals
Abstract:
Visually impaired people face significant challenges when attempting to interact with and understand complex environments, and traditional assistive technologies often struggle to quickly provide necessary contextual understanding and interactive intelligence. This thesis presents Audo-Sight, a state-of-the-art assistive system that seamlessly integrates Multimodal Large Language Models (MLLMs) to provide expedient, context-aware interactions for Blind and Visually Impaired (BVI) individuals. The system operates in two different modalities: personalized interaction through user identification and public access in common spaces like museums and shopping malls. In tailored environments, the system adjusts its output to conform to the preferences of individual users, thus enhancing accessibility through a user-aware form of interaction. In shared environments, Audo-Sight employs a shared architecture that adapts to its current user with no manual reconfiguration required. To facilitate appropriate interactions with the LLM, the public Audo-Sight solution includes an Age-Range Determiner and Safe Query Filter. Additionally, the system ensures that responses are respectful to BVI users through NeMo Guardrails. By utilizing multimodal reasoning, BVI-cognizant response editing, and safeguarding features, this work represents a major leap in AI-driven accessibility technology capable of increasing autonomy, safety, and interaction for people with visual impairments in social settings. Finally, we present the integration of Audo-Sight and SmartSight, which enables enhanced situational awareness for BVI individuals. This integration takes advantage of the real-time visual analysis of SmartSight, combined with the extensive reasoning and interactive capabilities of Audo-Sight, and goes beyond object identification to provide context-driven, voice-controlled assistance in dynamic environments.

Authors:Shengqian Wang
Title: TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training
Abstract:
Massage therapy training emphasizes hands-on techniques and effective therapist-patient communication. However, many educational programs struggle to provide realistic practice scenarios. To address this problem, we propose TheraQuest, a gamified, web-based simulation platform that employs large language models (LLMs) to generate diverse virtual patients with varying symptoms and cultural backgrounds. Through interactive dialogue, anatomical decision-making, and immediate assessment, trainees develop both diagnostic reasoning and empathetic communication skills in a low-risk environment. Unlike exclusively VR-based solutions, TheraQuest remains accessible via standard web browsers, mitigating the cost and discomfort associated with extended headset use. Preliminary testing suggests that integrating LLM-driven virtual patients with real-time skill metrics can enhance trainee engagement and help bridge the gap between theoretical knowledge and clinical proficiency.

Authors:Shou-Tzu Han
Title: Narrative-Centered Emotional Reflection: Scaffolding Autonomous Emotional Literacy with AI
Abstract:
Reflexion is an AI-powered platform designed to enable structured emotional self-reflection at scale. By integrating real-time emotion detection, layered reflective prompting, and metaphorical storytelling generation, Reflexion empowers users to engage in autonomous emotional exploration beyond basic sentiment categorization. Grounded in theories of expressive writing, cognitive restructuring, self-determination, and critical consciousness development, the system scaffolds a progressive journey from surface-level emotional recognition toward value-aligned action planning. Initial pilot studies with diverse participants demonstrate positive outcomes in emotional articulation, cognitive reframing, and perceived psychological resilience. Reflexion represents a promising direction for scalable, theory-informed affective computing interventions aimed at fostering emotional literacy and psychological growth across educational, therapeutic, and public health contexts.

Authors:David Almog
Title: AI Recommendations and Non-instrumental Image Concerns
Abstract:
There is growing enthusiasm about the potential for humans and AI to collaborate by leveraging their respective strengths. Yet in practice, this promise often falls short. This paper uses an online experiment to identify non-instrumental image concerns as a key reason individuals underutilize AI recommendations. I show that concerns about how one is perceived, even when those perceptions carry no monetary consequences, lead participants to disregard AI advice and reduce task performance.

Authors:Jessica Backus
Title: Investigating the Prominence and Severity of Bugs and Glitches Within Games and Their Effects on Player Experience
Abstract:
Different errors that occur in video games are often referred to as glitches or bugs. The goal of this exploratory research is to understand how these glitches and bugs affect a player's experience. To do this, I reviewed relevant literature and observed these errors in different games via Twitch livestreams. I then performed thematic analysis on the observation data and generated themes that tie back to the relevant literature. Most of the current literature focuses on the what and how behind bugs in games, but very little on the implications of these bugs for the overall player experience, and on what patterns of behavior may emerge because of them.

Authors:Siân Brooke
Title: Clones in the Machine: A Feminist Critique of Agency in Digital Cloning
Abstract:
This paper critiques digital cloning in academic research, highlighting how it exemplifies AI solutionism. Digital clones, which replicate user data to simulate behavior, are often seen as scalable tools for behavioral insights. However, this framing obscures ethical concerns around consent, agency, and representation. Drawing on feminist theories of agency, the paper argues that digital cloning oversimplifies human complexity and risks perpetuating systemic biases. To address these issues, it proposes decentralized data repositories and dynamic consent models, promoting ethical, context-aware AI practices that challenge the reductionist logic of AI solutionism.

Authors:Nick von Felten
Title: Beyond Isolation: Towards an Interactionist Perspective on Human Cognitive Bias and AI Bias
Abstract:
Isolated perspectives have often paved the way for great scientific discoveries. However, many breakthroughs only emerged when moving away from singular views towards interactions. Discussions on Artificial Intelligence (AI) typically treat human and AI bias as distinct challenges, leaving their dynamic interplay and compounding potential largely unexplored. Recent research suggests that biased AI can amplify human cognitive biases, while well-calibrated systems might help mitigate them. In this position paper, I advocate for moving beyond the separate treatment of human and AI biases and focusing instead on their interaction effects. I argue that a comprehensive framework, one that maps (compound human-AI) biases to mitigation strategies, is essential for understanding and protecting human cognition, and I outline concrete steps for its development.

Authors:Shamal Faily
Title: "Shifting Access Control Left" using Asset and Goal Models
Abstract:
Access control needs have broad design implications, but access control specifications may be elicited before, during, or after these needs are captured. Because access control knowledge is distributed, we need to make knowledge asymmetries more transparent, and use expertise already available to stakeholders. In this paper, we present a tool-supported technique identifying knowledge asymmetries around access control based on asset and goal models. Using simple and conventional modelling languages that complement different design techniques, we provide boundary objects to make access control transparent, thereby making knowledge about access control concerns more symmetric. We illustrate this technique using a case study example considering the suitability of a reusable software component in a new military air system.

Authors:Sunday David Ubur
Title: Augmenting Captions with Emotional Cues: An AR Interface for Real-Time Accessible Communication
Abstract:
This paper introduces an augmented reality (AR) captioning framework designed to support Deaf and Hard of Hearing (DHH) learners in STEM classrooms by integrating non-verbal emotional cues into live transcriptions. Unlike conventional captioning systems that offer only plain text, our system fuses real-time speech recognition with affective and visual signal interpretation, including facial movements, gestures, and vocal tone, to produce emotionally enriched captions. These enhanced captions are rendered in an AR interface developed with Unity and provide contextual annotations such as speaker tone markers (e.g., "concerned") and gesture indicators (e.g., "nods"). The system leverages live camera and microphone input, processed through AI models to detect multimodal cues. Findings from preliminary evaluations suggest that this AR-based captioning approach significantly enhances comprehension and reduces cognitive effort compared to standard captions. Our work emphasizes the potential of immersive environments for inclusive, emotion-aware educational accessibility.

Authors:Robert Kaufman
Title: Improving Human-Autonomous Vehicle Interaction in Complex Systems
Abstract:
Unresolved questions about how autonomous vehicles (AVs) should meet the informational needs of riders hinder real-world adoption. Complicating our ability to satisfy rider needs is that different people, goals, and driving contexts have different criteria for what constitutes interaction success. Unfortunately, most human-AV research and design today treats all people and situations uniformly. It is crucial to understand how an AV should communicate to meet rider needs, and how communications should change when the human-AV complex system changes. I argue that understanding the relationships between different aspects of the human-AV system can help us build improved and adaptable AV communications. I support this argument using three empirical studies. First, I identify optimal communication strategies that enhance driving performance, confidence, and trust for learning in extreme driving environments. Findings highlight the need for task-sensitive, modality-appropriate communications tuned to learner cognitive limits and goals. Next, I highlight the consequences of deploying faulty communication systems and demonstrate the need for context-sensitive communications. Third, I use machine learning (ML) to illuminate personal factors predicting trust in AVs, emphasizing the importance of tailoring designs to individual traits and concerns. Together, this dissertation supports the necessity of transparent, adaptable, and personalized AV systems that cater to individual needs, goals, and contextual demands. By considering the complex system within which human-AV interactions occur, we can deliver valuable insights for designers, researchers, and policymakers. This dissertation also provides a concrete domain to study theories of human-machine joint action and situational awareness, and can be used to guide future human-AI interaction research. [shortened for arxiv]

Authors:Rhayza Jolley Rangel
Title: Nurturing Language Proficiency in Spanish-Speaking Children Through Digital Competence
Abstract:
This article explores the intricate design and meticulous construction of a digital platform aimed at revolutionizing early-age English education, particularly for Spanish-speaking children. The work combines innovative methodologies, vibrant and engaging visuals, and a comprehensive approach to phonics. The principles of usability, accessibility, and user-centered design are woven into every facet of the platform's architecture.

Authors:Chaeyeon Lim
Title: DeBiasMe: De-biasing Human-AI Interactions with Metacognitive AIED (AI in Education) Interventions
Abstract:
While generative artificial intelligence (Gen AI) increasingly transforms academic environments, a critical gap exists in understanding and mitigating human biases in AI interactions, such as anchoring and confirmation bias. This position paper advocates for metacognitive AI literacy interventions to help university students critically engage with AI and address biases across the Human-AI interaction workflows. The paper presents the importance of considering (1) metacognitive support with deliberate friction focusing on human bias; (2) bi-directional Human-AI interaction intervention addressing both input formulation and output interpretation; and (3) adaptive scaffolding that responds to diverse user engagement patterns. These frameworks are illustrated through ongoing work on "DeBiasMe," AIED (AI in Education) interventions designed to enhance awareness of cognitive biases while empowering user agency in AI interactions. The paper invites multiple stakeholders to engage in discussions on design and evaluation methods for scaffolding mechanisms, bias visualization, and analysis frameworks. This position contributes to the emerging field of AI-augmented learning by emphasizing the critical role of metacognition in helping students navigate the complex interaction between human, statistical, and systemic biases in AI use while highlighting how cognitive adaptation to AI systems must be explicitly integrated into comprehensive AI literacy frameworks.

Authors:Mahrad Almotahari
Title: Cooperative Speech, Semantic Competence, and AI
Abstract:
Cooperative speech is purposive. From the speaker's perspective, one crucial purpose is the transmission of knowledge. Cooperative speakers care about getting things right for their conversational partners. This attitude is a kind of respect. Cooperative speech is an ideal form of communication because participants have respect for each other. And having respect within a cooperative enterprise is sufficient for a particular kind of moral standing: we ought to respect those who have respect for us. Respect demands reciprocity. I maintain that large language models aren't owed the kind of respect that partly constitutes a cooperative conversation. This implies that they aren't cooperative interlocutors, otherwise we would be obliged to reciprocate the attitude. Leveraging this conclusion, I argue that present-day LLMs are incapable of assertion and that this raises an overlooked doubt about their semantic competence. One upshot of this argument is that knowledge of meaning isn't just a subject for the cognitive psychologist. It's also a subject for the moral psychologist.

Authors:Baichuan Zeng
Title: Recent Advances and Future Directions in Extended Reality (XR): Exploring AI-Powered Spatial Intelligence
Abstract:
Extended Reality (XR), encompassing Augmented Reality (AR), Virtual Reality (VR) and Mixed Reality (MR), is a transformative technology bridging the physical and virtual worlds, with diverse potential to become ubiquitous in the future. This review examines XR's evolution through its foundational framework - hardware ranging from monitors to sensors and software ranging from visual tasks to user interfaces; highlights state-of-the-art (SOTA) XR products, comparing and analyzing their performance based on this foundational framework; and discusses how commercial XR devices can meet the demand for high-quality performance, focusing on spatial intelligence. For future directions, attention should be given to the integration of multi-modal AI and IoT-driven digital twins to enable adaptive XR systems. With the concept of spatial intelligence, future XR should establish a new digital space with realistic experiences that benefit humanity. This review underscores the pivotal role of AI in unlocking XR as the next frontier in human-computer interaction.

Authors:Jessica Backus
Title: Players' Perception of Bugs and Glitches in Video Games: An Exploratory Study
Abstract:
The goal of this exploratory research is to investigate how glitches and bugs within video games affect a player's overall experience. The severity or frequency of bugs, as well as the nature of the bugs present, could influence how players perceive these interactions. Another factor is the player's personality, because this affects their motivations for playing certain games as well as how they react to bugs within these games. Glitches and bugs are framed as a negative aspect of games, yet they create the potential for enjoyable experiences, despite being unexpected. To explore this hypothesis, I observed glitches in recorded gameplay via YouTube and Twitch livestream VODs and analyzed the streamers' reactions, as well as the audiences'. I also conducted semi-structured interviews with gamers with the goal of learning more about each player's personality and attitudes towards bugs in the games they play. I concluded that the types of bugs matter less to players than how frequently they occur, the context in which they occur, and their outcomes.

Authors:Luis Morales-Navarro
Title: Investigating Youth's Technical and Ethical Understanding of Generative Language Models When Engaging in Construction and Deconstruction Activities
Abstract:
The widespread adoption of generative artificial intelligence/machine learning (AI/ML) technologies has increased the need to support youth in developing AI/ML literacies. However, most work has centered on preparing young people to use these systems, with less attention to how they can participate in designing and evaluating them. This study investigates how engaging young people in the design and auditing of generative language models (GLMs) may foster the development of their understanding of how these systems work from both technical and ethical perspectives. The study takes an in-pieces approach to investigate novices' conceptions of GLMs. Such an approach supports the analysis of how technical and ethical conceptions evolve and relate to each other. I am currently conducting a series of participatory design workshops with sixteen ninth graders (ages 14-15) in which they will (a) build GLMs from a data-driven perspective that glassboxes how data shapes model performance and (b) audit commercial GLMs by repeatedly and systematically querying them to draw inferences about their behaviors. I will analyze participants' interactions to identify ethical and technical conceptions they may exhibit while designing and auditing GLMs. I will also conduct clinical interviews and use microgenetic knowledge analysis and ordered network analysis to investigate how participants' ethical and technical conceptions of GLMs relate to each other and change after the workshop. The study will contribute (a) evidence of how engaging youth in design and auditing activities may support the development of ethical and technical understanding of GLMs and (b) an inventory of novice design and auditing practices that may support youth's technical and ethical understanding of GLMs.

Authors:Karen Abe
Title: Bridging Generations: Augmented Reality for Japanese Wartime Oral History
Abstract:
In this position paper, the author presents a process artifact that aims to serve as an archival and educational tool that revitalizes World War II oral histories in Japan. First, the author introduces the historical background and how the work is informed by the positionality of the author. Then, the author presents features of the artifact using references to interview footage of the author's grandmother and grandaunt sharing their firsthand accounts of the 1945 Tokyo Air Raids. The affordances and barriers of this application of augmented reality are discussed, and a list of questions to be posed at the workshop is included.

Authors:Thomas Weber
Title: Explainability for Embedding AI: Aspirations and Actuality
Abstract:
With artificial intelligence (AI) embedded in many everyday software systems, effectively and reliably developing and maintaining AI systems becomes an essential skill for software developers. However, the complexity inherent to AI poses new challenges. Explainable AI (XAI) may allow developers to understand better the systems they build, which, in turn, can help with tasks like debugging. In this paper, we report insights from a series of surveys with software developers that highlight that there is indeed an increased need for explanatory tools to support developers in creating AI systems. However, the feedback also indicates that existing XAI systems still fall short of this aspiration. Thus, we see an unmet need to provide developers with adequate support mechanisms to cope with this complexity so they can embed AI into high-quality software in the future.

Authors:Nir Cafri
Title: Dynamic Difficulty Adjustment With Brain Waves as a Tool for Optimizing Engagement
Abstract:
This study explores the use of electroencephalography (EEG)-based brain wave monitoring to enable dynamic difficulty adjustment (DDA) in a virtual reality (VR) gaming environment. Using the Task Engagement Index (TEI) derived from frontal EEG electrodes, we adapt game challenge levels in real time to maintain optimal player engagement. In a within-subject design with six participants, we found that the DDA condition significantly increased engagement duration by 19.79% compared to a non-DDA control condition. These results suggest that combining EEG, DDA, and VR technologies can enhance user experience and has potential applications in adaptive learning, rehabilitation, and personalized interfaces.
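The abstract's adaptation mechanism can be sketched in a few lines. The Task Engagement Index is commonly defined in the EEG literature as beta power divided by the sum of alpha and theta power; the threshold values and the step-wise difficulty policy below are illustrative assumptions, not the paper's actual parameters:

```python
def task_engagement_index(beta_power: float, alpha_power: float, theta_power: float) -> float:
    """TEI as commonly defined in the EEG literature: beta / (alpha + theta),
    computed from frontal-electrode band powers."""
    return beta_power / (alpha_power + theta_power)

def adjust_difficulty(current_level: int, tei: float,
                      low: float = 0.4, high: float = 0.7) -> int:
    """Illustrative DDA policy: raise the challenge when engagement drifts
    low (under-challenge), lower it when engagement overshoots (overload),
    otherwise hold steady. Thresholds here are placeholders."""
    if tei < low:
        return current_level + 1
    if tei > high:
        return max(0, current_level - 1)
    return current_level
```

In a real-time loop, band powers would be recomputed over a sliding EEG window and the game's challenge parameter updated each cycle.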

Authors:Jacob Tjaden
Title: The Balancing Act of Policies in Developing Machine Learning Explanations
Abstract:
Machine learning models are often criticized as opaque from a lack of transparency in their decision-making process. This study examines how policy design impacts the quality of explanations in ML models. We conducted a classroom experiment with 124 participants and analyzed the effects of policy length and purpose on developer compliance with policy requirements. Our results indicate that while policy length affects engagement with some requirements, policy purpose has no effect, and explanation quality is generally poor. These findings highlight the challenge of effective policy development and the importance of addressing diverse stakeholder perspectives within explanations.

Authors:Li Song
Title: LLM-Driven NPCs: Cross-Platform Dialogue System for Games and Social Platforms
Abstract:
NPCs in traditional games are often limited by static dialogue trees and a single platform for interaction. To overcome these constraints, this study presents a prototype system that enables large language model (LLM)-powered NPCs to communicate with players both in the game environment (Unity) and on a social platform (Discord). Dialogue logs are stored in a cloud database (LeanCloud), allowing the system to synchronize memory between platforms and keep conversations coherent. Our initial experiments show that cross-platform interaction is technically feasible and suggest a solid foundation for future developments such as emotional modeling and persistent memory support.
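The core synchronization idea - both platform bridges append to one shared log, and the NPC's prompt is built from the chronologically merged history - can be sketched as follows. This is a minimal in-memory stand-in; the paper's system uses LeanCloud as the store, and the class and method names here are invented for illustration:

```python
import time
from collections import defaultdict

class SharedDialogueLog:
    """In-memory stand-in for the cloud dialogue store. Both the Unity
    bridge and the Discord bot would append here, so the NPC sees a
    single merged conversation regardless of platform."""
    def __init__(self):
        self._logs = defaultdict(list)

    def append(self, player_id: str, platform: str, role: str, text: str) -> None:
        self._logs[player_id].append(
            {"ts": time.time(), "platform": platform, "role": role, "text": text})

    def history(self, player_id: str) -> list:
        # Merge chronologically so cross-platform turns interleave correctly.
        return sorted(self._logs[player_id], key=lambda m: m["ts"])

def build_prompt(log: SharedDialogueLog, player_id: str, persona: str) -> str:
    """Assemble the LLM prompt from the merged history."""
    lines = [f"You are {persona}. Continue the conversation."]
    for m in log.history(player_id):
        lines.append(f"[{m['platform']}] {m['role']}: {m['text']}")
    return "\n".join(lines)
```

Swapping the in-memory dict for cloud reads/writes keyed by player ID yields the cross-platform memory coherence the abstract describes.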

Authors:Marco Reidelbach
Title: MaRDMO: Future Gateway to FAIR Mathematical Data
Abstract:
Mathematical research data plays a crucial role across scientific disciplines, yet its documentation and dissemination remain challenging due to the lack of standardized research data management practices. The MaRDMO Plugin addresses these challenges by integrating mathematical models, algorithms, and interdisciplinary workflows into the established framework of the Research Data Management Organiser (RDMO). Built on FAIR principles, MaRDMO enables structured documentation and retrieval of mathematical research data through guided questionnaires. It connects to multiple knowledge graphs, including MathModDB, MathAlgoDB, and the MaRDI Portal. Users can document and search for models, algorithms, and workflows via dynamic selection interfaces that also leverage other sources such as Wikidata. The plugin facilitates the export to the individual MaRDI services, ensuring data quality through automated validation. By embedding mathematical research data management into the widely adopted RDMO platform, MaRDMO represents a significant step toward making mathematical research data more findable, accessible, and reusable.

Authors:Clara Sayffaerth
Title: Educational Twin: The Influence of Artificial XR Expert Duplicates on Future Learning
Abstract:
Currently, it is impossible for educators to be in multiple places simultaneously and teach each student individually. Technologies such as Extended Reality (XR) and Artificial Intelligence (AI) enable the creation of realistic educational copies of experts that preserve not only visual and mental characteristics but also social aspects crucial for learning. However, research in this area is limited, which opens new questions for future work. This paper discusses how these human digital twins can potentially improve aspects like scalability, engagement, and preservation of social learning factors. While this technology offers benefits, it also introduces challenges related to educator autonomy, social interaction shifts, and ethical considerations such as privacy, bias, and identity preservation. We outline key research questions that need to be addressed to ensure that human digital twins enhance the social aspects of education instead of harming them.

Authors:Manuele Veggi
Title: State of the Art on Artificial Intelligence Resources for Interaction Media Design in Digital Cultural Heritage
Abstract:
This paper explores the integration of Artificial Intelligence (AI) in the design of interactive experiences for Cultural Heritage (CH). Previous studies either fail to represent the specificity of CH or mention possible tools without making clear reference to a structured Interaction Design (IxD) workflow. The study also attempts to overcome one of the major limitations of traditional literature review, which may fail to capture proprietary tools whose release is rarely accompanied by academic publications. Besides the analysis of previous research, the study proposes a possible workflow for IxD in CH, subdivided into phases and tasks: for each of them, this paper proposes possible AI-based tools that can support the activity of designers, curators, and CH professionals. The review concludes with a final section outlining future paths for research and development in this domain.

Authors:Harish Vijayakumar
Title: User Satisfaction -- UX Design Strategies for Seamless Virtual Experience
Abstract:
User Experience (UX) in virtual worlds is a fast-developing discipline that requires creative design concepts to overcome the divide between physical and virtual interaction. This research investigates primary principles and techniques to improve UX in virtual experiences based on usability, accessibility, user engagement, and technology advancements. It gives detailed insight into trends, issues, and prospects for UX design of virtual applications that guarantee an efficient, easy-to-use, and immersive experience.

Authors:Stefan Pietrusky
Title: Learning by gaming, coding and making with EDUMING: A new approach to utilising atypical digital games for learning
Abstract:
Papert's constructionism makes it clear that learning is particularly effective when learners create tangible artifacts and share and discuss them in social contexts. Technological progress in recent decades has created numerous opportunities for learners to not only passively consume media, but to actively shape it through construction. This article uses the EDUMING concept to present a new method to simplify the development of digital learning games and thus support their integration into learning situations. A key difference between the concept and established ideas such as game-based learning, gamification, serious games, etc. is that games are not closed and are consumed passively, but can also be actively developed by users individually by modifying the source code with the help of an IDE. As part of an empirical study, the usability of the game "Professor Chip's Learning Quest" (PCLQ) is recorded, as well as previous experience with digital learning games and the acceptance and motivation to use new technologies. The purpose of this article is to test the PCLQ digital learning game, developed according to the EDUMING concept, as part of an exploratory study regarding its usability, acceptance and suitability for use in schools. The study is intended as a first empirical approach to practical testing of the concept.

Authors:Augusto Ciuffoletti
Title: Designing a Geo-Tourism App: A Principled Approach
Abstract:
Walking along trails in natural areas is a rewarding experience, but visitors sometimes need proper assistance to enhance their enjoyment, maximize learning, and ensure safety. Over the years, various signage techniques have been introduced, but today, the widespread use of smartphones offers new opportunities for visitor support. In this paper, we outline the key principles for designing an Android app tailored for geotourists. Our approach begins by defining user personas and deriving app requirements based on their needs. We then present a proof of concept that addresses the critical aspects identified during the design process.

Authors:John R. Kitchin
Title: The Evolving Role of Programming and LLMs in the Development of Self-Driving Laboratories
Abstract:
Machine learning and automation are transforming scientific research, yet the implementation of self-driving laboratories (SDLs) remains costly and complex, and it remains difficult to learn how to use these facilities. To address this, we introduce Claude-Light, a lightweight, remotely accessible instrument designed for prototyping automation algorithms and machine learning workflows. Claude-Light integrates a REST API, a Raspberry Pi-based control system, and an RGB LED with a photometer that measures ten spectral outputs, providing a controlled but realistic experimental environment. This device enables users to explore automation at multiple levels, from basic programming and experimental design to machine learning-driven optimization. We demonstrate the application of Claude-Light in structured automation approaches, including traditional scripting, statistical design of experiments, and active learning methods. Additionally, we explore the role of large language models (LLMs) in laboratory automation, highlighting their use in instrument selection, structured data extraction, function calling, and code generation. While LLMs present new opportunities for streamlining automation, they also introduce challenges related to reproducibility, security, and reliability. We discuss strategies to mitigate these risks while leveraging LLMs for enhanced efficiency in self-driving laboratories. Claude-Light provides a practical and accessible platform for students and researchers to develop automation skills and test algorithms before deploying them in larger-scale SDLs. By lowering the barrier to entry for automation in scientific research, this tool facilitates broader adoption of AI-driven experimentation and fosters innovation in autonomous laboratories.
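The "statistical design of experiments" level described above can be illustrated with a toy loop. The abstract does not give the Claude-Light API schema, so `measure()` below is a local simulated stand-in for the instrument call (not the real REST endpoint), and its response coefficients are invented for the sketch:

```python
import itertools
import random

def measure(r: float, g: float, b: float) -> float:
    """Simulated stand-in for one instrument call: the real Claude-Light
    drives an RGB LED over a REST API and returns photometer readings.
    This toy model returns one noisy 'spectral channel' reading."""
    true_signal = 0.8 * g + 0.3 * r + 0.1 * b  # invented response surface
    return true_signal + random.gauss(0, 0.01)

def grid_search(levels):
    """Simplest structured experiment design: exhaustively try a coarse
    grid of RGB settings, replicate each to average out noise, and keep
    the setting with the highest mean reading."""
    best, best_val = None, float("-inf")
    for r, g, b in itertools.product(levels, repeat=3):
        val = sum(measure(r, g, b) for _ in range(3)) / 3
        if val > best_val:
            best, best_val = (r, g, b), val
    return best, best_val
```

An active-learning variant would replace the exhaustive grid with a surrogate model that proposes the next measurement, which is the kind of workflow the device is meant to let students prototype cheaply.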

Authors:Felix Haag
Title: The Effect of Explainable AI-based Decision Support on Human Task Performance: A Meta-Analysis
Abstract:
The desirable properties of explanations in information systems have fueled the demands for transparency in artificial intelligence (AI) outputs. To address these demands, the field of explainable AI (XAI) has put forth methods that can support human decision-making by explaining AI outputs. However, current empirical works present inconsistent findings on whether such explanations help to improve users' task performance in decision support systems (DSS). In this paper, we conduct a meta-analysis to explore how XAI affects human performance in classification tasks. Our results show an improvement in task performance through XAI-based decision support, though explanations themselves are not the decisive driver for this improvement. The analysis reveals that the studies' risk of bias moderates the effect of explanations in AI, while the explanation type appears to play only a negligible role. Our findings contribute to the human-computer interaction field by enhancing the understanding of human-XAI collaboration in DSS.

Authors:Diego Saldivar
Title: Gamification as a Data Acquisition Strategy for Neurogames
Abstract:
The nascent field of neurogames relies on active Brain-Computer Interface input to drive its game mechanics. Consequently, users expect their conscious will to be meaningfully reflected on the virtual environment they're engaging in. Additionally, the videogame industry considers it paramount to provide gamers with seamless experiences to avoid disrupting their state of flow. Thus, this paper suggests gamification as a strategy to camouflage the often fatiguing data acquisition process in Machine Learning from neurodata so that neurogamers can further immerse themselves in the virtual experience while Artificial Intelligence models benefit from data taken in reproducible contexts.

Authors:Puneet Jain
Title: Assistive XR research for disability at ACM ASSETS: A Scoping Review
Abstract:
Despite the rise in affordable eXtended Reality (XR) technologies, accessibility still remains a key concern, often excluding people with disabilities from accessing these immersive XR platforms. Consequently, there has been a notable surge in HCI research on creating accessible XR solutions (also known as assistive XR). This increased focus in assistive XR research is also reflected in the number of research and innovative solutions submitted at the ACM Conference on Accessible Computing (ASSETS), with an aim to make XR experiences inclusive for disabled communities. However, to date, there is little to no work that provides a comprehensive overview of state-of-the-art research in assistive XR for disability at ACM ASSETS, a premier conference dedicated to research in HCI for people with disabilities. This study aims to fill this research gap by conducting a scoping review of literature delineating the key focus areas, research methods, and statistical and temporal trends in XR research for disability at ACM ASSETS (2019-2023). From a pool of 1595 articles submitted to ASSETS, 26 articles are identified that specifically focus on XR research for disability. Through a detailed analysis, 6 key focus areas of XR research explored at ACM ASSETS are identified and a detailed examination of each is provided. Additionally, an overview of multiple research methods employed for XR research at ASSETS is also presented. Lastly, this work reports on the statistics and temporal trends regarding the number of publications, XR technologies used, disabilities addressed, and methodologies adopted for assistive XR research at ASSETS, highlighting emerging trends and possible future research directions.

Authors:Zhe Liu
Title: Interview AI-ssistant: Designing for Real-Time Human-AI Collaboration in Interview Preparation and Execution
Abstract:
Recent advances in large language models (LLMs) offer unprecedented opportunities to enhance human-AI collaboration in qualitative research methods, including interviews. While interviews are highly valued for gathering deep, contextualized insights, interviewers often face significant cognitive challenges, such as real-time information processing, question adaptation, and rapport maintenance. My doctoral research introduces Interview AI-ssistant, a system designed for real-time interviewer-AI collaboration during both the preparation and execution phases. Through four interconnected studies, this research investigates the design of effective human-AI collaboration in interviewing contexts, beginning with a formative study of interviewers' needs, followed by a prototype development study focused on AI-assisted interview preparation, an experimental evaluation of real-time AI assistance during interviews, and a field study deploying the system in a real-world research setting. Beyond informing practical implementations of intelligent interview support systems, this work contributes to the Intelligent User Interfaces (IUI) community by advancing the understanding of human-AI collaborative interfaces in complex social tasks and establishing design guidelines for AI-enhanced qualitative research tools.

Authors:Antonio Strippoli
Title: VoxLogicA UI: Supporting Declarative Medical Image Analysis
Abstract:
This Master's Thesis in Computer Science dives into the design and creation of a user-friendly interface for VoxLogicA, an image analysis tool using spatial model checking with a focus on neuroimaging. The research tackles the problem of existing tools being too complex, which makes them hard for medical professionals and researchers to use. By using spatial logic, the goal is to make these powerful analytical tools more practical and accessible in real-world clinical settings. The main objectives are to design a modern web interface that's easy to use, build it with the latest web technologies (e.g. Svelte and Niivue), and test its effectiveness through user studies and real-world case analyses.

Authors:Waaridh Borpujari
Title: Enhancing User Engagement in E-commerce through Dynamic Animations
Abstract:
The use of animation to gain user attention has been increasing, supported by various studies on user behavior and psychology. However, excessive use of animation in interfaces can negatively impact the user. This paper deals with a specific type of animation within a specialized domain of e-commerce. Drawing upon theories such as the Zeigarnik Effect, Aesthetic-Usability effect, Peak-End rule, and Hick's law, we analyze user behavior and psychology when exposed to a dynamic price-drop animation. Unlike a conventional static pricing strategy, this animation introduces movement to signify price reduction. In our theoretical study approach, we evaluate and present a user study on how such an animation influences user perception, psychology, and attention. If applied effectively, dynamic animations can enhance engagement, spark anticipation, and subconsciously create a positive experience by reducing cognitive load.

Authors:Julian Runge
Title: Experimentation in Gaming: an Adoption Guide
Abstract:
Experimentation is a cornerstone of successful game development and live operations, enabling teams to optimize player engagement, retention, and monetization. This article provides a comprehensive guide to implementing experimentation in gaming, structured around the game development lifecycle and the marketing mix. From pre-launch concept testing and prototyping to post-launch personalization and LiveOps, experimentation plays a pivotal role in driving innovation and adapting game experiences to diverse player preferences. Gaming presents unique challenges, such as highly engaged communities, complex interactive systems, and highly heterogeneous and evolving player behaviors, which require tailored approaches to experimentation. The article emphasizes the importance of collaborative frameworks across product, marketing, and analytics teams and provides practical guidance to game makers on how to adopt experimentation successfully. It also addresses ethical considerations like fairness and player autonomy.

Authors:Anqi Shao
Title: Beyond Misinformation: A Conceptual Framework for Studying AI Hallucinations in (Science) Communication
Abstract:
This paper proposes a conceptual framework for understanding AI hallucinations as a distinct form of misinformation. While misinformation scholarship has traditionally focused on human intent, generative AI systems now produce false yet plausible outputs absent of such intent. I argue that these AI hallucinations should not be treated merely as technical failures but as communication phenomena with social consequences. Drawing on a supply-and-demand model and the concept of distributed agency, the framework outlines how hallucinations differ from human-generated misinformation in production, perception, and institutional response. I conclude by outlining a research agenda for communication scholars to investigate the emergence, dissemination, and audience reception of hallucinated content, with attention to macro (institutional), meso (group), and micro (individual) levels. This work urges communication researchers to rethink the boundaries of misinformation theory in light of probabilistic, non-human actors increasingly embedded in knowledge production.

Authors:Russell Beale
Title: Large Language Models Will Change The Way Children Think About Technology And Impact Every Interaction Paradigm
Abstract:
This paper presents a hopeful perspective on the potentially dramatic impacts of Large Language Models on how children learn and how they will expect to interact with technology. We review the effects of LLMs on education so far, and make the case that these effects are minor compared to the changes now underway. We present a small scenario and self-ethnographic study demonstrating the effects of these changes, and define five significant considerations that interactive systems designers will have to accommodate in the future.

Authors:Sean Koon
Title: Creating 'Full-Stack' Hybrid Reasoning Systems that Prioritize and Enhance Human Intelligence
Abstract:
The idea of augmented or hybrid intelligence offers a compelling vision for combining human and AI capabilities, especially in tasks where human wisdom, expertise, or common sense are essential. Unfortunately, human reasoning can be flawed and shortsighted, resulting in adverse individual impacts or even long-term societal consequences. While strong efforts are being made to develop and optimize the AI aspect of hybrid reasoning, the real urgency lies in fostering wiser and more intelligent human participation. Tools that enhance critical thinking, ingenuity, expertise, and even wisdom could be essential in addressing the challenges of our emerging future. This paper proposes the development of generative AI-based tools that enhance both the human ability to reflect upon a problem as well as the ability to explore the technical aspects of it. A high-level model is also described for integrating AI and human capabilities in a way that centralizes human participation and control.

Authors:Dong Wang
Title: CPG-EVAL: A Multi-Tiered Benchmark for Evaluating the Chinese Pedagogical Grammar Competence of Large Language Models
Abstract:
Purpose: The rapid emergence of large language models (LLMs) such as ChatGPT has significantly impacted foreign language education, yet their pedagogical grammar competence remains under-assessed. This paper introduces CPG-EVAL, the first dedicated benchmark specifically designed to evaluate LLMs' knowledge of pedagogical grammar within the context of foreign language instruction. Methodology: The benchmark comprises five tasks designed to assess grammar recognition, fine-grained grammatical distinction, categorical discrimination, and resistance to linguistic interference. Findings: Smaller-scale models can succeed in single language instance tasks, but struggle with multiple instance tasks and interference from confusing instances. Larger-scale models show better resistance to interference but still have significant room for accuracy improvement. The evaluation indicates the need for better instructional alignment and more rigorous benchmarks, to effectively guide the deployment of LLMs in educational contexts. Value: This study offers the first specialized, theory-driven, multi-tiered benchmark framework for systematically evaluating LLMs' pedagogical grammar competence in Chinese language teaching contexts. CPG-EVAL not only provides empirical insights for educators, policymakers, and model developers to better gauge AI's current abilities in educational settings, but also lays the groundwork for future research on improving model alignment, enhancing educational suitability, and ensuring informed decision-making concerning LLM integration in foreign language instruction.

Authors:Rawan AlMakinah
Title: Factors That Influence the Adoption of AI-enabled Conversational Agents (AICAs) as an Augmenting Therapeutic Tool by Frontline Healthcare Workers: From Technology Acceptance Model 3 (TAM3) Lens -- A Systematic Mapping Review
Abstract:
Artificial intelligence (AI) conversational agents hold a promising future in the field of mental health, especially in helping marginalized communities that lack access to mental health support services. It is tempting to have a 24/7 mental health companion that can be accessed anywhere using mobile phones to provide therapist-like advice. Yet, caution should be taken, and studies around their feasibility need to be surveyed. Before adopting such a rapidly changing technology, studies on its feasibility should be explored, summarized, and synthesized to gain a solid understanding of the status quo and to enable us to build a framework that can guide us throughout the development and deployment processes. Different perspectives must be considered when investigating the feasibility of AI conversational agents, including the mental healthcare professional perspective. The literature can provide insights into their perspectives in terms of opportunities, concerns, and implications. Mental health professionals, the subject-matter experts in this field, have their points of view that should be understood and considered. This systematic literature review will explore mental health practitioners' attitudes toward AI conversational agents and the factors that affect their adoption and recommendation of the technology to augment their services and treatments. The TAM3 Framework will be the lens through which this systematic literature review will be conducted.

Authors:Michael T. Lopez
Title: Challenging the Eye-Mind Link Hypothesis: Visualizing Gazes For Each Programming Problem
Abstract:
This study investigates the relationship between eye fixation patterns and performance in Java programming exercises using eye-tracking technology. Thirty-one students from a university in Metro Manila participated, and their eye movements were recorded while solving five Java programming exercises (three of the five exercises were selected for analysis). The fixation data were preprocessed and visualized using heatmap bin graphs, dividing the participants into correct and wrong answer groups. The Mann-Whitney U Test was employed to determine if there were significant differences in the fixation patterns between the two groups.

Authors:Maksim Vishnevskiy
Title: A Phenomenological Approach to Analyzing User Queries in IT Systems Using Heidegger's Fundamental Ontology
Abstract:
This paper presents a novel research analytical IT system grounded in Martin Heidegger's Fundamental Ontology, distinguishing between beings (das Seiende) and Being (das Sein). The system employs two modally distinct, descriptively complete languages: a categorical language of beings for processing user inputs and an existential language of Being for internal analysis. These languages are bridged via a phenomenological reduction module, enabling the system to analyze user queries (including questions, answers, and dialogues among IT specialists), identify recursive and self-referential structures, and provide actionable insights in categorical terms. Unlike contemporary systems limited to categorical analysis, this approach leverages Heidegger's phenomenological existential analysis to uncover deeper ontological patterns in query processing, aiding in resolving logical traps in complex interactions, such as metaphor usage in IT contexts. The path to full realization involves formalizing the language of Being by a research team based on Heidegger's Fundamental Ontology; given the existing completeness of the language of beings, this reduces the system's computability to completeness, paving the way for a universal query analysis tool. The paper presents the system's architecture, operational principles, technical implementation, use cases--including a case based on real IT specialist dialogues--comparative evaluation with existing tools, and its advantages and limitations.

Authors:Vicent Briva-Iglesias
Title: Are AI agents the new machine translation frontier? Challenges and opportunities of single- and multi-agent systems for multilingual digital communication
Abstract:
The rapid evolution of artificial intelligence (AI) has introduced AI agents as a disruptive paradigm across various industries, yet their application in machine translation (MT) remains underexplored. This paper describes and analyses the potential of single- and multi-agent systems for MT, reflecting on how they could enhance multilingual digital communication. While single-agent systems are well-suited for simpler translation tasks, multi-agent systems, which involve multiple specialized AI agents collaborating in a structured manner, may offer a promising solution for complex scenarios requiring high accuracy, domain-specific knowledge, and contextual awareness. To demonstrate the feasibility of multi-agent workflows in MT, we are conducting a pilot study in legal MT. The study employs a multi-agent system involving four specialized AI agents for (i) translation, (ii) adequacy review, (iii) fluency review, and (iv) final editing. Our findings suggest that multi-agent systems may have the potential to significantly improve domain-adaptability and contextual awareness, with superior translation quality to traditional MT or single-agent systems. This paper also sets the stage for future research into multi-agent applications in MT, integration into professional translation workflows, and shares a demo of the system analyzed in the paper.
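The four-agent legal MT workflow described in this abstract can be sketched as a sequential pipeline. The stand-in callables below are purely illustrative; in the study each stage would be an LLM call with a specialized prompt (translator, adequacy reviewer, fluency reviewer, final editor), not the string operations used here.

```python
from typing import Callable

# Each "agent" is a stand-in callable; in a real system these would be
# LLM calls with specialized prompts for translation, adequacy review,
# fluency review, and final editing.
Agent = Callable[[str], str]

def run_pipeline(source: str, agents: list[Agent]) -> str:
    """Pass the text through each specialized agent in sequence."""
    text = source
    for agent in agents:
        text = agent(text)
    return text

# Toy stages standing in for the four specialized agents.
stages: list[Agent] = [str.strip, str.lower, lambda s: s + "."]
result = run_pipeline("  HELLO  ", stages)
```

The sequential composition is the simplest multi-agent topology; the paper's system may add feedback loops between reviewers and the translator, which this sketch omits.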

Authors:Eileen McGivney
Title: Leveraging Agency in Virtual Reality to Enable Situated Learning
Abstract:
Learning is an active process that is deeply tied to physical and social contexts. Yet schools traditionally place learners in a passive role and focus on decontextualizing knowledge. Situating learning in more authentic tasks and contexts typically requires taking it outside the classroom via field trips and apprenticeships, but virtual reality (VR) is a promising tool to bring more authentically situated learning experiences into classrooms. In this position paper, I discuss how one of VR's primary affordances for learning is heightening agency, and how such heightened agency can facilitate more authentically situated learning by allowing learners legitimate peripheral participation.

Authors:Anna-Carolina Haensch
Title: "It Listens Better Than My Therapist": Exploring Social Media Discourse on LLMs as Mental Health Tool
Abstract:
The emergence of generative AI chatbots such as ChatGPT has prompted growing public and academic interest in their role as informal mental health support tools. While early rule-based systems have been around for several years, large language models (LLMs) offer new capabilities in conversational fluency, empathy simulation, and availability. This study explores how users engage with LLMs as mental health tools by analyzing over 10,000 TikTok comments from videos referencing LLMs as mental health tools. Using a self-developed tiered coding schema and supervised classification models, we identify user experiences, attitudes, and recurring themes. Results show that nearly 20% of comments reflect personal use, with these users expressing overwhelmingly positive attitudes. Commonly cited benefits include accessibility, emotional support, and perceived therapeutic value. However, concerns around privacy, generic responses, and the lack of professional oversight remain prominent. It is important to note that the user feedback does not indicate which therapeutic framework, if any, the LLM-generated output aligns with. While the findings underscore the growing relevance of AI in everyday practices, they also highlight the urgent need for clinical and ethical scrutiny in the use of AI for mental health support.

Authors:Srivathsan Amruth
Title: SPreV
Abstract:
SPREV, short for hyperSphere Reduced to two-dimensional Regular Polygon for Visualisation, is a novel dimensionality reduction technique developed to address the challenges of reducing dimensions and visualizing labeled datasets that exhibit a unique combination of three characteristics: small class size, high dimensionality, and low sample size. SPREV is designed not only to uncover but also to visually represent hidden patterns within such datasets. Its distinctive integration of geometric principles, adapted for discrete computational environments, makes it an indispensable tool in the modern data science toolkit, enabling users to identify trends, extract insights, and navigate complex data efficiently and effectively.

Authors:Luyao Zhang
Title: EthosGPT: Mapping Human Value Diversity to Advance Sustainable Development Goals (SDGs)
Abstract:
Large language models (LLMs) are transforming global decision-making and societal systems by processing diverse data at unprecedented scales. However, their potential to homogenize human values poses critical risks, similar to biodiversity loss undermining ecological resilience. Rooted in the ancient Greek concept of ethos, meaning both individual character and the shared moral fabric of communities, EthosGPT draws on a tradition that spans from Aristotle's virtue ethics to Adam Smith's moral sentiments as the ethical foundation of economic cooperation. These traditions underscore the vital role of value diversity in fostering social trust, institutional legitimacy, and long-term prosperity. EthosGPT addresses the challenge of value homogenization by introducing an open-source framework for mapping and evaluating LLMs within a global scale of human values. Using international survey data on cultural indices, prompt-based assessments, and comparative statistical analyses, EthosGPT reveals both the adaptability and biases of LLMs across regions and cultures. It offers actionable insights for developing inclusive LLMs, such as diversifying training data and preserving endangered cultural heritage to ensure representation in AI systems. These contributions align with the United Nations Sustainable Development Goals (SDGs), especially SDG 10 (Reduced Inequalities), SDG 11.4 (Cultural Heritage Preservation), and SDG 16 (Peace, Justice and Strong Institutions). Through interdisciplinary collaboration, EthosGPT promotes AI systems that are both technically robust and ethically inclusive, advancing value plurality as a cornerstone for sustainable and equitable futures.

Authors:Yiran Du
Title: Confirmation Bias in Generative AI Chatbots: Mechanisms, Risks, Mitigation Strategies, and Future Research Directions
Abstract:
This article explores the phenomenon of confirmation bias in generative AI chatbots, a relatively underexamined aspect of AI-human interaction. Drawing on cognitive psychology and computational linguistics, it examines how confirmation bias, commonly understood as the tendency to seek information that aligns with existing beliefs, can be replicated and amplified by the design and functioning of large language models. The article analyzes the mechanisms by which confirmation bias may manifest in chatbot interactions, assesses the ethical and practical risks associated with such bias, and proposes a range of mitigation strategies. These include technical interventions, interface redesign, and policy measures aimed at promoting balanced AI-generated discourse. The article concludes by outlining future research directions, emphasizing the need for interdisciplinary collaboration and empirical evaluation to better understand and address confirmation bias in generative AI systems.

Authors:Dirk HR Spennemann
Title: Delving into: the quantification of Ai-generated content on the internet (synthetic data)
Abstract:
While it is increasingly evident that the internet is becoming saturated with content created by generative AI large language models, accurately measuring the scale of this phenomenon has proven challenging. By analyzing the frequency of specific keywords commonly used by ChatGPT, this paper demonstrates that such linguistic markers can effectively be used to estimate the presence of generative AI content online. The findings suggest that at least 30% of text on active web pages originates from AI-generated sources, with the actual proportion likely approaching 40%. Given the implications of autophagous loops, this is a sobering realization.
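The keyword-frequency estimation described in this abstract can be sketched as follows. The marker list and flagging threshold here are illustrative assumptions, not the paper's actual lexicon or calibration.

```python
from collections import Counter
import re

# Illustrative markers often associated with ChatGPT-style prose;
# the paper's actual keyword list may differ.
AI_MARKERS = {"delve", "tapestry", "multifaceted", "underscore", "pivotal"}

def marker_rate(text: str) -> float:
    """Return the share of tokens that are AI-associated marker words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[m] for m in AI_MARKERS) / len(tokens)

def estimate_ai_share(pages: list[str], threshold: float = 0.01) -> float:
    """Fraction of pages whose marker rate exceeds an assumed threshold."""
    flagged = sum(1 for p in pages if marker_rate(p) > threshold)
    return flagged / len(pages)
```

Any real estimate would also need a baseline marker rate for pre-LLM human text, which this sketch leaves out.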

Authors:Edward Sun
Title: Enhancing Product Search Interfaces with Sketch-Guided Diffusion and Language Agents
Abstract:
The rapid progress in diffusion models, transformers, and language agents has unlocked new possibilities, yet their potential in user interfaces and commercial applications remains underexplored. We present Sketch-Search Agent, a novel framework that transforms the image search experience by integrating a multimodal language agent with freehand sketches as control signals for diffusion models. Using the T2I-Adapter, Sketch-Search Agent combines sketches and text prompts to generate high-quality query images, encoded via a CLIP image encoder for efficient matching against an image corpus. Unlike existing methods, Sketch-Search Agent requires minimal setup, no additional training, and excels in sketch-based image retrieval and natural language interactions. The multimodal agent enhances user experience by dynamically retaining preferences, ranking results, and refining queries for personalized recommendations. This interactive design empowers users to create sketches and receive tailored product suggestions, showcasing the potential of diffusion models in user-centric image retrieval. Experiments confirm Sketch-Search Agent's high accuracy in delivering relevant product search results.
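The embedding-and-match step at the core of Sketch-Search Agent can be sketched as cosine-similarity ranking. The toy vectors below stand in for CLIP encodings of the generated query image and the product corpus; the real system would obtain them from a CLIP image encoder.

```python
import numpy as np

def cosine_rank(query_vec: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Rank corpus rows by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q                  # cosine similarity per corpus item
    return np.argsort(-sims)      # best match first

# Toy embeddings standing in for CLIP encodings of a query image
# (generated from sketch + text via the diffusion model) and three products.
query = np.array([1.0, 0.0, 0.5])
products = np.array([
    [0.9, 0.1, 0.4],   # close match
    [0.0, 1.0, 0.0],   # unrelated
    [0.5, 0.5, 0.5],   # partial match
])
ranking = cosine_rank(query, products)
```

The language agent would then re-rank or filter this list using retained user preferences, a step this sketch omits.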

Authors:Nomisha Kurian
Title: Once Upon an AI: Six Scaffolds for Child-AI Interaction Design, Inspired by Disney
Abstract:
To build AI that children can intuitively understand and benefit from, designers need a design grammar that serves their developmental needs. This paper bridges artificial intelligence design for children - an emerging field still defining its best practices - and animation, a well established field with decades of experience in engaging children through accessible storytelling. Pairing Piagetian developmental theory with design pattern extraction from 52 works of animation, the paper presents a six scaffold framework that integrates design insights transferable to child centred AI design: (1) signals for visual animacy and clarity, (2) sound for musical and auditory scaffolding, (3) synchrony in audiovisual cues, (4) sidekick style personas, (5) storyplay that supports symbolic play and imaginative exploration, and (6) structure in the form of predictable narratives. These strategies, long refined in animation, function as multimodal scaffolds for attention, understanding, and attunement, supporting learning and comfort. This structured design grammar is transferable to AI design. By reframing cinematic storytelling and child development theory as design logic for AI, the paper offers heuristics for AI that aligns with the cognitive stages and emotional needs of young users. The work contributes to design theory by showing how sensory, affective, and narrative techniques can inform developmentally attuned AI design. Future directions include empirical testing, cultural adaptation, and participatory co design.

Authors:Sheikh Muhammad Farjad
Title: DaemonSec: Examining the Role of Machine Learning for Daemon Security in Linux Environments
Abstract:
DaemonSec is an early-stage startup exploring machine learning (ML)-based security for Linux daemons, a critical yet often overlooked attack surface. While daemon security remains underexplored, conventional defenses struggle against adaptive threats and zero-day exploits. To assess the perspectives of IT professionals on ML-driven daemon protection, a systematic interview study based on semi-structured interviews was conducted with 22 professionals from industry and academia. The study evaluates adoption, feasibility, and trust in ML-based security solutions. While participants recognized the potential of ML for real-time anomaly detection, findings reveal skepticism toward full automation, limited security awareness among non-security roles, and concerns about patching delays creating attack windows. This paper presents the methods, key findings, and implications for advancing ML-driven daemon security in industry.

Authors:Joshua Hatherley
Title: Data over dialogue: Why artificial intelligence is unlikely to humanise medicine
Abstract:
Recently, a growing number of experts in artificial intelligence (AI) and medicine have begun to suggest that the use of AI systems, particularly machine learning (ML) systems, is likely to humanise the practice of medicine by substantially improving the quality of clinician-patient relationships. In this thesis, however, I argue that medical ML systems are more likely to negatively impact these relationships than to improve them. In particular, I argue that the use of medical ML systems is likely to compromise the quality of trust, care, empathy, understanding, and communication between clinicians and patients.

Authors:Naoko Hayashida
Title: Beyond the Winding Path of Learning: Exploring Affective, Cognitive, and Action-Oriented Prompts for Communication Skills
Abstract:
Since high dropout rates in online learning platforms were reported, various factors affecting learner retention have been identified, with learners' perceptions of their experiences playing a crucial role in shaping their persistence. For instance, Kittur et al. highlight how success expectations are shaped by perceived system fit and course difficulty. Recent advances in generative Artificial Intelligence (GenAI) present new possibilities for GenAI-mediated learning. AI-generated instructional messages are often perceived as clearer than human-written content, but their impact on learners' perceptions of skill-building experiences remains underexplored. This study examines GenAI-mediated learning in a self-directed context, focusing on communication skills. We compare three messaging styles - Affective, Cognitive, and Action-Oriented - to investigate their influence on learners' perceptions of the learning process. We applied this approach to ten instructional units, using GenAI to generate 30 learning items. Three evaluators assessed them for desirability and appropriateness through numerical ratings and open-ended feedback. The 180 excerpts were analyzed using reflexive thematic analysis, revealing four overarching themes: Prerequisite Common Ground, Intrinsic Value, User Responses, and Expressed Preferences. We discuss these insights to inform the design of GenAI-mediated, self-directed skill-building, with the goal of enhancing engagement, persistence, and learning outcomes.

Authors:Mahsa Nasri
Title: Towards Intelligent VR Training: A Physiological Adaptation Framework for Cognitive Load and Stress Detection
Abstract:
Adaptive Virtual Reality (VR) systems have the potential to enhance training and learning experiences by dynamically responding to users' cognitive states. This research investigates how eye tracking and heart rate variability (HRV) can be used to detect cognitive load and stress in VR environments, enabling real-time adaptation. The study follows a three-phase approach: (1) conducting a user study with the Stroop task to label cognitive load data and train machine learning models to detect high cognitive load, (2) fine-tuning these models with new users and integrating them into an adaptive VR system that dynamically adjusts training difficulty based on physiological signals, and (3) developing a privacy-aware approach to detect high cognitive load and compare this with the adaptive VR in Phase two. This research contributes to affective computing and adaptive VR using physiological sensing, with applications in education, training, and healthcare. Future work will explore scalability, real-time inference optimization, and ethical considerations in physiological adaptive VR.
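The Phase 1 idea of training a classifier on labeled physiological features can be sketched with a nearest-centroid classifier. This is a stand-in for the paper's unspecified ML models, and the feature names and values below are invented for illustration.

```python
import numpy as np

# Toy feature rows: [pupil_diameter_change, HRV RMSSD in ms];
# labels: 1 = high cognitive load (e.g. incongruent Stroop trials).
# All values are invented for illustration.
X = np.array([[0.8, 20.0], [0.9, 18.0], [0.1, 60.0], [0.2, 55.0]])
y = np.array([1, 1, 0, 0])

def fit_centroids(X, y):
    """Per-class mean feature vectors (a nearest-centroid classifier)."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, x):
    """Assign the class whose centroid is nearest in feature space."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

centroids = fit_centroids(X, y)
```

A real system would normalize features (HRV dominates raw Euclidean distance here) and use a stronger model, but the fit/predict loop is the same shape.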

Authors:Jan Beger
Title: Not someone, but something: Rethinking trust in the age of medical AI
Abstract:
As artificial intelligence (AI) becomes embedded in healthcare, trust in medical decision-making is changing fast. Nowhere is this shift more visible than in radiology, where AI tools are increasingly embedded across the imaging workflow - from scheduling and acquisition to interpretation, reporting, and communication with referrers and patients. This opinion paper argues that trust in AI isn't a simple transfer from humans to machines - it is a dynamic, evolving relationship that must be built and maintained. Rather than debating whether AI belongs in medicine, it asks: what kind of trust must AI earn, and how? Drawing from philosophy, bioethics, and system design, it explores the key differences between human trust and machine reliability - emphasizing transparency, accountability, and alignment with the values of good care. It argues that trust in AI should not be built on mimicking empathy or intuition, but on thoughtful design, responsible deployment, and clear moral responsibility. The goal is a balanced view - one that avoids blind optimism and reflexive fear. Trust in AI must be treated not as a given, but as something to be earned over time.

Authors:Maitree Hirunteeyakul
Title: A BLE and UWB Beacon-Assist Framework for Multiuser Augmented Reality Synchronization Across Multiple Devices in Shared Environments
Abstract:
Synchronizing augmented reality (AR) across sessions and devices has so far relied solely on visual-feature mapping, which scales poorly in workable space and falters under visual changes in the surroundings. This study implemented AR synchronization solutions utilizing location beacon technology, namely Bluetooth Low Energy (BLE) and Ultra-Wideband (UWB), to address scalability issues and inconsistencies in existing AR systems. The framework is bifurcated into two approaches: BLE-assist and UWB-assist AR synchronization. The BLE-assist method utilizes iBeacon technology for room context recognition, integrating with Apple's ARKit ARWorldMap and Google's ARCore Cloud Anchors. The UWB-assist solution fuses precise beacon ranging with the device's azimuth to establish a fixed spatial reference in AR across sessions/devices. Comparative evaluations show that the UWB-assist approach outperforms the BLE-assist approach in reliability across environmental variations, as it always successfully resolves virtual anchors with a near-constant average latency of 25 seconds, regardless of changes to the physical setting. Conversely, the BLE-assist implementation tends to be more accurate in resolving virtual anchors, with a mean position error of 0.02 metres and an orientation error within 0.03 radians. In the UWB-assist approach, computed fixed spatial references have an average pose disparity of 0.04 metres and 0.11 radians. The UWB-assist approach is ideal for scenarios requiring consistently successful localization with acceptable accuracy. In contrast, the BLE-assist approach is more suitable when finer precision in virtual anchor poses is demanded, accepting performance tradeoffs when the surroundings are altered, such as in designated short-lived AR sessions.
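The UWB-assist fusion of beacon range and device azimuth into a fixed spatial reference reduces, in its simplest form, to placing a point at a known bearing and distance from the beacon. The sketch below assumes simplified 2D geometry and invented function names; the actual system works in full 3D pose space.

```python
import math

def fixed_reference(beacon_xy, distance_m, azimuth_rad):
    """Place a spatial reference at a given range and bearing from a beacon.

    Simplified 2D geometry: azimuth is measured clockwise from north (+y),
    beacon_xy is the beacon's known position in metres.
    """
    bx, by = beacon_xy
    x = bx + distance_m * math.sin(azimuth_rad)
    y = by + distance_m * math.cos(azimuth_rad)
    return (x, y)
```

Because both inputs come from hardware (UWB ranging and the device compass), a production implementation would filter noisy readings before committing an anchor; this sketch takes single measurements at face value.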

Authors:Joshua Hatherley
Title: A moving target in AI-assisted decision-making: Dataset shift, model updating, and the problem of update opacity
Abstract:
Machine learning (ML) systems are vulnerable to performance decline over time due to dataset shift. To address this problem, experts often suggest that ML systems should be regularly updated to ensure ongoing performance stability. Some scholarly literature has begun to address the epistemic and ethical challenges associated with different updating methodologies. Thus far, however, little attention has been paid to the impact of model updating on the ML-assisted decision-making process itself, particularly in the AI ethics and AI epistemology literatures. This article aims to address this gap in the literature. It argues that model updating introduces a new sub-type of opacity into ML-assisted decision-making -- update opacity -- that occurs when users cannot understand how or why an update has changed the reasoning or behaviour of an ML system. This type of opacity presents a variety of distinctive epistemic and safety concerns that available solutions to the black box problem in ML are largely ill-equipped to address. A variety of alternative strategies may be developed or pursued to address the problem of update opacity more directly, including bi-factual explanations, dynamic model reporting, and update compatibility. However, each of these strategies presents its own risks or carries significant limitations. Further research will be needed to address the epistemic and safety concerns associated with model updating and update opacity going forward.

Authors:Zihao Wu
Title: Autono: A ReAct-Based Highly Robust Autonomous Agent Framework
Abstract:
This paper proposes a highly robust autonomous agent framework based on the ReAct paradigm, designed to solve complex tasks through adaptive decision making and multi-agent collaboration. Unlike traditional frameworks that rely on fixed workflows generated by LLM-based planners, this framework dynamically generates next actions during agent execution based on prior trajectories, thereby enhancing its robustness. To address potential termination issues caused by adaptive execution paths, I propose a timely abandonment strategy incorporating a probabilistic penalty mechanism. For multi-agent collaboration, I introduce a memory transfer mechanism that enables shared and dynamically updated memory among agents. The framework's innovative timely abandonment strategy dynamically adjusts the probability of task abandonment via probabilistic penalties, allowing developers to balance conservative and exploratory tendencies in agent execution strategies by tuning hyperparameters. This significantly improves adaptability and task execution efficiency in complex environments. Additionally, agents can be extended through external tool integration, supported by modular design and MCP protocol compatibility, which enables flexible action space expansion. Through explicit division of labor, the multi-agent collaboration mechanism enables agents to focus on specific task components, thereby significantly improving execution efficiency and quality.

Authors:Romy Müller
Title: How humans evaluate AI systems for person detection in automatic train operation: Not all misses are alike
Abstract:
If artificial intelligence (AI) is to be applied in safety-critical domains, its performance needs to be evaluated reliably. The present study aimed to understand how humans evaluate AI systems for person detection in automatic train operation. In three experiments, participants saw image sequences of people moving in the vicinity of railway tracks. A simulated AI had highlighted all detected people, sometimes correctly and sometimes not. Participants had to provide a numerical rating of the AI's performance and then verbally explain their rating. The experiments varied several factors that might influence human ratings: the types and plausibility of AI mistakes, the number of affected images, the number of people present in an image, the position of people relative to the tracks, and the methods used to elicit human evaluations. While all these factors influenced human ratings, some effects were unexpected or deviated from normative standards. For instance, the factor with the strongest impact was people's position relative to the tracks, although participants had explicitly been instructed that the AI could not process such information. Taken together, the results suggest that humans may sometimes evaluate more than the AI's performance on the assigned task. Such mismatches between AI capabilities and human expectations should be taken into consideration when conducting safety audits of AI systems.

Authors:Nathan TeBlunthuis
Title: Niche Dynamics in Complex Online Community Ecosystems
Abstract:
Online communities are important organizational forms where members socialize and share information. Curiously, different online communities often overlap considerably in topic and membership. Recent research has investigated competition and mutualism among overlapping online communities through the lens of organizational ecology; however, it has not accounted for how the nonlinear dynamics of online attention may lead to episodic competition and mutualism. Neither has it explored the origins of competition and mutualism in the processes by which online communities select or adapt to their niches. This paper presents a large-scale study of 8,806 Reddit communities belonging to 1,919 clusters of high user overlap over a 5-year period. The method uses nonlinear time series methods to infer bursty, often short-lived ecological dynamics. Results reveal that mutualism episodes are longer lived and slightly more frequent than competition episodes. Next, it tests whether online communities find their niches by specializing to avoid competition using panel regression models. It finds that competitive ecological interactions lead to decreasing topic and user overlaps; however, changes that decrease such niche overlaps do not lead to mutualism. The discussion proposes that future designs may enable online community ecosystem management by informing online community leaders to organize "spin-off" communities or via feeds and recommendations.

Authors:Alexander M. Sidorkin
Title: Form-Substance Discrimination: Concept, Cognition, and Pedagogy
Abstract:
The skill to separate form from substance in writing has gained new prominence in the age of AI-generated content. The challenge - discriminating between fluent expression and substantive thought - constitutes a critical literacy skill for modern education. This paper examines form-substance discrimination (FSD) as an essential learning outcome for curriculum development in higher education. We analyze its cognitive foundations in fluency bias and inhibitory control, trace its evolution from composition theory concepts like "higher-order concerns," and explore how readers progress from novice acceptance of polished text to expert critical assessment. Drawing on research in cognitive psychology, composition studies, and emerging AI pedagogy, we propose practical strategies for fostering this ability through curriculum design, assessment practices, and explicit instruction. By prioritizing substance over surface in writing education, institutions can prepare students to navigate an information landscape where AI-generated content amplifies the ancient tension between style and meaning, ultimately safeguarding the value of authentic human thought in knowledge construction and communication.

Authors:Jun Rekimoto
Title: GazeLLM: Multimodal LLMs incorporating Human Visual Attention
Abstract:
Large Language Models (LLMs) are advancing into Multimodal LLMs (MLLMs), capable of processing image, audio, and video as well as text. Combined with first-person video, MLLMs show promising potential for understanding human activities through video and audio, enabling many human-computer interaction and human-augmentation applications such as human activity support, real-world agents, and skill transfer to robots or other individuals. However, handling high-resolution, long-duration videos generates large latent representations, leading to substantial memory and processing demands, limiting the length and resolution MLLMs can manage. Reducing video resolution can lower memory usage but often compromises comprehension. This paper introduces a method that optimizes first-person video analysis by integrating eye-tracking data, decomposing first-person video into sub-areas corresponding to regions of gaze focus. By processing these selectively gaze-focused inputs, our approach achieves task comprehension equivalent to or even better than processing the entire image at full resolution, but with significantly reduced video data input (reducing the number of pixels to one-tenth), offering an efficient solution for using MLLMs to interpret and utilize human skills.
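The gaze-driven decomposition amounts to cropping a window around the tracked gaze point before feeding frames to the MLLM. A toy sketch with NumPy (the window size and edge-clamping policy are assumptions, not the paper's exact parameters):

```python
import numpy as np

def gaze_crop(frame: np.ndarray, gaze_xy: tuple, win: int = 64) -> np.ndarray:
    """Crop a win x win patch centered on the gaze point,
    clamped so the window stays fully inside the frame."""
    h, w = frame.shape[:2]
    gx, gy = gaze_xy
    x0 = min(max(gx - win // 2, 0), w - win)
    y0 = min(max(gy - win // 2, 0), h - win)
    return frame[y0:y0 + win, x0:x0 + win]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
patch = gaze_crop(frame, (600, 20))  # gaze near the top-right corner
```

The MLLM then sees only the gazed patch rather than the full frame, which is where the large reduction in input pixels comes from.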

Authors:Giorgia Adorni
Title: Towards an intelligent assessment system for evaluating the development of algorithmic thinking skills: An exploratory study in Swiss compulsory schools
Abstract:
The rapid digitalisation of contemporary society has profoundly impacted various facets of our lives, including healthcare, communication, business, and education. The ability to engage with new technologies and solve problems has become crucial, making computational thinking (CT) skills, such as pattern recognition, decomposition, and algorithm design, essential competencies. In response, Switzerland is conducting research and initiatives to integrate CT into its educational system. This study aims to develop a comprehensive framework for large-scale assessment of CT skills, particularly focusing on algorithmic thinking (AT), the ability to design algorithms. To achieve this, we first developed a competence model capturing the situated and developmental nature of CT, guiding the design of activities tailored to cognitive abilities, age, and context. This framework clarifies how activity characteristics influence CT development and how to assess these competencies. Additionally, we developed an activity for large-scale assessment of AT skills, offered in two variants: one based on non-digital artefacts (unplugged) and manual expert assessment, and the other based on digital artefacts (virtual) and automatic assessment. To provide a more comprehensive evaluation of students' competencies, we developed an intelligent assessment system (IAS) based on Bayesian networks (BNs) with noisy gates, which offers real-time probabilistic assessment for each skill rather than a single overall score. The results indicate that the proposed instrument can measure AT competencies across different age groups and educational contexts in Switzerland, demonstrating its applicability for large-scale use. AT competencies exhibit a progressive development, with no overall gender differences, though variations are observed at the school level, significantly influenced by the artefact-based environment and its context, underscoring the importance of creating accessible and adaptable assessment tools.

Authors:Yue Yin
Title: InfoBid: A Simulation Framework for Studying Information Disclosure in Auctions with Large Language Model-based Agents
Abstract:
In online advertising systems, publishers often face a trade-off in information disclosure strategies: while disclosing more information can enhance efficiency by enabling optimal allocation of ad impressions, it may lose revenue potential by decreasing uncertainty among competing advertisers. Similar to other challenges in market design, understanding this trade-off is constrained by limited access to real-world data, leading researchers and practitioners to turn to simulation frameworks. The recent emergence of large language models (LLMs) offers a novel approach to simulations, providing human-like reasoning and adaptability without necessarily relying on explicit assumptions about agent behavior modeling. Despite their potential, existing frameworks have yet to integrate LLM-based agents for studying information asymmetry and signaling strategies, particularly in the context of auctions. To address this gap, we introduce InfoBid, a flexible simulation framework that leverages LLM agents to examine the effects of information disclosure strategies in multi-agent auction settings. Using GPT-4o, we implemented simulations of second-price auctions with diverse information schemas. The results reveal key insights into how signaling influences strategic behavior and auction outcomes, which align with both economic and social learning theories. Through InfoBid, we hope to foster the use of LLMs as proxies for human economic and social agents in empirical studies, enhancing our understanding of their capabilities and limitations. This work bridges the gap between theoretical market designs and practical applications, advancing research in market simulations, information design, and agent-based reasoning while offering a valuable tool for exploring the dynamics of digital economies.
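The simulations above use second-price auctions, in which the highest bidder wins but pays the second-highest bid. A minimal resolution sketch (the LLM agent and signaling logic from the paper are omitted; function and bidder names are illustrative):

```python
def second_price(bids: dict) -> tuple:
    """Return (winner, price) for a sealed-bid second-price auction.
    bids maps bidder id -> bid amount; requires at least two bidders."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    price = ranked[1][1]  # winner pays the runner-up's bid
    return winner, price

winner, price = second_price({"a": 3.0, "b": 5.0, "c": 4.0})
# "b" wins and pays 4.0, the second-highest bid
```

This payment rule is what makes truthful bidding a dominant strategy in the classical setting, and deviations from it under different information schemas are exactly the kind of behavior such simulations probe.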

Authors:Joshua Krook
Title: When Autonomy Breaks: The Hidden Existential Risk of AI
Abstract:
AI risks are typically framed around physical threats to humanity, a loss of control or an accidental error causing humanity's extinction. However, I argue, in line with the gradual disempowerment thesis, that there is an underappreciated risk in the slow and irrevocable decline of human autonomy. As AI starts to outcompete humans in various areas of life, a tipping point will be reached where it no longer makes sense to rely on human decision-making, creativity, social care or even leadership. What may follow is a process of gradual de-skilling, where we lose skills that we currently take for granted. Traditionally, it is argued that AI will gain human skills over time, and that these skills are innate and immutable in humans. By contrast, I argue that humans may lose such skills as critical thinking, decision-making and even social care in an AGI world. The biggest threat to humanity is therefore not that machines will become more like humans, but that humans will become more like machines.

Authors:Masanori Shimono
Title: In vitro 2 In vivo : Bidirectional and High-Precision Generation of In Vitro and In Vivo Neuronal Spike Data
Abstract:
Neurons encode information in a binary manner and process complex signals. However, predicting or generating diverse neural activity patterns remains challenging. In vitro and in vivo studies provide distinct advantages, yet no robust computational framework seamlessly integrates both data types. We address this by applying the Transformer model, widely used in large-scale language models, to neural data. To handle binary data, we introduced Dice loss, enabling accurate cross-domain neural activity generation. Structural analysis revealed how Dice loss enhances learning and identified key brain regions facilitating high-precision data generation. Our findings support the 3Rs principle in animal research, particularly Replacement, and establish a mathematical framework bridging animal experiments and human clinical studies. This work advances data-driven neuroscience and neural activity modeling, paving the way for more ethical and effective experimental methodologies.
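Dice loss, introduced here to handle binary spike data, scores the overlap between predicted and target binary patterns. A minimal NumPy sketch (the paper's Transformer integration is not shown; the smoothing constant is an assumption):

```python
import numpy as np

def dice_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """1 minus the Dice coefficient for (soft) binary arrays:
    Dice = 2 * |P intersect T| / (|P| + |T|), with eps for stability."""
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

# Identical spike patterns -> loss near 0; disjoint patterns -> loss near 1.
a = np.array([1.0, 0.0, 1.0, 1.0])
b = np.array([1.0, 0.0, 1.0, 1.0])
c = np.array([0.0, 1.0, 0.0, 0.0])
```

Because the loss is driven by overlap rather than per-element error, it remains informative even when spikes are sparse, which is the usual regime for neural activity data.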

Authors:Yue Yin
Title: Dynamic Learning and Productivity for Data Analysts: A Bayesian Hidden Markov Model Perspective
Abstract:
Data analysts are essential in organizations, transforming raw data into insights that drive decision-making and strategy. This study explores how analysts' productivity evolves on a collaborative platform, focusing on two key learning activities: writing queries and viewing peer queries. While traditional research often assumes static models, where performance improves steadily with cumulative learning, such models fail to capture the dynamic nature of real-world learning. To address this, we propose a Hidden Markov Model (HMM) that tracks how analysts transition between distinct learning states based on their participation in these activities. Using an industry dataset with 2,001 analysts and 79,797 queries, this study identifies three learning states: novice, intermediate, and advanced. Productivity increases as analysts advance to higher states, reflecting the cumulative benefits of learning. Writing queries benefits analysts across all states, with the largest gains observed for novices. Viewing peer queries supports novices but may hinder analysts in higher states due to cognitive overload or inefficiencies. Transitions between states are also uneven, with progression from intermediate to advanced being particularly challenging. This study advances understanding of the dynamic learning behavior of knowledge workers and offers practical implications for designing systems, optimizing training, enabling personalized learning, and fostering effective knowledge sharing.
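The HMM tracks transitions among novice, intermediate, and advanced states; the likelihood of an observed productivity sequence under such a model is computed with the standard forward algorithm, sketched below with illustrative parameters (not the paper's estimated ones):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Forward algorithm: P(observation sequence) under an HMM.
    pi: initial state distribution; A: state transition matrix;
    B: emission matrix (state x symbol); obs: list of symbol indices."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Three learning states, two observed productivity levels (0=low, 1=high).
pi = np.array([1.0, 0.0, 0.0])              # analysts start as novices
A = np.array([[0.7, 0.3, 0.0],              # novice -> intermediate
              [0.0, 0.8, 0.2],              # intermediate -> advanced
              [0.0, 0.0, 1.0]])             # advanced is absorbing here
B = np.array([[0.9, 0.1],                   # novices mostly low output
              [0.5, 0.5],
              [0.2, 0.8]])                  # advanced mostly high output
p = forward_likelihood(pi, A, B, [0, 0, 1])  # observed: low, low, high
```

Fitting such a model to real query logs (as the paper does, in a Bayesian setting) amounts to estimating pi, A, and B from the observed productivity sequences rather than fixing them by hand.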

Authors:Murong Yue
Title: A Survey of Large Language Model Agents for Question Answering
Abstract:
This paper surveys the development of large language model (LLM)-based agents for question answering (QA). Traditional agents face significant limitations, including substantial data requirements and difficulty in generalizing to new environments. LLM-based agents address these challenges by leveraging LLMs as their core reasoning engine. These agents achieve superior QA results compared to traditional QA pipelines and naive LLM QA systems by enabling interaction with external environments. We systematically review the design of LLM agents in the context of QA tasks, organizing our discussion across key stages: planning, question understanding, information retrieval, and answer generation. Additionally, this paper identifies ongoing challenges and explores future research directions to enhance the performance of LLM agent QA systems.

Authors:Eman Alashwali
Title: Two Types of Data Privacy Controls
Abstract:
Users share a vast amount of data while using web and mobile applications. Most service providers, such as email and social media providers, provide users with privacy controls, which aim to give users the means to control what, how, when, and with whom they share data. Nevertheless, it is not uncommon to hear users say that they feel they have lost control over their data on the web. This article aims to shed light on the often overlooked difference between two main types of privacy from a control perspective: privacy between a user and other users, and privacy between a user and institutions. We argue why this difference is important and what we need to do from here.

Authors:Joshua Krook
Title: Manipulation and the AI Act: Large Language Model Chatbots and the Danger of Mirrors
Abstract:
Large Language Model chatbots are increasingly taking the form and visage of human beings, adapting human faces, names, voices, personalities, and quirks, including those of celebrities and well-known political figures. Personifying AI chatbots could foreseeably increase their trust with users. However, it could also make them more capable of manipulation, by creating the illusion of a close and intimate relationship with an artificial entity. The European Commission has finalized the AI Act, with the EU Parliament making amendments banning manipulative and deceptive AI systems that cause significant harm to users. Although the AI Act covers harms that accumulate over time, it is unlikely to prevent harms associated with prolonged discussions with AI chatbots. Specifically, a chatbot could reinforce a person's negative emotional state over weeks, months, or years through negative feedback loops, prolonged conversations, or harmful recommendations, contributing to a user's deteriorating mental health.

Authors:Jin Kim
Title: How to Capture and Study Conversations Between Research Participants and ChatGPT: GPT for Researchers (g4r.org)
Abstract:
As large language models (LLMs) like ChatGPT become increasingly integrated into our everyday lives--from customer service and education to creative work and personal productivity--understanding how people interact with these AI systems has become a pressing issue. Despite the widespread use of LLMs, researchers lack standardized tools for systematically studying people's interactions with LLMs. To address this issue, we introduce GPT for Researchers (G4R), or g4r.org, a free website that researchers can use to easily create and integrate a GPT Interface into their studies. At g4r.org, researchers can (1) enable their study participants to interact with GPT (such as ChatGPT), (2) customize GPT Interfaces to guide participants' interactions with GPT (e.g., set constraints on topics or adjust GPT's tone or response style), and (3) capture participants' interactions with GPT by downloading data on messages exchanged between participants and GPT. By facilitating study participants' interactions with GPT and providing detailed data on these interactions, G4R can support research on topics such as consumer interactions with AI agents or LLMs, AI-assisted decision-making, and linguistic patterns in human-AI communication. With this goal in mind, we provide a step-by-step guide to using G4R at g4r.org.

Authors:Torsten Tiltack
Title: AIJIM: A Scalable Model for Real-Time AI in Environmental Journalism
Abstract:
This paper introduces AIJIM, the Artificial Intelligence Journalism Integration Model -- a novel framework for integrating real-time AI into environmental journalism. AIJIM combines Vision Transformer-based hazard detection, crowdsourced validation with 252 validators, and automated reporting within a scalable, modular architecture. A dual-layer explainability approach ensures ethical transparency through fast CAM-based visual overlays and optional LIME-based box-level interpretations. Validated in a 2024 pilot on the island of Mallorca using the NamicGreen platform, AIJIM achieved 85.4% detection accuracy and 89.7% agreement with expert annotations, while reducing reporting latency by 40%. Unlike conventional approaches such as Data-Driven Journalism or AI Fact-Checking, AIJIM provides a transferable model for participatory, community-driven environmental reporting, advancing journalism, artificial intelligence, and sustainability in alignment with the UN Sustainable Development Goals and the EU AI Act.

Authors:Roberto Balestri
Title: Gender and content bias in Large Language Models: a case study on Google Gemini 2.0 Flash Experimental
Abstract:
This study evaluates the biases in Gemini 2.0 Flash Experimental, a state-of-the-art large language model (LLM) developed by Google, focusing on content moderation and gender disparities. By comparing its performance to ChatGPT-4o, examined in a previous work of the author, the analysis highlights some differences in ethical moderation practices. Gemini 2.0 demonstrates reduced gender bias, notably with female-specific prompts achieving a substantial rise in acceptance rates compared to results obtained by ChatGPT-4o. It adopts a more permissive stance toward sexual content and maintains relatively high acceptance rates for violent prompts, including gender-specific cases. Despite these changes, whether they constitute an improvement is debatable. While gender bias has been reduced, this reduction comes at the cost of permitting more violent content toward both males and females, potentially normalizing violence rather than mitigating harm. Male-specific prompts still generally receive higher acceptance rates than female-specific ones. These findings underscore the complexities of aligning AI systems with ethical standards, highlighting progress in reducing certain biases while raising concerns about the broader implications of the model's permissiveness. Ongoing refinements are essential to achieve moderation practices that ensure transparency, fairness, and inclusivity without amplifying harmful content.

Authors:Lars Malmqvist
Title: Enhancing Post-Merger Integration Planning through AI-Assisted Dependency Analysis and Path Generation
Abstract:
Post-merger integration (PMI) planning presents significant challenges due to the complex interdependencies between integration initiatives and their associated synergies. While dependency-based planning approaches offer valuable frameworks, practitioners often become anchored to specific integration paths without systematically exploring alternative solutions. This research introduces a novel AI-assisted tool designed to expand and enhance the exploration of viable integration planning options. The proposed system leverages a frontier model-based agent augmented with specialized reasoning techniques to map and analyze dependencies between integration plan elements. Through a chain-of-thought planning approach, the tool guides users in systematically exploring the integration planning space, helping identify and evaluate alternative paths that might otherwise remain unconsidered. In an initial evaluation using a simulated case study, participants using the tool identified 43% more viable integration planning options compared to the control group. While the quality of generated options showed improvement, the effect size was modest. These preliminary results suggest promising potential for AI-assisted tools in enhancing the systematic exploration of PMI planning alternatives. This early-stage research contributes to both the theoretical understanding of AI-assisted planning in complex organizational contexts and the practical development of tools to support PMI planning. Future work will focus on refining the underlying models and expanding the evaluation scope to real-world integration scenarios.

Authors:Iyad Sultan
Title: Open-Source Tool for Evaluating Human-Generated vs. AI-Generated Medical Notes Using the PDQI-9 Framework
Abstract:
Background: The increasing use of artificial intelligence (AI) in healthcare documentation necessitates robust methods for evaluating the quality of AI-generated medical notes compared to those written by humans. This paper introduces an open-source tool, the Human Notes Evaluator, designed to assess clinical note quality and differentiate between human and AI authorship. Methods: The Human Notes Evaluator is a Flask-based web application implemented on Hugging Face Spaces. It employs the Physician Documentation Quality Instrument (PDQI-9), a validated 9-item rubric, to evaluate notes across dimensions such as accuracy, thoroughness, clarity, and more. The tool allows users to upload clinical notes in CSV format and systematically score each note against the PDQI-9 criteria, as well as assess the perceived origin (human, AI, or undetermined). Results: The Human Notes Evaluator provides a user-friendly interface for standardized note assessment. It outputs comprehensive results, including individual PDQI-9 scores for each criterion, origin assessments, and overall quality metrics. Exportable data facilitates comparative analyses between human and AI-generated notes, identification of quality trends, and areas for documentation improvement. The tool is available online at https://huggingface.co/spaces/iyadsultan/human_evaluator . Discussion: This open-source tool offers a valuable resource for researchers, healthcare professionals, and AI developers to rigorously evaluate and compare the quality of medical notes. By leveraging the PDQI-9 framework, it provides a structured and reliable approach to assess clinical documentation, contributing to the responsible integration of AI in healthcare. The tool's availability on Hugging Face promotes accessibility and collaborative development in the field of AI-driven medical documentation.

Authors:Andy Buschmann
Title: AIDetection: A Generative AI Detection Tool for Educators Using Syntactic Matching of Common ASCII Characters As Potential 'AI Traces' Within Users' Internet Browser
Abstract:
This paper introduces a simple JavaScript-based web application designed to assist educators in detecting AI-generated content in student essays and written assignments. Unlike existing AI detection tools that rely on obfuscated machine learning models, AIDetection.info employs a heuristic-based approach to identify common syntactic traces left by generative AI models, such as ChatGPT, Claude, Grok, DeepSeek, Gemini, Llama/Meta, Microsoft Copilot, Grammarly AI, and other text-generating models and wrapper applications. The tool scans documents in bulk for potential AI artifacts, as well as AI citations and acknowledgments, and provides a visual summary with downloadable Excel and CSV reports. This article details its methodology, functionalities, limitations, and applications within educational settings.
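The heuristic approach can be illustrated by a simple scanner that counts characters and phrases sometimes left behind by generator or wrapper tooling. The patterns below are illustrative assumptions, not the actual rule set of AIDetection.info:

```python
import re

# Illustrative candidate "AI trace" patterns; the real tool's rules differ.
TRACE_PATTERNS = {
    "smart_quote": re.compile(r"[\u201c\u201d\u2018\u2019]"),
    "em_dash": re.compile(r"\u2014"),
    "ai_disclosure": re.compile(r"as an ai (language )?model", re.IGNORECASE),
}

def scan_text(text: str) -> dict:
    """Count occurrences of each candidate trace pattern in a document."""
    return {name: len(pat.findall(text)) for name, pat in TRACE_PATTERNS.items()}

report = scan_text("As an AI language model, I can\u2019t verify this \u2014 sorry.")
```

Because this is pure syntactic matching, it runs entirely client-side in a browser, which is the design property the tool relies on; the tradeoff is that such heuristics are easy to evade and produce false positives on ordinary word-processor output.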

Authors:Birger Moell
Title: Artificial Humans
Abstract:
This study investigates the development and assessment of an artificial human designed as a conversational AI chatbot, focusing on its role as a clinical psychologist. The project involved creating a specialized chatbot using the Character.ai platform. The chatbot was designed to engage users in psychological discussions, providing advice and support with a human-like touch. The study involved participants (N=27) from diverse backgrounds, including psychologists, AI researchers, and the general public, who interacted with the chatbot and provided feedback on its human-likeness, empathy, and engagement levels. Results indicate that while many users found the chatbot engaging and somewhat human-like, limitations were noted in areas such as empathy and nuanced understanding. The findings suggest that although conversational AI has made strides, it remains far from achieving the true human-like interaction necessary for Artificial General Intelligence (AGI). The study highlights the challenges and potential of AI in human-computer interactions, suggesting directions for future research and development to bridge the gap between current capabilities and AGI. The project was completed in November of 2022, before the release of ChatGPT.

Authors:Priyanshu Chaubey
Title: Virtual Reality in Social Media: A New Era of Immersive Social Interactions
Abstract:
Human communication has been profoundly changed by social media, which allows users to engage in previously unheard-of ways, such as text-based conversations, video chats, and live streaming. In recent years, the introduction of Virtual Reality (VR) to these platforms has begun to reshape the digital landscape. Instead of conventional 2D screens, VR offers a fully immersive experience that lets users interact with content and one another in 3D spaces. This study examines the integration of VR technology into social media applications, evaluating its potential to provide more dynamic and captivating digital spaces. Social media sites like Facebook, Instagram, and Twitter have already changed the nature of communication globally. Immersive technologies like VR represent the next stage: beyond improving the user experience, they have the ability to change how we interact, connect, and share in social settings.

Authors:Sirinda Palahan
Title: PythonPal: Enhancing Online Programming Education through Chatbot-Driven Personalized Feedback
Abstract:
The rise of online programming education has necessitated more effective, personalized interactions, a gap that PythonPal aims to fill through its innovative learning system integrated with a chatbot. This research delves into PythonPal's potential to enhance the online learning experience, especially in contexts with high student-to-teacher ratios where there is a need for personalized feedback. PythonPal's design, featuring modules for conversation, tutorials, and exercises, was evaluated through student interactions and feedback. Key findings reveal PythonPal's proficiency in syntax error recognition and user query comprehension, with its intent classification model showing high accuracy. The system's performance in error feedback, though varied, demonstrates both strengths and areas for enhancement. Student feedback indicated satisfactory query understanding and feedback accuracy but also pointed out the need for faster responses and improved interaction quality. PythonPal's deployment promises to significantly enhance online programming education by providing immediate, personalized feedback and interactive learning experiences, fostering a deeper understanding of programming concepts among students. These benefits mark a step forward in addressing the challenges of distance learning, making programming education more accessible and effective.

Authors:S M Taslim Uddin Raju
Title: Enhancing Human-Robot Interaction in Healthcare: A Study on Nonverbal Communication Cues and Trust Dynamics with NAO Robot Caregivers
Abstract:
As the population of older adults increases, so will the need for both human and robot care providers. While traditional practices involve hiring human caregivers to serve meals and attend to basic needs, older adults often require continuous companionship and health monitoring. However, hiring human caregivers for this role is costly, whereas using a robot like Nao could be cheaper and still helpful. This study explores the integration of humanoid robots, particularly Nao, in health monitoring and caregiving for older adults. Using a mixed-methods approach with a within-subject factorial design, we investigated the effectiveness of nonverbal communication modalities, including touch, gestures, and LED patterns, in enhancing human-robot interactions. Our results indicate that Nao's touch-based health monitoring was well-received by participants, with positive ratings across various dimensions. LED patterns were perceived as more effective and accurate compared to hand and head gestures. Moreover, longer interactions were associated with higher trust levels and perceived empathy, highlighting the importance of prolonged engagement in fostering trust in human-robot interactions. Despite limitations, our study contributes valuable insights into the potential of humanoid robots to improve health monitoring and caregiving for older adults.

Authors:Katie Seaborn
Title: ChatGPT and U(X): A Rapid Review on Measuring the User Experience
Abstract:
ChatGPT, powered by a large language model (LLM), has revolutionized everyday human-computer interaction (HCI) since its 2022 release. While now used by millions around the world, a coherent pathway for evaluating the user experience (UX) ChatGPT offers remains missing. In this rapid review (N = 58), I explored how ChatGPT UX has been approached quantitatively so far. I focused on the independent variables (IVs) manipulated, the dependent variables (DVs) measured, and the methods used for measurement. Findings reveal trends, gaps, and emerging consensus in UX assessments. This work offers a first step towards synthesizing existing approaches to measuring ChatGPT UX, urgent trajectories to advance standardization and breadth, and two preliminary frameworks aimed at guiding future research and tool development. I seek to elevate the field of ChatGPT UX by empowering researchers and practitioners in optimizing user interactions with ChatGPT and similar LLM-based systems.

Authors:Sean Koon
Title: A Beautiful Mind: Principles and Strategies for AI-Augmented Human Reasoning
Abstract:
Amidst the race to create more intelligent machines there is a risk that we will rely on AI in ways that reduce our own agency as humans. To reduce this risk, we could aim to create tools that prioritize and enhance the human role in human-AI interactions. This paper outlines a human-centered augmented reasoning paradigm by 1. Articulating fundamental principles for augmented reasoning tools, emphasizing their ergonomic, pre-conclusive, directable, exploratory, enhancing, and integrated nature; 2. Proposing a 'many tasks, many tools' approach to ensuring human influence and control, and 3. Offering examples of interaction modes that can serve as bridges between human reasoning and AI algorithms.

Authors:Mandar Kulkarni
Title: Agent-S: LLM Agentic workflow to automate Standard Operating Procedures
Abstract:
AI agents using Large Language Models (LLMs) as foundations have shown promise in solving complex real-world tasks. In this paper, we propose an LLM-based agentic workflow for automating Standard Operating Procedures (SOP). For customer care operations, an SOP defines a logical step-by-step process for human agents to resolve customer issues. We observe that any step in the SOP can be categorized as user interaction or API call, while the logical flow in the SOP defines the navigation. We use LLMs augmented with memory and environments (API tools, user interface, external knowledge source) for SOP automation. Our agentic architecture consists of three task-specific LLMs, a Global Action Repository (GAR), execution memory, and multiple environments. SOP workflow is written as a simple logical block of text. Based on the current execution memory and the SOP, the agent chooses the action to execute; it interacts with an appropriate environment (user/API) to collect observations and feedback, which are, in turn, inputted to memory to decide the next action. The agent is designed to be fault-tolerant, where it dynamically decides to repeat an action or seek input from an external knowledge source. We demonstrate the efficacy of the proposed agent on the three SOPs from the e-commerce seller domain. The experimental results validate the agent's performance under complex real-world scenarios.
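The execution loop described above (choose an action from the SOP and the execution memory, act in an environment, feed observations back into memory) can be sketched as follows. This is an illustrative toy, not the paper's implementation: the SOP text, action names, and the rule-based `choose_action` stand in for the task-specific LLMs, Global Action Repository, and real user/API environments.

```python
# Toy sketch of the SOP-driven agent loop. A rule-based chooser stands in
# for the task-specific LLMs; real environments would be user chat and APIs.
SOP = """1. ask_user order_id
2. api_lookup order_status
3. tell_user resolution"""

def choose_action(sop, memory):
    """Pick the first SOP step whose action has not yet been executed.
    (In Agent-S this decision is made by an LLM over the SOP and memory.)"""
    done = {entry["action"] for entry in memory}
    for line in sop.strip().splitlines():
        _, action, arg = line.split()
        if action not in done:
            return action, arg
    return None, None

def environment(action, arg):
    """Stand-in environments: user interaction vs. API call."""
    if action == "ask_user":
        return f"user provided {arg}=A123"
    if action == "api_lookup":
        return f"{arg}=shipped"
    return f"told user: {arg} complete"

memory = []  # execution memory: one record per executed action
while True:
    action, arg = choose_action(SOP, memory)
    if action is None:  # every SOP step has been executed
        break
    observation = environment(action, arg)
    memory.append({"action": action, "observation": observation})

for entry in memory:
    print(entry["action"], "->", entry["observation"])
```

Fault tolerance in the real system would live in the loop body: if an observation signals failure, the agent repeats the action or consults an external knowledge source instead of advancing.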

Authors:Oleksandr Korostin
Title: Analysis of AI Effectiveness in Reducing Human Errors in Processing Transportation Requests
Abstract:
This article examines the characteristics of human errors in processing transportation requests. The role of artificial intelligence (AI) in maritime transportation is explored. The main methods and technologies used for automating and optimizing the handling of transportation requests are analyzed, along with their impact on reducing the number of errors. Examples of successful AI implementation in large companies are provided, confirming the positive influence of these technologies on overall operational efficiency and customer service levels.

Authors:Kenta Kitamura
Title: Assessing Human Intelligence Augmentation Strategies Using Brain Machine Interfaces and Brain Organoids in the Era of AI Advancement
Abstract:
The rapid advancement of Artificial Intelligence (AI) technologies, including the potential emergence of Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI), has raised concerns about AI surpassing human cognitive capabilities. To address this challenge, intelligence augmentation approaches, such as Brain Machine Interfaces (BMI) and Brain Organoid (BO) integration, have been proposed. In this study, we compare three intelligence augmentation strategies, namely BMI, BO, and a hybrid approach combining both. These strategies are evaluated from three key perspectives that influence user decisions in selecting an augmentation method: information processing capacity, identity risk, and consent authenticity risk. First, we model these strategies and assess them across the three perspectives. The results reveal that while BO poses identity risks and BMI has limitations in consent authenticity capacity, the hybrid approach mitigates these weaknesses by striking a balance between the two. Second, we investigate how users might choose among these intelligence augmentation strategies in the context of evolving AI capabilities over time. As a result, we find that BMI augmentation alone is insufficient to compete with advanced AI, and while BO augmentation offers scalability, BO increases identity risks as the scale grows. Moreover, the hybrid approach provides a balanced solution by adapting to AI advancements. This study provides a novel framework for human capability augmentation in the era of advancing AI and serves as a guideline for adapting to AI development.

Authors:Jimi Togni
Title: Development of an Inclusive Educational Platform Using Open Technologies and Machine Learning: A Case Study on Accessibility Enhancement
Abstract:
This study addresses the pressing challenge of educational inclusion for students with special needs by proposing and developing an inclusive educational platform. Integrating machine learning, natural language processing, and cross-platform interfaces, the platform features key functionalities such as speech recognition functionality to support voice commands and text generation via voice input; real-time object recognition using the YOLOv5 model, adapted for educational environments; Grapheme-to-Phoneme (G2P) conversion for Text-to-Speech systems using seq2seq models with attention, ensuring natural and fluent voice synthesis; and the development of a cross-platform mobile application in Flutter with on-device inference execution using TensorFlow Lite. The results demonstrated high accuracy, usability, and positive impact in educational scenarios, validating the proposal as an effective tool for educational inclusion. This project underscores the importance of open and accessible technologies in promoting inclusive and quality education.

Authors:Roman Laas
Title: Entwicklung einer Webanwendung zur Generierung von skolemisierten RDF Daten für die Verwaltung von Lieferketten
Abstract:
Für eine frühzeitige Erkennung von Lieferengpässen müssen Lieferketten in einer geeigneten digitalen Form vorliegen, damit sie verarbeitet werden können. Der für die Datenmodellierung benötigte Arbeitsaufwand ist jedoch, gerade IT-fremden Personen, nicht zuzumuten. Es wurde deshalb im Rahmen dieser Arbeit eine Webanwendung entwickelt, welche die zugrunde liegende Komplexität für den Benutzer verschleiern soll. Konkret handelt es sich dabei um eine grafische Benutzeroberfläche, auf welcher Templates instanziiert und miteinander verknüpft werden können. Für die Definition dieser Templates wurden in dieser Arbeit geeignete Konzepte erarbeitet und erweitert. Zur Erhebung der Benutzerfreundlichkeit der Webanwendung wurde abschließend eine Nutzerstudie mit mehreren Testpersonen durchgeführt. Diese legte eine Vielzahl von nützlichen Verbesserungsvorschlägen offen. -- For early detection of supply bottlenecks, supply chains must be available in a suitable digital form so that they can be processed. However, the amount of work required for data modeling cannot be expected of people who are not familiar with IT topics. Therefore, a web application was developed in the context of this thesis, which is supposed to disguise the underlying complexity for the user. Specifically, this is a graphical user interface on which templates can be instantiated and linked to each other. Suitable concepts for the definition of these templates were developed and extended in this thesis. Finally, a user study with several test persons was conducted to determine the usability of the web application. This revealed a large number of useful suggestions for improvement.

Authors:Andrew M. Lydner
Title: Human-AI Collaboration for Wearable Technology Component Standardization
Abstract:
Due to the multidisciplinary nature of wearable technology, the industry faces potential limitations in innovation. The wearable technology industry is still in its infancy, and broader practical adoption has stagnated despite the plethora of technologies, which have been largely wrist-worn. This could be a result of the lack of multidisciplinary expert knowledge disseminating through the industry. Unlike other technologies, which have standardizations and processes for how they are developed, wearable technologies exist in a realm of perpetual change, given the various materials and subcomponents that continue to be developed. It is essential that expert opinions form a collaborative foundation, and even more so that intelligent systems foster that collaboration. The caveat, though, is the likelihood that these artificial intelligence (AI) collaboration tools will actually be utilized by industry experts. Developing mental models for AI tool usage could be applied to wearable technology innovation in this regard; this is the goal of this paper and the focus of the research.

Authors:Jingrui An
Title: Authenticity as Aesthetics: Enabling the Client to Dominate Decision-making in Co-design
Abstract:
This paper revises aesthetics theory through the lens of authenticity and investigates practical applications using a co-design approach. We encourage designers to include ordinary clients as co-creators in the co-design process, guiding them in expressing their aesthetics, values, and preferences while stimulating their creativity. This paper proposes a bespoke design process framework for authenticity aesthetics that incorporates empathy, defining, ideating, prototyping, and testing. This framework delineates the roles and responsibilities of clients and designers at different phases and highlights evolving material mediums that enable their communication. The paper concludes by reflecting on consumerist aesthetics, advocating for designers to focus on the insights of ordinary clients, design for their authentic uniqueness, and recognize the broad prospects of bespoke design methods.

Authors:Shahmar Mirishli
Title: The Role of Legal Frameworks in Shaping Ethical Artificial Intelligence Use in Corporate Governance
Abstract:
This article examines the evolving role of legal frameworks in shaping ethical artificial intelligence (AI) use in corporate governance. As AI systems become increasingly prevalent in business operations and decision-making, there is a growing need for robust governance structures to ensure their responsible development and deployment. Through analysis of recent legislative initiatives, industry standards, and scholarly perspectives, this paper explores key legal and regulatory approaches aimed at promoting transparency, accountability, and fairness in corporate AI applications. It evaluates the strengths and limitations of current frameworks, identifies emerging best practices, and offers recommendations for developing more comprehensive and effective AI governance regimes. The findings highlight the importance of adaptable, principle-based regulations coupled with sector-specific guidance to address the unique challenges posed by AI technologies in the corporate sphere.

Authors:Shahmar Mirishli
Title: Ethical Implications of AI in Data Collection: Balancing Innovation with Privacy
Abstract:
This article examines the ethical and legal implications of artificial intelligence (AI) driven data collection, focusing on developments from 2023 to 2024. It analyzes recent advancements in AI technologies and their impact on data collection practices across various sectors. The study compares regulatory approaches in the European Union, the United States, and China, highlighting the challenges in creating a globally harmonized framework for AI governance. Key ethical issues, including informed consent, algorithmic bias, and privacy protection, are critically assessed in the context of increasingly sophisticated AI systems. The research explores case studies in healthcare, finance, and smart cities to illustrate the practical challenges of AI implementation. It evaluates the effectiveness of current legal frameworks and proposes solutions encompassing legal and policy recommendations, technical safeguards, and ethical frameworks. The article emphasizes the need for adaptive governance and international cooperation to address the global nature of AI development while balancing innovation with the protection of individual rights and societal values.

Authors:Yi Wang
Title: EmotionCarrier: A Multimodality 'Mindfulness-Training' Tool for Positive Emotional Value
Abstract:
This study introduced a Multimodal Mindfulness-Training System. Our installation, 'EmotionCarrier', correlates traditional calligraphy interactions with real-time physiological data from an Apple Watch. We aim to enhance mindfulness training effectiveness, aiding in achieving physiological calmness through calligraphy practice. Our experiments with varied participant groups focused on data diversity, usability, and stability. We adopted methods like using EmotionCarrier for Heart Sutra transcription and adjusting installation placement for optimal user experience. Our primary finding was a correlation between calligraphy performance data and emotional responses during the transcription of the Heart Sutra.

Authors:Jonas Oppenlaender
Title: DangerMaps: Personalized Safety Advice for Travel in Urban Environments using a Retrieval-Augmented Language Model
Abstract:
Planning a trip into a potentially unsafe area is a difficult task. We conducted a formative study on travelers' information needs, finding that most of them turn to search engines for trip planning. Search engines, however, fail to provide easily interpretable results adapted to the context and personal information needs of a traveler. Large language models (LLMs) create new possibilities for providing personalized travel safety advice. To explore this idea, we developed DangerMaps, a mapping system that assists its users in researching the safety of an urban travel destination, whether it is pre-travel or on-location. DangerMaps plots safety ratings onto a map and provides explanations on demand. This late breaking work specifically emphasizes the challenges of designing real-world applications with large language models. We provide a detailed description of our approach to prompt design and highlight future areas of research.

Authors:Hussein Naeem Hasan
Title: A Wearable Rehabilitation System to Assist Partially Hand Paralyzed Patients in Repetitive Exercises
Abstract:
The main purpose of the paper is the development, implementation, and testing of a low-cost portable system to assist partially paralyzed patients in their hand rehabilitation after strokes or injuries. Rehabilitation involves time-consuming and repetitive exercises, which are costly and demotivating, and conventionally requires clinic attendance and the direct supervision of physiotherapists. In this work, the system consists of a graphical user interface (GUI) on a smartphone screen to instruct and motivate the patients to do their exercises by themselves. Through the GUI, the patients are instructed to perform a sequence of exercises step by step, and the system measures the electrical activity (electromyographic, EMG, signals) of the user's forearm muscles with a Myo armband. Based on a database, the system can tell whether the patients have performed the correct movements or not. If a correct movement is detected, the system informs the user through the GUI and moves to the next exercise. For preliminary results, the system was extensively tested on a healthy person.

Authors:Zhang di
Title: Effect factors of motion aftereffect in depth: Adaptation direction and induced eyes vergence
Abstract:
Motion aftereffect (MAE) offers valuable insights into the mechanisms underlying motion-in-depth (MID) perception. This study investigates two critical aspects of MAE in depth: (1) the potential directional asymmetry between motion toward versus away from the observer, and (2) the effect of induced eye vergence on MAE magnitude. We conducted two experiments using random dot stereograms (RDS) to isolate the interocular velocity difference (IOVD) mechanism. In Experiment 1, we compared MAE magnitude following adaptation to motion-toward versus motion-away stimuli with a static fixation point. In Experiment 2, we introduced a fixation point oscillating in depth to induce vergence eye movements during adaptation and testing. Our results revealed a directional asymmetry in MAE strength, with motion-toward adaptation producing stronger aftereffects than motion-away adaptation in Experiment 1. When eye vergence was induced in Experiment 2, this pattern was reversed, with motion-away adaptation yielding stronger MAEs. These findings suggest an important interaction between adaptation direction and eye vergence state in MID perception, highlighting the complex integration of retinal and extra-retinal signals in the visual system's processing of motion through depth.

Authors:Melvin Mokhtari
Title: Human Digital Twins in Personalized Healthcare: An Overview and Future Perspectives
Abstract:
Digital twins (DTs) are redefining healthcare by paving the way for more personalized, proactive, and intelligent medical interventions. As the shift toward personalized care intensifies, there is a growing need for an individual's virtual replica that delivers the right treatment at the optimal time and in the most effective manner. The emerging concept of a Human Digital Twin (HDT) holds the potential to revolutionize the traditional healthcare system much like digital twins have transformed manufacturing and aviation. An HDT mirrors the physical entity of a human body through a dynamic virtual model that continuously reflects changes in molecular, physiological, emotional, and lifestyle factors. This digital representation not only supports remote monitoring, diagnosis, and prescription but also facilitates surgery, rehabilitation, and overall personalized care, thereby relieving pressure on conventional healthcare frameworks. Despite its promising advantages, there are considerable research challenges to overcome as HDT technology evolves. In this study, I will initially delineate the distinctions between traditional digital twins and HDTs, followed by an exploration of the networking architecture integral to their operation--from data acquisition and communication to computation, management, and decision-making--thereby offering insights into how these innovations may reshape the modern healthcare industry.

Authors:Krishna Subedi
Title: The Reliability of LLMs for Medical Diagnosis: An Examination of Consistency, Manipulation, and Contextual Awareness
Abstract:
Universal healthcare access is critically needed, especially in resource-limited settings. Large Language Models (LLMs) offer promise for democratizing healthcare with advanced diagnostics, but their reliability requires thorough evaluation, especially in trust-dependent environments. This study assesses LLMs' diagnostic reliability focusing on consistency, manipulation resilience, and contextual integration, crucial for safe and ethical use in universal healthcare. We evaluated leading LLMs using 52 patient cases, expanded into variants with demographic changes, symptom rewordings, and exam modifications, while keeping core diagnoses constant. Manipulation susceptibility was tested by inserting misleading narratives and irrelevant details. Contextual awareness was evaluated by comparing diagnoses with and without patient history. We analyzed diagnostic change rates and response patterns across manipulations. LLMs showed perfect diagnostic consistency for identical data but significant manipulation susceptibility. Gemini had a 40% diagnosis change rate and ChatGPT 30% with irrelevant details. ChatGPT had a higher context influence rate (77.8% vs. Gemini's 55.6%), but both showed limited nuanced contextual integration, exhibiting anchoring bias by prioritizing salient data over context. LLMs' vulnerability to manipulation and limited contextual awareness pose challenges in clinical use. Unlike clinicians, they may overstate diagnostic certainty without validation. Safeguards and domain-specific designs are crucial for reliable healthcare applications. Broad clinical use without oversight is premature and risky. LLMs can enhance diagnostics with responsible use, but future research is needed to improve manipulation resistance and contextual understanding for safe healthcare democratization.

Authors:Niels J. Gommesen
Title: Entangled responsibility: an analysis of citizen science communication and scientific citizenship
Abstract:
The notion of citizen science is often referred to as the means of engaging public members in scientific research activities that can advance the reach and impact of technoscience. Despite this, few studies have addressed how human-machine collaborations in a citizen science context enable and constrain scientific citizenship and citizens' epistemic agencies and reconfigure science-citizen relations, including the process of citizens' engagement in scientific knowledge production. The following will address this gap by analysing the human and nonhuman material and discursive engagements in the citizen science project The Sound of Denmark. Doing so contributes to new knowledge on designing more responsible forms of citizen science engagement that advance civic agencies. Key findings emphasise that citizen science development can benefit from diverse fields such as participatory design research and feminist technoscience. Finally, the paper contributes to a broader debate on the formation of epistemic subjects, scientific citizenship, and responsible designing and evaluation of citizen science. Keywords: scientific citizenship, citizen science communication, epistemic agency, co-design, material-discursive practices, response-ability.

Authors:Marco Giunti
Title: ChatGPT-4 in the Turing Test: A Critical Analysis
Abstract:
This paper critically examines the recent publication "ChatGPT-4 in the Turing Test" by Restrepo Echavarría (2025), challenging its central claims regarding the absence of minimally serious test implementations and the conclusion that ChatGPT-4 fails the Turing Test. The analysis reveals that the criticisms based on rigid criteria and limited experimental data are not fully justified. More importantly, the paper makes several constructive contributions that enrich our understanding of Turing Test implementations. It demonstrates that two distinct formats--the three-player and two-player tests--are both valid, each with unique methodological implications. The work distinguishes between absolute criteria (reflecting an optimal 50% identification rate in a three-player format) and relative criteria (which measure how closely a machine's performance approximates that of a human), offering a more nuanced evaluation framework. Furthermore, the paper clarifies the probabilistic underpinnings of both test types by modeling them as Bernoulli experiments--correlated in the three-player version and uncorrelated in the two-player version. This formalization allows for a rigorous separation between the theoretical criteria for passing the test, defined in probabilistic terms, and the experimental data that require robust statistical methods for proper interpretation. In doing so, the paper not only refutes key aspects of the criticized study but also lays a solid foundation for future research on objective measures of how closely an AI's behavior aligns with, or deviates from, that of a human being.
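The Bernoulli framing of the two test formats can be made concrete with a small simulation. This is an illustrative sketch, not the paper's analysis: the probability values and trial counts are assumed. In the three-player format one forced choice decides both contestants' outcomes, so the two Bernoulli variables are perfectly anticorrelated; in the two-player format the machine and the human are judged in separate sessions, so the outcomes are independent.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two 0/1 outcome sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

def three_player(n, p, rng):
    """One forced choice per trial: whichever contestant is picked as
    the human makes the other 'fail', so outcomes are anticorrelated."""
    machine = [rng.random() < p for _ in range(n)]
    human = [not m for m in machine]
    return machine, human

def two_player(n, p_machine, p_human, rng):
    """Separate sessions: two independent Bernoulli experiments."""
    machine = [rng.random() < p_machine for _ in range(n)]
    human = [rng.random() < p_human for _ in range(n)]
    return machine, human

rng = random.Random(0)
# Absolute criterion: an indistinguishable machine is picked ~50% of the time.
m3, h3 = three_player(50_000, 0.5, rng)
m2, h2 = two_player(50_000, 0.5, 0.8, rng)  # 0.8 human rate is assumed
print(f"three-player: machine rate {sum(m3)/len(m3):.3f}, corr {pearson(m3, h3):+.2f}")
print(f"two-player:   machine rate {sum(m2)/len(m2):.3f}, corr {pearson(m2, h2):+.2f}")
```

The relative criterion then compares the machine's empirical rate against the human's, which is exactly where robust statistical methods are needed to interpret finite experimental data.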

Authors:S M Sarwar
Title: FedMentalCare: Towards Privacy-Preserving Fine-Tuned LLMs to Analyze Mental Health Status Using Federated Learning Framework
Abstract:
With the increasing prevalence of mental health conditions worldwide, AI-powered chatbots and conversational agents have emerged as accessible tools to support mental health. However, deploying Large Language Models (LLMs) in mental healthcare applications raises significant privacy concerns, especially regarding regulations like HIPAA and GDPR. In this work, we propose FedMentalCare, a privacy-preserving framework that leverages Federated Learning (FL) combined with Low-Rank Adaptation (LoRA) to fine-tune LLMs for mental health analysis. We investigate the performance impact of varying client data volumes and model architectures (e.g., MobileBERT and MiniLM) in FL environments. Our framework demonstrates a scalable, privacy-aware approach for deploying LLMs in real-world mental healthcare scenarios, addressing data security and computational efficiency challenges.
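The FL-plus-LoRA recipe can be sketched numerically: each client trains only small low-rank adapter matrices on its private data, and the server averages those adapters weighted by client data volume, so neither raw text nor full model weights ever leave the device. A minimal NumPy sketch under stated assumptions (random noise stands in for local gradient steps; the real models are transformer encoders such as MobileBERT/MiniLM):

```python
import numpy as np

D_IN, D_OUT, RANK = 64, 64, 4  # frozen layer size and LoRA rank (illustrative)

rng = np.random.default_rng(0)
W_frozen = rng.normal(size=(D_OUT, D_IN))  # pretrained weight, never transmitted

def new_adapter():
    """LoRA adapter: the trainable update is the low-rank product B @ A."""
    A = rng.normal(scale=0.01, size=(RANK, D_IN))
    B = np.zeros((D_OUT, RANK))  # zero-init so training starts at the base model
    return A, B

def client_update(adapter):
    """Stand-in for local fine-tuning on a client's private data; in
    practice this would be gradient steps on local mental-health text."""
    A, B = adapter
    return A + rng.normal(scale=0.01, size=A.shape), \
           B + rng.normal(scale=0.01, size=B.shape)

def fedavg(adapters, weights):
    """Server step: average adapters weighted by client data volume.
    Only RANK * (D_IN + D_OUT) numbers per client cross the network."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    A = sum(wi * a for wi, (a, _) in zip(w, adapters))
    B = sum(wi * b for wi, (_, b) in zip(w, adapters))
    return A, B

global_adapter = new_adapter()
for _ in range(3):  # communication rounds
    local_adapters = [client_update(global_adapter) for _ in range(5)]
    global_adapter = fedavg(local_adapters, weights=[100, 40, 80, 60, 20])

A, B = global_adapter
effective_W = W_frozen + B @ A  # what each client uses at inference time
print("adapter params per client:", A.size + B.size, "vs full layer:", W_frozen.size)
```

The communication saving is the point: here each round ships 512 adapter parameters per client instead of the 4,096 in the frozen layer, and the gap grows with model size.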

Authors:Raj Korpan
Title: Encoding Inequity: Examining Demographic Bias in LLM-Driven Robot Caregiving
Abstract:
As robots take on caregiving roles, ensuring equitable and unbiased interactions with diverse populations is critical. Although Large Language Models (LLMs) serve as key components in shaping robotic behavior, speech, and decision-making, these models may encode and propagate societal biases, leading to disparities in care based on demographic factors. This paper examines how LLM-generated responses shape robot caregiving characteristics and responsibilities when prompted with different demographic information related to sex, gender, sexuality, race, ethnicity, nationality, disability, and age. Findings show simplified descriptions for disability and age, lower sentiment for disability and LGBTQ+ identities, and distinct clustering patterns reinforcing stereotypes in caregiving narratives. These results emphasize the need for ethical and inclusive HRI design.

Authors:Shadeeb Hossain
Title: Using Artificial Intelligence to Improve Classroom Learning Experience
Abstract:
This paper explores advancements in Artificial Intelligence technologies to enhance classroom learning, highlighting contributions from companies like IBM, Microsoft, Google, and ChatGPT, as well as the potential of brain signal analysis. The focus is on improving students' learning experiences by using Machine Learning algorithms to identify a student's preferred learning style and predict academic dropout risk. A Logistic Regression algorithm is applied for binary classification using six predictor variables, such as assessment scores, lesson duration, and preferred learning style, to accurately identify learning preferences. A case study, with 76,519 candidates and 35 predictor variables, assesses academic dropout risk using Logistic Regression, achieving a test accuracy of 87.39%. In comparison, the Stochastic Gradient Descent classifier achieved an accuracy of 83.1% on the same dataset.
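The two-classifier comparison can be sketched with scikit-learn. The real dataset (76,519 candidates, 35 predictors) is not reproduced here, so this sketch generates synthetic data of comparable shape purely to show the side-by-side setup; the accuracies it prints are not the paper's results.

```python
# Synthetic stand-in for the dropout case study: shapes mimic the
# 35-predictor dataset, but the data itself is generated.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=10_000, n_features=35,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

acc = {}
for name, clf in [("LogisticRegression", LogisticRegression(max_iter=1000)),
                  ("SGDClassifier", SGDClassifier(random_state=0))]:
    # Standardizing predictors matters for both solvers, especially SGD.
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_tr, y_tr)
    acc[name] = model.score(X_te, y_te)
    print(f"{name}: test accuracy {acc[name]:.3f}")
```

With mixed-scale predictors such as assessment scores and lesson durations, the `StandardScaler` step is the design choice that keeps the SGD variant competitive with the closed-solver logistic regression.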

Authors:Prashant Mahajan
Title: What is Ethical: AIHED Driving Humans or Human-Driven AIHED? A Conceptual Framework enabling the Ethos of AI-driven Higher education
Abstract:
The rapid integration of Artificial Intelligence (AI) in Higher Education (HE) is transforming personalized learning, administrative automation, and decision-making. However, this progress presents a duality, as AI adoption also introduces ethical and institutional challenges, including algorithmic bias, data privacy risks, and governance inconsistencies. To address these concerns, this study introduces the Human-Driven AI in Higher Education (HD-AIHED) Framework, ensuring compliance with UNESCO and OECD ethical standards. This conceptual research employs a qualitative meta-synthesis approach, integrating qualitative and quantitative studies to identify patterns, contradictions, and gaps in AI adoption within HE. It reinterprets existing datasets through theoretical and ethical lenses to develop governance frameworks. The study applies a participatory integrated co-system, Phased Human Intelligence, SWOC analysis, and AI ethical review boards to assess AI readiness and governance strategies for universities and HE institutions. The HD-AIHED model bridges AI research gaps, addresses global real-time challenges, and provides tailored, scalable, and ethical strategies for diverse educational contexts. By emphasizing interdisciplinary collaboration among stakeholders, this study envisions AIHED as a transparent and equitable force for innovation. The HD-AIHED framework ensures AI acts as a collaborative and ethical enabler rather than a disruptive replacement for human intelligence while advocating for responsible AI implementation in HE.

Authors:Matteo Grella
Title: Preliminary Report: Enhancing Role Differentiation in Conversational HCI Through Chromostereopsis
Abstract:
We propose leveraging chromostereopsis, a perceptual phenomenon inducing depth perception through color contrast, as a novel approach to visually differentiating conversational roles in text-based AI interfaces. This method aims to implicitly communicate role hierarchy and add a subtle sense of physical space.

Authors:David S. Johnson
Title: Higher Stakes, Healthier Trust? An Application-Grounded Approach to Assessing Healthy Trust in High-Stakes Human-AI Collaboration
Abstract:
Human-AI collaboration is increasingly promoted to improve high-stakes decision-making, yet its benefits have not been fully realized. Application-grounded evaluations are needed to better evaluate methods for improving collaboration but often require domain experts, making studies costly and limiting their generalizability. Current evaluation methods are constrained by limited public datasets and reliance on proxy tasks. To address these challenges, we propose an application-grounded framework for large-scale, online evaluations of vision-based decision-making tasks. The framework introduces Blockies, a parametric approach for generating datasets of simulated diagnostic tasks, offering control over the traits and biases in the data used to train real-world models. These tasks are designed to be easy to learn but difficult to master, enabling participation by non-experts. The framework also incorporates storytelling and monetary incentives to manipulate perceived task stakes. An initial empirical study demonstrated that the high-stakes condition significantly reduced healthy distrust of AI, despite longer decision-making times. These findings underscore the importance of perceived stakes in fostering healthy distrust and demonstrate the framework's potential for scalable evaluation of high-stakes Human-AI collaboration.

Authors:Tram Thi Minh Tran
Title: From Everyday Technologies to Augmented Reality: An Autoethnographic Study of Presence and Engagement
Abstract:
Digital technologies are reshaping how people experience their surroundings, often pulling focus toward virtual spaces and making it harder to stay present and engaged. Wearable augmented reality (AR), by embedding digital information into the physical world, may further immerse users in digital layers. Yet paradoxically, it also holds the potential to support presence and engagement. To explore this possibility, this study adopts an autoethnographic approach, providing a first-person perspective on how everyday technologies shape real-world engagement. Over four weeks, 20 experiences were documented, capturing interactions with phones, laptops, and fitness trackers in various contexts. The findings reveal nuanced patterns of technology use and propose design implications for wearable AR, emphasising its potential for personalised, context-aware interventions that support meaningful real-world connection. This work contributes to the discourse on digital well-being, suggesting that wearable AR can evolve beyond digital augmentation to help users reconnect with their surroundings.

Authors:Tram Thi Minh Tran
Title: Doraemon's Gadget Lab: Unpacking Human Needs and Interaction Design in Speculative Technology
Abstract:
Speculative technologies in science fiction have long inspired advancements in Human-Computer Interaction (HCI). Doraemon, a Japanese manga featuring a robotic cat from the 22nd century, presents an extensive collection of futuristic gadgets, an underexplored source of speculative technologies. This study systematically analyses 379 of these gadgets, categorising them into 33 subcategories within 10 high-level groupings, to examine the fundamental human needs they address, their parallels to contemporary technologies, and their potential insights for HCI design. The findings reveal that while human needs remain constant, the ways in which technology fulfils them differ. Doraemon's gadgets emphasise tangible, single-purpose interactions with built-in reversibility, contrasting with the increasing complexity and software-driven nature of modern systems. By examining these speculative technologies, this study highlights alternative interaction paradigms that challenge current HCI trends and offer inspiration for future user-centred innovation.

Authors:Juan Manuel Durán
Title: Beyond transparency: computational reliabilism as an externalist epistemology of algorithms
Abstract:
This chapter is interested in the epistemology of algorithms. As I intend to approach the topic, this is an issue about epistemic justification. Current approaches to justification emphasize the transparency of algorithms, which entails elucidating their internal mechanisms -- such as functions and variables -- and demonstrating how (or that) these produce outputs. Thus, the mode of justification through transparency is contingent on what can be shown about the algorithm and, in this sense, is internal to the algorithm. In contrast, I advocate for an externalist epistemology of algorithms that I term computational reliabilism (CR). While I have previously introduced and examined CR in the field of computer simulations ([42, 53, 4]), this chapter extends this reliabilist epistemology to encompass a broader spectrum of algorithms utilized in various scientific disciplines, with a particular emphasis on machine learning applications. At its core, CR posits that an algorithm's output is justified if it is produced by a reliable algorithm. A reliable algorithm is one that has been specified, coded, used, and maintained utilizing reliability indicators. These reliability indicators stem from formal methods, algorithmic metrics, expert competencies, cultures of research, and other scientific endeavors. The primary aim of this chapter is to delineate the foundations of CR, explicate its operational mechanisms, and outline its potential as an externalist epistemology of algorithms.

Authors:Linzhuo li
Title: Architectural Vulnerability and Reliability Challenges in AI Text Annotation: A Survey-Inspired Framework with Independent Probability Assessment
Abstract:
Large Language Models, despite their power, have a fundamental architectural vulnerability stemming from their causal transformer design -- order sensitivity. This architectural constraint can distort classification outcomes when prompt elements like label options are reordered, revealing a theoretical gap between accuracy metrics and true model reliability. The paper conceptualizes this vulnerability through the lens of survey methodology, where respondent biases parallel LLM positional dependencies. Empirical evidence using the F1000 biomedical dataset across three scales of LLaMA3.1 models (8B, 70B, 405B) demonstrates that these architectural constraints produce inconsistent annotations under controlled perturbations. The paper advances a practical solution for social science -- Independent Probability Assessment -- which decouples label evaluation to circumvent positional bias inherent in sequential processing. This approach yields an information-theoretic reliability measure (R-score) that quantifies annotation robustness at the case level. The findings establish that architectural vulnerabilities in causal transformers require methodological innovations beyond accuracy metrics to ensure valid social science inference, as demonstrated through downstream regression analyses where order-sensitive annotations significantly alter substantive conclusions about scientific impact.
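The abstract does not spell out the R-score's formula, but one common information-theoretic form for a case-level reliability measure is normalized entropy over independently assessed label probabilities. The sketch below is an illustrative instantiation of that idea, not the paper's exact definition:

```python
import numpy as np

def r_score(label_probs):
    """Hypothetical reliability score based on normalized entropy.

    label_probs: probabilities assessed independently for each label,
    renormalized here into a distribution. Returns a value in [0, 1]:
    1 = fully confident (one label gets all mass),
    0 = uniform (maximally unreliable annotation).

    Note: the exact R-score is defined in the paper; this common
    information-theoretic form is used only for illustration.
    """
    p = np.asarray(label_probs, dtype=float)
    n_labels = len(p)
    p = p / p.sum()              # renormalize the independent assessments
    p = p[p > 0]                 # 0 * log(0) = 0 by convention
    entropy = -np.sum(p * np.log(p))
    return 1.0 - entropy / np.log(n_labels)
```

Under this form, an annotation whose probability mass concentrates on one label scores near 1, while a case where the model spreads mass evenly across labels (the situation order sensitivity tends to produce) scores near 0.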

Authors:Krishnaveni Katta
Title: Analyzing User Perceptions of Large Language Models (LLMs) on Reddit: Sentiment and Topic Modeling of ChatGPT and DeepSeek Discussions
Abstract:
While there is an increased discourse on large language models (LLMs) like ChatGPT and DeepSeek, there is no comprehensive understanding of how users of online platforms, like Reddit, perceive these models. This is an important omission because public opinion can influence AI development, trust, and future policy. This study analyzes Reddit discussions about ChatGPT and DeepSeek using sentiment and topic modeling to advance the understanding of user attitudes. Significant topics such as trust in AI, user expectations, potential uses of the tools, reservations about AI biases, and ethical implications of their use are explored in this study. By examining these concerns, the study provides a sense of how public sentiment might shape the direction of AI development going forward. The analysis also examines whether users have faith in the technology and what they see as its future. A word frequency approach is used to identify broad topics and sentiment trends. Also, topic modeling through the Latent Dirichlet Allocation (LDA) method identifies top topics in users' language, for example, potential benefits of LLMs, their technological applications, and their overall social ramifications. The study aims to inform developers and policymakers by making it easier to see how users comprehend and experience these game-changing technologies.

Authors:Jason R. C. Nurse
Title: To Patch or Not to Patch: Motivations, Challenges, and Implications for Cybersecurity
Abstract:
As technology has become more embedded into our society, the security of modern-day systems is paramount. One topic which is constantly under discussion is that of patching, or more specifically, the installation of updates that remediate security vulnerabilities in software or hardware systems. This continued deliberation is motivated by complexities involved with patching; in particular, the various incentives and disincentives for organizations and their cybersecurity teams when deciding whether to patch. In this paper, we take a fresh look at the question of patching and critically explore why organizations and IT/security teams choose to patch or decide against it (either explicitly or due to inaction). We tackle this question by aggregating and synthesizing prominent research and industry literature on the incentives and disincentives for patching, specifically considering the human aspects in the context of these motives. Through this research, this study identifies key motivators such as organizational needs, the IT/security team's relationship with vendors, and legal and regulatory requirements placed on the business and its staff. There are also numerous significant reasons discovered for why the decision is taken not to patch, including limited resources (e.g., person-power), challenges with manual patch management tasks, human error, bad patches, unreliable patch management tools, and the perception that related vulnerabilities would not be exploited. These disincentives, in combination with the motivators above, highlight the difficult balance that organizations and their security teams need to maintain on a daily basis. Finally, we conclude by discussing implications of these findings and important future considerations.

Authors:Gabrielle O'Brien
Title: How Scientists Use Large Language Models to Program
Abstract:
Scientists across disciplines write code for critical activities like data collection and generation, statistical modeling, and visualization. As large language models that can generate code have become widely available, scientists may increasingly use these models during research software development. We investigate the characteristics of scientists who are early-adopters of code generating models and conduct interviews with scientists at a public, research-focused university. Through interviews and reviews of user interaction logs, we see that scientists often use code generating models as an information retrieval tool for navigating unfamiliar programming languages and libraries. We present findings about their verification strategies and discuss potential vulnerabilities that may emerge from code generation practices unknowingly influencing the parameters of scientific analyses.

Authors:Takuya Maeda
Title: Walkthrough of Anthropomorphic Features in AI Assistant Tools
Abstract:
In this paper, we attempt to understand the anthropomorphic features of chatbot outputs and how these features provide a discursive frame for human-AI interactions. To do so, we explore the use of a prompt-based walkthrough method with two phases: (1) interview-style prompting to reveal the chatbots' context of expected use and (2) roleplaying-type prompting to evoke everyday use scenarios and typical chatbot outputs. We applied this method to catalogue anthropomorphic features across four different LLM chatbots, finding that anthropomorphism was exhibited as both subjective language and a sympathetic conversational tone. We also found that socio-emotional cues in prompts increase the incidence of anthropomorphic expressions in outputs. We argue that the prompt-based walkthrough method was successful in stimulating social role performance in LLM chatbots and in eliciting a variety of anthropomorphic features, making it useful in the study of interaction-based algorithmic harms where users project inappropriate social roles onto LLM-based tools.

Authors:Thomas Übellacker
Title: Making Sense of AI Limitations: How Individual Perceptions Shape Organizational Readiness for AI Adoption
Abstract:
This study investigates how individuals' perceptions of artificial intelligence (AI) limitations influence organizational readiness for AI adoption. Through semi-structured interviews with seven AI implementation experts, analyzed using the Gioia methodology, the research reveals that organizational readiness emerges through dynamic interactions between individual sensemaking, social learning, and formal integration processes. The findings demonstrate that hands-on experience with AI limitations leads to more realistic expectations and increased trust, mainly when supported by peer networks and champion systems. Organizations that successfully translate these individual and collective insights into formal governance structures achieve more sustainable AI adoption. The study advances theory by showing how organizational readiness for AI adoption evolves through continuous cycles of individual understanding, social learning, and organizational adaptation. These insights suggest that organizations should approach AI adoption not as a one-time implementation but as an ongoing strategic learning process that balances innovation with practical constraints. The research contributes to organizational readiness theory and practice by illuminating how micro-level perceptions and experiences shape macro-level adoption outcomes.

Authors:Majid Behravan
Title: Generative AI Framework for 3D Object Generation in Augmented Reality
Abstract:
This thesis presents a framework that integrates state-of-the-art generative AI models for real-time creation of three-dimensional (3D) objects in augmented reality (AR) environments. The primary goal is to convert diverse inputs, such as images and speech, into accurate 3D models, enhancing user interaction and immersion. Key components include advanced object detection algorithms, user-friendly interaction techniques, and robust AI models like Shap-E for 3D generation. Leveraging Vision Language Models (VLMs) and Large Language Models (LLMs), the system captures spatial details from images and processes textual information to generate comprehensive 3D objects, seamlessly integrating virtual objects into real-world environments. The framework demonstrates applications across industries such as gaming, education, retail, and interior design. It allows players to create personalized in-game assets, customers to see products in their environments before purchase, and designers to convert real-world objects into 3D models for real-time visualization. A significant contribution is democratizing 3D model creation, making advanced AI tools accessible to a broader audience, fostering creativity and innovation. The framework addresses challenges like handling multilingual inputs, diverse visual data, and complex environments, improving object detection and model generation accuracy, as well as loading 3D models in AR space in real-time. In conclusion, this thesis integrates generative AI and AR for efficient 3D model generation, enhancing accessibility and paving the way for innovative applications and improved user interactions in AR environments.

Authors:Renlong Jie
Title: Learning to Collaborate: A Capability Vectors-based Architecture for Adaptive Human-AI Decision Making
Abstract:
Human-AI collaborative decision making has emerged as a pivotal field in recent years. Existing methods treat human and AI as different entities when designing human-AI systems. However, as the decision capabilities of AI models become closer to human beings, it is necessary to build a uniform framework for capability modeling and integrating. In this study, we propose a general architecture for human-AI collaborative decision making, wherein we employ learnable capability vectors to represent the decision-making capabilities of both human experts and AI models. These capability vectors are utilized to determine the decision weights of multiple decision makers, taking into account the contextual information of each decision task. Our proposed architecture accommodates scenarios involving multiple human-AI decision makers with varying capabilities. Furthermore, we introduce a learning-free approach to establish a baseline using global collaborative weights. Experiments on image classification and hate speech detection demonstrate that our proposed architecture significantly outperforms the current state-of-the-art methods in image classification and sentiment analysis, especially for the case with large non-expertise capability levels. Overall, our method provides an effective and robust collaborative decision-making approach that integrates diverse human/AI capabilities within a unified framework.
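The fusion idea in the abstract, where capability representations determine each decision maker's weight, can be illustrated with scalar capability scores turned into softmax weights. This is a simplification for demonstration: in the paper the weights would come from learnable capability vectors conditioned on task context, not fixed scalars.

```python
import numpy as np

def combine_decisions(probs, capability_scores):
    """Fuse multiple decision makers' predictions by capability weighting.

    probs: (n_makers, n_classes) predictive distributions, one row per
           human or AI decision maker.
    capability_scores: (n_makers,) scalar capability estimates. In the
           paper these would be derived from learnable capability vectors
           and task context; scalars are an illustrative simplification.
    """
    w = np.exp(capability_scores - capability_scores.max())
    w = w / w.sum()              # softmax -> decision weights
    return w @ probs             # weighted fused distribution
```

A decision maker with a much higher capability score then dominates the fused prediction, while comparable scores yield a near-average ensemble.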

Authors:Severin Field
Title: Why do Experts Disagree on Existential Risk and P(doom)? A Survey of AI Experts
Abstract:
The development of artificial general intelligence (AGI) is likely to be one of humanity's most consequential technological advancements. Leading AI labs and scientists have called for the global prioritization of AI safety citing existential risks comparable to nuclear war. However, research on catastrophic risks and AI alignment is often met with skepticism, even by experts. Furthermore, online debate over the existential risk of AI has begun to turn tribal (e.g. name-calling such as "doomer" or "accelerationist"). Until now, no systematic study has explored the patterns of belief and the levels of familiarity with AI safety concepts among experts. I surveyed 111 AI experts on their familiarity with AI safety concepts, key objections to AI safety, and reactions to safety arguments. My findings reveal that AI experts cluster into two viewpoints -- an "AI as controllable tool" and an "AI as uncontrollable agent" perspective -- diverging in beliefs toward the importance of AI safety. While most experts (78%) agreed or strongly agreed that "technical AI researchers should be concerned about catastrophic risks", many were unfamiliar with specific AI safety concepts. For example, only 21% of surveyed experts had heard of "instrumental convergence," a fundamental concept in AI safety predicting that advanced AI systems will tend to pursue common sub-goals (such as self-preservation). The least concerned participants were the least familiar with concepts like this, suggesting that effective communication of AI safety should begin with establishing clear conceptual foundations in the field.

Authors:Jennifer Haase
Title: Augmenting Coaching with GenAI: Insights into Use, Effectiveness, and Future Potential
Abstract:
The integration of generative AI (GenAI) tools, particularly large language models (LLMs), is transforming professional coaching workflows. This study explores how coaches use GenAI, the perceived benefits and limitations of these tools, and broader attitudes toward AI-assisted coaching. A survey of 205 coaching professionals reveals widespread adoption of GenAI for research, content creation, and administrative support, while its role in relational and interpretative coaching remains limited. Findings indicate that AI literacy and perceived AI impact strongly predict GenAI adoption, with positive attitudes fostering greater use. Ethical considerations, particularly transparency and data privacy, are a key concern, with frequent AI users demonstrating greater ethical awareness. Regression analyses show that while perceived effectiveness drives GenAI adoption, concerns about AI replacing human coaches do not significantly influence usage. Coaches express interest in future AI capabilities that enhance personalization, real-time feedback, and administrative automation while maintaining human oversight. The study highlights that GenAI functions best as an augmentation tool rather than a replacement, emphasizing the need for AI literacy training, ethical guidelines, and human-centered AI integration. These findings contribute to the ongoing discourse on human-AI collaboration, advocating for responsible and effective AI adoption in professional coaching.

Authors:Karl John Villardar
Title: Semantic Decomposition and Selective Context Filtering -- Text Processing Techniques for Context-Aware NLP-Based Systems
Abstract:
In this paper, we present two techniques for use in context-aware systems: Semantic Decomposition, which sequentially decomposes input prompts into a structured and hierarchical information schema that systems can parse and process easily, and Selective Context Filtering, which enables systems to systematically filter out specific irrelevant sections of contextual information that is fed through a system's NLP-based pipeline. We will explore how context-aware systems and applications can utilize these two techniques in order to implement dynamic LLM-to-system interfaces, improve an LLM's ability to generate more contextually cohesive user-facing responses, and optimize complex automated workflows and pipelines.
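The abstract does not specify how relevance is scored in Selective Context Filtering; as a rough illustration of the idea, a filter could drop context chunks that share no content words with the query. Word overlap is an assumption here purely for demonstration; a real system might score relevance with embeddings instead:

```python
def filter_context(query, chunks, min_overlap=1):
    """Illustrative Selective Context Filtering: keep only the context
    chunks that share at least `min_overlap` words with the query.

    The scoring function is an assumption for demonstration; the paper's
    technique may score relevance differently (e.g., semantically).
    """
    query_words = set(query.lower().split())
    return [
        chunk for chunk in chunks
        if len(query_words & set(chunk.lower().split())) >= min_overlap
    ]
```

Irrelevant sections are removed before the context reaches the LLM, shrinking the prompt and reducing the chance of off-topic content leaking into responses.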

Authors:Thomas Langerak
Title: User Agency and System Automation in Interactive Intelligent Systems
Abstract:
Balancing user agency and system automation is essential for effective human-AI interactions. Fully automated systems can deliver efficiency but risk undermining usability and user autonomy, while purely manual tools are often inefficient and fail to enhance user capabilities. This dissertation addresses the question: "How can we balance user agency and system automation for interactions with intelligent systems?" We present four main contributions. First, we develop a spherical electromagnet that provides adjustable forces on an untethered tool, allowing haptic feedback while preserving user agency. Second, we create an integrated sensing and actuation system that tracks a passive magnetic tool in 3D and delivers haptic feedback without external tracking. Third, we propose an optimal control method for electromagnetic haptic guidance that balances user input with system control, enabling users to adjust trajectories and speed. Finally, we introduce a model-free reinforcement learning approach for adaptive interfaces that learns interface adaptations without heuristics or real user data. Our simulations and user studies show that shared control significantly outperforms naive strategies. By incorporating explicit or implicit models of human behavior into control strategies, intelligent systems can better account for user agency. We demonstrate that the trade-off between agency and automation is both an algorithmic challenge and an engineering concern, shaped by the design of physical devices and user interfaces. We advocate an integrated, end-to-end approach that combines algorithmic, engineering, and design perspectives to enable more intuitive and effective interactions with intelligent systems.

Authors:Alexander Lengert
Title: 2FA: Navigating the Challenges and Solutions for Inclusive Access
Abstract:
The digital age requires strong security measures to protect online activities. Two-Factor Authentication (2FA) has emerged as a critical solution. However, its implementation presents significant challenges, particularly in terms of accessibility for people with disabilities. This paper examines the intricacies of deploying 2FA in a way that is secure and accessible to all users by outlining the concrete challenges for people who are affected by various types of impairments. This research investigates the implications of 2FA on digital inclusivity and proposes solutions to enhance accessibility. An analysis was conducted to examine the implementation and availability of various 2FA methods across popular online platforms. The results reveal a diverse landscape of authentication strategies. While 2FA significantly improves account security, its current adoption is hampered by inconsistencies across platforms and a lack of standardised, accessible options for users with disabilities. Future advancements in 2FA technologies, including but not limited to autofill capabilities and the adoption of the Fast IDentity Online (FIDO) protocols, offer possible directions for more inclusive authentication mechanisms. However, ongoing research is necessary to address the evolving needs of users with disabilities and to mitigate new security challenges. This paper proposes a collaborative approach among stakeholders to ensure that security improvements do not compromise accessibility. It promotes a digital environment where security and inclusivity mutually reinforce each other.

Authors:Pawel Weichbroth
Title: Factors influencing the perceived usability of mobile applications
Abstract:
The advent of mobile applications has brought new frontiers to usability studies. So far, the ongoing research has undertaken considerable efforts to model usability in such a new and challenging context. One of these endeavors is the PACMAD+3 model, which consists of a total of ten unique factors. However, to the best of our knowledge, little or no effort has been made to empirically evaluate these factors against perceived influence. With this in mind, the objective of this study is to explore this issue by evaluating the selected factors. To achieve this goal in a reliable and reproducible manner, we took advantage of previous attempts to conceptualize the mobile usability factors, but we contribute by operationalizing these theoretical constructs into observable and measurable phenomena. In this sense, a survey was designed and carried out on a sample of 838 users in order to evaluate the significance of the PACMAD+3 factors on the perceived usability of mobile applications. Our findings show that, on average, users rated efficiency as highly important, while the remaining seven, namely: cognitive load, errors, learnability, operability, effectiveness, memorability, and understandability, were rated as moderately important. The discussed results provide insight into the importance of usability attributes and quality criteria from both perspectives, ultimately facilitating and securing the design and development of mobile applications. Therefore, our research contributes to the field of human-computer interaction, with both theoretical and practical implications for mobile usability researchers, UX designers, and quality assurance engineers.

Authors:Ankolika De
Title: "Business on WhatsApp is tough now -- but am I really a businesswoman?" Exploring Challenges with Adapting to Changes in WhatsApp Business
Abstract:
This study examines how WhatsApp has evolved from a personal communication tool to a professional platform, focusing on its use by small business owners in India. Initially embraced in smaller, rural communities for its ease of use and familiarity, WhatsApp played a crucial role in local economies. However, as Meta introduced WhatsApp Business with new, formalized features, users encountered challenges in adapting to the more complex and costly platform. Interviews with 14 small business owners revealed that while they adapted creatively, they felt marginalized by the advanced tools. This research contributes to HCI literature by exploring the transition from personal to professional use and introduces the concept of Coercive Professionalization. It highlights how standardization by large tech companies affects marginalized users, exacerbating power imbalances and reinforcing digital colonialism, concluding with design implications for supporting community-based appropriations.

Authors:Wei Dong
Title: The Ann Arbor Architecture for Agent-Oriented Programming
Abstract:
In this paper, we reexamine prompt engineering for large language models through the lens of automata theory. We argue that language models function as automata and, like all automata, should be programmed in the languages they accept, a unified collection of all natural and formal languages. Therefore, traditional software engineering practices--conditioned on the clear separation of programming languages and natural languages--must be rethought. We introduce the Ann Arbor Architecture, a conceptual framework for agent-oriented programming of language models, as a higher-level abstraction over raw token generation, and provide a new perspective on in-context learning. Based on this framework, we present the design of our agent platform Postline, and report on our initial experiments in agent training.

Authors:Staas de Jong
Title: Human noise at the fingertip: Positional (non)control under varying haptic $\times$ musical conditions (Appendices included)
Abstract:
As technologies and interfaces for the instrumental control of musical sound get ever better at tracking aspects of human position and motion in space, a fundamental problem emerges: Unintended or even counter-intentional control may result when humans themselves become a source of positional noise. A clear case of what is meant by this is the "stillness movement" of a body part occurring despite the simultaneous explicit intention for that body part to remain still. In this paper, we present the results of a randomized, controlled experiment investigating this phenomenon along a vertical axis relative to the human fingertip. The results include characterizations of both the spatial distribution and frequency distribution of the stillness movement observed. Also included are results indicating a possible role for constant forces and viscosities in reducing stillness movement amplitude, thereby potentially enabling the implementation of more positional control of musical sound within the same available spatial range. Importantly, the above is summarized in a form that is directly interpretable for anyone designing technologies, interactions, or performances that involve fingertip control of musical sound. Also, a complete data set of the experimental results is included in the separate Appendices to this paper, again in a format that is directly interpretable.

Authors:Dongrui Wu
Title: Revisiting Euclidean Alignment for Transfer Learning in EEG-Based Brain-Computer Interfaces
Abstract:
Due to large intra-subject and inter-subject variabilities of electroencephalogram (EEG) signals, EEG-based brain-computer interfaces (BCIs) usually need subject-specific calibration to tailor the decoding algorithm for each new subject, which is time-consuming and user-unfriendly, hindering their real-world applications. Transfer learning (TL) has been extensively used to expedite the calibration, by making use of EEG data from other subjects/sessions. An important consideration in TL for EEG-based BCIs is to reduce the data distribution discrepancies among different subjects/sessions, to avoid negative transfer. Euclidean alignment (EA) was proposed in 2020 to address this challenge. Numerous experiments from 13 different BCI paradigms demonstrated its effectiveness and efficiency. This paper revisits EA, explaining its procedure and correct usage, introducing its applications and extensions, and pointing out potential new research directions. It should be very helpful to BCI researchers, especially those who are working on EEG signal decoding.
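The EA procedure the abstract revisits is compact: compute the arithmetic mean of the per-trial covariance matrices for a subject, then whiten every trial with the inverse square root of that reference matrix, so that aligned data from different subjects share an identity mean covariance. A minimal NumPy sketch (function name and array shapes are illustrative):

```python
import numpy as np

def euclidean_alignment(trials):
    """Align EEG trials so their mean spatial covariance becomes identity.

    trials: array of shape (n_trials, n_channels, n_samples) from one
    subject/session. Returns aligned trials of the same shape.

    Sketch of Euclidean alignment: R = mean_i X_i X_i^T, then
    X_i <- R^{-1/2} X_i for every trial.
    """
    # Reference matrix: arithmetic mean of per-trial covariance matrices
    R = np.mean([X @ X.T for X in trials], axis=0)
    # Inverse matrix square root via eigendecomposition (R is symmetric PSD)
    vals, vecs = np.linalg.eigh(R)
    R_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return np.array([R_inv_sqrt @ X for X in trials])
```

After alignment, the mean covariance of each subject's trials equals the identity matrix, so data from different subjects become directly comparable before a shared decoder is trained.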

Authors:Chak Man Lam
Title: LayeredSense: Hierarchical Recognition of Complex Daily Activities Using Wearable Sensors
Abstract:
Daily activity recognition has gained prominence due to its applications in context-aware computing. Current methods primarily rely on supervised learning for detecting simple, repetitive activities. This paper introduces LayeredSense, a novel framework designed to recognize complex activities by decomposing them into smaller, easily identifiable unit patterns. Utilizing a Myo armband for data collection, our system processes inertial measurement unit (IMU) data to identify basic actions like walking, running, and jumping. These actions are then aggregated to infer more intricate activities such as playing sports or working. LayeredSense employs Gaussian Mixture Models for new pattern detection and machine learning algorithms, including Random Forests, for real-time activity recognition. Our system demonstrates high accuracy in identifying both unit patterns and complex activities, providing a scalable solution for comprehensive daily activity monitoring.
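The two-model design the abstract describes, a Random Forest for known unit patterns plus a Gaussian Mixture Model as a novelty gate, can be sketched with scikit-learn. The synthetic features, thresholds, and labels below are illustrative assumptions; real inputs would be features extracted from Myo armband IMU windows:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for windowed IMU features (e.g., per-axis statistics)
walk = rng.normal(0.0, 0.3, (100, 6))
run = rng.normal(2.0, 0.3, (100, 6))
X = np.vstack([walk, run])
y = np.array([0] * 100 + [1] * 100)      # 0 = walking, 1 = running

# Supervised recognizer for known unit patterns
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Novelty detector: flag windows with low likelihood under a GMM of
# known patterns (1st-percentile cutoff is an illustrative choice)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
threshold = np.percentile(gmm.score_samples(X), 1)

def recognize(window):
    """Classify one feature window, or report an unseen pattern."""
    if gmm.score_samples(window[None])[0] < threshold:
        return "new pattern"
    return ["walking", "running"][clf.predict(window[None])[0]]
```

Recognized unit patterns would then feed a higher layer that aggregates them over time into complex activities such as "playing sports".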

Authors:Staas de Jong
Title: Computed fingertip touch for the instrumental control of musical sound with an excursion on the computed retinal afterimage
Abstract:
In this thesis, we present an articulated, empirical view on what human music making is, and on how this fundamentally relates to computation. The experimental evidence which we obtained seems to indicate that this view can be used as a tool, to systematically generate models, hypotheses and new technologies that enable an ever more complete answer to the fundamental question as to what forms of instrumental control of musical sound are possible to implement. This also entails the development of two novel transducer technologies for computed fingertip touch: The cyclotactor (CT) system, which provides fingerpad-orthogonal force output while tracking surface-orthogonal fingertip movement; and the kinetic surface friction transducer (KSFT) system, which provides fingerpad-parallel force output while tracking surface-parallel fingertip movement. In addition to the main research, the thesis also contains two research excursions, which are due to the nature of the Ph.D. position. The first excursion shows how repeated and varying pressing movements on the already held-down key of a computer keyboard can be used both to simplify existing user interactions and to implement new ones, that allow the rapid yet detailed navigation of multiple possible interaction outcomes. The second excursion shows that automated computational techniques can display shape specifically in the retinal afterimage, a well-known effect in the human visual system.

Authors:Swaroop Panda
Title: A Framework for LLM-powered Design Assistants
Abstract:
Design assistants are frameworks, tools or applications intended to facilitate both the creative and technical facets of design processes. Large language models (LLMs) are AI systems engineered to analyze and produce text resembling human language, leveraging extensive datasets. This study introduces a framework wherein LLMs are employed as Design Assistants, focusing on three key modalities within the Design Process: Idea Exploration, Dialogue with Designers, and Design Evaluation. Importantly, our framework is not confined to a singular design process but is adaptable across various processes.

Authors:Murat Kurt
Title: Mul2MAR: A Multi-Marker Mobile Augmented Reality Application for Improved Visual Perception
Abstract:
This paper presents an inexpensive Augmented Reality (AR) application aimed at use with mobile devices. Our application is a marker-based AR application, and it can be used with inexpensive three-dimensional (3D) red-cyan glasses. In our AR application, we combine left and right views without creating any uncomfortable situation for human eyes. We validate our mobile AR application on several objects, scenes, and views. We show that 3D AR perception can be obtained by using our inexpensive AR application [Güngör and Kurt 2014].

Authors:Yan Zhang
Title: Implicit Communication of Contextual Information in Human-Robot Collaboration
Abstract:
Implicit communication is crucial in human-robot collaboration (HRC), where contextual information, such as intentions, is conveyed as implicatures, forming a natural part of human interaction. However, enabling robots to appropriately use implicit communication in cooperative tasks remains challenging. My research addresses this through three phases: first, exploring the impact of linguistic implicatures on collaborative tasks; second, examining how robots' implicit cues for backchanneling and proactive communication affect team performance and perception, and how they should adapt to human teammates; and finally, designing and evaluating a multi-LLM robotics system that learns from human implicit communication. This research aims to enhance the natural communication abilities of robots and facilitate their integration into daily collaborative activities.

Authors:Pawel Weichbroth
Title: Usability Issues With Mobile Applications: Insights From Practitioners and Future Research Directions
Abstract:
This study is motivated by two key considerations: the significant benefits mobile applications offer individuals and businesses, and the limited empirical research on usability challenges. To address this gap, we conducted structured interviews with twelve experts to identify common usability issues. Our findings highlight the top five concerns related to: information architecture, user interface design, performance, interaction patterns, and aesthetics. In addition, we identify five key directions for future research: usability in AI-powered mobile applications, augmented reality (AR) and virtual reality (VR), multimodal interactions, personalized mobile ecosystems, and accessibility. Our study provides insights into emerging usability challenges and trends, contributing to both the theory and practice of mobile human-computer interaction.

Authors:Rommel Salas-Guerra
Title: Cognitive AI framework: advances in the simulation of human thought
Abstract:
The Human Cognitive Simulation Framework represents a significant advancement in integrating human cognitive capabilities into artificial intelligence systems. By merging short-term memory (conversation context), long-term memory (interaction context), advanced cognitive processing, and efficient knowledge management, it ensures contextual coherence and persistent data storage, enhancing personalization and continuity in human-AI interactions. The framework employs a unified database that synchronizes these contexts while incorporating logical, creative, and analog processing modules inspired by human brain hemispheric functions to perform structured tasks and complex inferences. Dynamic knowledge updates enable real-time integration, improving adaptability and fostering applications in education, behavior analysis, and knowledge management. Despite its potential to process vast data volumes and enhance user experience, challenges remain in scalability, cognitive bias mitigation, and ethical compliance. This framework lays the foundation for future research in continuous learning algorithms, sustainability, and multimodal adaptability, positioning Cognitive AI as a transformative model in emerging fields.

Authors:Julian Tagnin
Title: Elucidation of the Concept of Consciousness from the Theory of Non-Human Communication Agents
Abstract:
This article focuses on elucidating the concept of consciousness from a relational and post-phenomenological theory of non-human communication agents (ANHC). Specifically, we explore the contributions of Thomas Metzinger's Self-Model Theory; Katherine Hayles's conceptualizations of non-conscious cognitive processes, centered on knowledge-processing phenomena shared between biological and technical systems; and Lenore and Manuel Blum's theoretical perspective on computation, which defines consciousness as an emergent phenomenon of complex computational systems, arising from the appropriate organization of their inorganic materiality. Building on interactions with non-human cognitive agents, among other factors, the explainability of sociotechnical systems challenges the humanistic common sense of modern philosophy and science. This critical integration of various approaches ultimately questions other concepts associated with consciousness, such as autonomy, freedom, and mutual responsibility. The aim is to contribute to a necessary discussion for designing new frameworks of understanding that pave the way toward an ethical and pragmatic approach to addressing contemporary challenges in the design, regulation, and interaction with ANHC. Such frameworks, in turn, enable a more inclusive and relational understanding of agency in an interconnected world.

Authors:Leonel Morgado
Title: Immersion for AI: Immersive Learning with Artificial Intelligence
Abstract:
This work reflects upon what Immersion can mean from the perspective of an Artificial Intelligence (AI). Applying the lens of immersive learning theory, it seeks to understand whether this new perspective supports ways for AI participation in cognitive ecologies. By treating AI as a participant rather than a tool, it explores what other participants (humans and other AIs) need to consider in environments where AI can meaningfully engage and contribute to the cognitive ecology, and what the implications are for designing such learning environments. Drawing from the three conceptual dimensions of immersion - System, Narrative, and Agency - this work reinterprets AIs in immersive learning contexts. It outlines practical implications for designing learning environments where AIs are surrounded by external digital services, can interpret a narrative of origins, changes, and structural developments in data, and dynamically respond, making operational and tactical decisions that shape human-AI collaboration. Finally, this work suggests how these insights might influence the future of AI training, proposing that immersive learning theory can inform the development of AIs capable of evolving beyond static models. This paper paves the way for understanding AI as an immersive learner and participant in evolving human-AI cognitive ecosystems.

Authors:Aung Pyae
Title: What is Human-Centeredness in Human-Centered AI? Development of Human-Centeredness Framework and AI Practitioners' Perspectives
Abstract:
There is no consensus on what constitutes human-centeredness in AI, and existing frameworks lack empirical validation. This study addresses this gap by developing a hierarchical framework of 26 attributes of human-centeredness, validated through practitioner input. The framework prioritizes ethical foundations (e.g., fairness, transparency), usability, and emotional intelligence, organized into four tiers: ethical foundations, usability, emotional and cognitive dimensions, and personalization. By integrating theoretical insights with empirical data, this work offers actionable guidance for AI practitioners, promoting inclusive design, rigorous ethical standards, and iterative user feedback. The framework provides a robust foundation for creating AI systems that enhance human well-being and align with societal values. Future research should explore how these attributes evolve across cultural and industrial contexts, ensuring the framework remains relevant as AI technologies advance.

Authors:Christopher J. MacLellan
Title: Model Human Learners: Computational Models to Guide Instructional Design
Abstract:
Instructional designers face an overwhelming array of design choices, making it challenging to identify the most effective interventions. To address this issue, I propose the concept of a Model Human Learner, a unified computational model of learning that can aid designers in evaluating candidate interventions. This paper presents the first successful demonstration of this concept, showing that a computational model can accurately predict the outcomes of two human A/B experiments -- one testing a problem sequencing intervention and the other testing an item design intervention. It also demonstrates that such a model can generate learning curves without requiring human data and provide theoretical insights into why an instructional intervention is effective. These findings lay the groundwork for future Model Human Learners that integrate cognitive and learning theories to support instructional design across diverse tasks and interventions.

Authors:Aung Pyae
Title: The Human-AI Handshake Framework: A Bidirectional Approach to Human-AI Collaboration
Abstract:
Human-AI collaboration is evolving from a tool-based perspective to a partnership model where AI systems complement and enhance human capabilities. Traditional approaches often limit AI to a supportive role, missing the potential for reciprocal relationships where both human and AI inputs contribute to shared goals. Although Human-Centered AI (HcAI) frameworks emphasize transparency, ethics, and user experience, they often lack mechanisms for genuine, dynamic collaboration. The "Human-AI Handshake Model" addresses this gap by introducing a bi-directional, adaptive framework with five key attributes: information exchange, mutual learning, validation, feedback, and mutual capability augmentation. These attributes foster balanced interaction, enabling AI to act as a responsive partner, evolving with users over time. Human enablers like user experience and trust, alongside AI enablers such as explainability and responsibility, facilitate this collaboration, while shared values of ethics and co-evolution ensure sustainable growth. Distinct from existing frameworks, this model is reflected in tools like GitHub Copilot and ChatGPT, which support bi-directional learning and transparency. Challenges remain, including maintaining ethical standards and ensuring effective user oversight. Future research will explore these challenges, aiming to create a truly collaborative human-AI partnership that leverages the strengths of both to achieve outcomes beyond what either could accomplish alone.

Authors:Ignasi Sole
Title: Evolving Performance Practices in Beethoven's Cello Sonatas: Tempo, Portamento, and Historical Interpretation of the First Movements
Abstract:
This paper examines the evolving performance practices of Ludwig van Beethoven's cello sonatas, with a particular focus on tempo and portamento between 1930 and 2012. It integrates analyses of 22 historical recordings with advancements in recording technology to shed light on changes in interpretative approaches. By comparing Beethoven's metronome markings, as understood through contemporaries such as Czerny and Moscheles, with their application in modern performances, my research highlights notable deviations. These differences underscore the challenges performers face in reconciling historical tempos with the demands of contemporary performance practice. My study pays special attention to the diminishing use of audible portamento in the latter half of the 20th century, contrasted with a gradual increase in tempo after 1970. This development is linked to broader cultural and pedagogical shifts, including the adoption of fingering techniques that reduce hand shifts, thereby facilitating greater technical precision at faster tempos. Nonetheless, my study identifies the persistence of 'silent portamento' as an expressive device, allowing performers to retain stylistic expression without compromising rhythmic integrity. My paper offers valuable insights for performers and scholars alike, advocating a critical reassessment of Beethoven's tempo markings and the nuanced application of portamento in modern performance practice.

Authors:Dian Tjondronegoro
Title: TOAST Framework: A Multidimensional Approach to Ethical and Sustainable AI Integration in Organizations
Abstract:
Artificial Intelligence (AI) has emerged as a transformative technology with the potential to revolutionize various sectors, from healthcare to finance, education, and beyond. However, successfully implementing AI systems remains a complex challenge, requiring a comprehensive and methodologically sound framework. This paper contributes to this challenge by introducing the Trustworthy, Optimized, Adaptable, and Socio-Technologically harmonious (TOAST) framework. It draws on insights from various disciplines to align technical strategy with ethical values, societal responsibilities, and innovation aspirations. The TOAST framework is a novel approach designed to guide the implementation of AI systems, focusing on reliability, accountability, technical advancement, adaptability, and socio-technical harmony. By grounding the TOAST framework in healthcare case studies, this paper provides a robust evaluation of its practicality and theoretical soundness in addressing operational, ethical, and regulatory challenges in high-stakes environments, demonstrating how adaptable AI systems can enhance institutional efficiency, mitigate risks like bias and data privacy, and offer a replicable model for other sectors requiring ethically aligned and efficient AI integration.

Authors:Christina Lukas
Title: A Study about Distribution and Acceptance of Conversational Agents for Mental Health in Germany: Keep the Human in the Loop?
Abstract:
Good mental health enables individuals to cope with the normal stresses of life. In Germany, approximately one-quarter of the adult population is affected by mental illnesses. Teletherapy and digital health applications are available to bridge gaps in care and relieve healthcare professionals. The acceptance of these tools is a strongly influencing factor for their effectiveness, which also needs to be evaluated for AI-based conversational agents (CAs) (e.g., ChatGPT, Siri) to assess the risks and potential for integration into therapeutic practice. This study investigates the perspectives of both the general population and healthcare professionals with the following questions: 1. How frequently are CAs used for mental health? 2. How high is the acceptance of CAs in the field of mental health? 3. To what extent is the use of CAs in counselling, diagnosis, and treatment acceptable? To address these questions, two quantitative online surveys were conducted with 444 participants from the general population and 351 healthcare professionals. Statistical analyses show that 27% of the surveyed population already confide their concerns to CAs. Not only experience with this technology but also experience with telemedicine shows a higher acceptance among both groups for using CAs for mental health. Additionally, participants from the general population were more likely to support CAs as companions controlled by healthcare professionals rather than as additional experts for the professionals. CAs have the potential to support mental health, particularly in counselling. Future research should examine the influence of different communication media and further possibilities of augmented intelligence. With the right balance between technology and human care, integration into patient-professional interaction can be achieved.

Authors:Bhaskar Mitra
Title: Emancipatory Information Retrieval
Abstract:
Our world today is facing a confluence of several mutually reinforcing crises each of which intersects with concerns of social justice and emancipation. This paper is a provocation for the role of computer-mediated information access in our emancipatory struggles. We define emancipatory information retrieval as the study and development of information access methods that challenge various forms of human oppression, and situates its activities within broader collective emancipatory praxis. The term "emancipatory" here signifies the moral concerns of universal humanization of all peoples and the elimination of oppression to create the conditions under which we can collectively flourish. To develop an emancipatory research agenda for information retrieval (IR), in this paper we speculate about the practices that the community can adopt, enumerate some of the projects that the field should undertake, and discuss provocations to spark new ideas and directions for research. We challenge the field of IR research to embrace humanistic values and commit to universal emancipation and social justice. We also invite scholars from fields such as human-computer interaction, information sciences, media studies, design, science and technology studies, social and political sciences, philosophy, law, environmental sciences, public health, educational sciences, as well as legal and policy experts, civil rights advocates, social justice activists and movement organizers, and artists to join us in realizing this transformation. In this process, we must both imagine post-oppressive worlds, and reimagine the role of IR in that world and in the journey that leads us there.

Authors:Katja Rogers
Title: The Shiny Scary Future of Automated Research Synthesis in HCI
Abstract:
Automation and semi-automation through computational tools like LLMs are also making their way to deployment in research synthesis and secondary research, such as systematic reviews. In some steps of research synthesis, this has the opportunity to provide substantial benefits by saving time that previously was spent on repetitive tasks. The screening stages in particular may benefit from carefully vetted computational support. However, this position paper argues for additional caution when bringing in such tools to the analysis and synthesis phases, where human judgement and expertise should be paramount throughout the process.

Authors:Bojun Zhang
Title: A Low-Cost, High-Precision Human-Machine Interaction Solution Based on Multi-Coil Wireless Charging Pads
Abstract:
Wireless charging pads are common, yet their functionality is mainly restricted to charging. Existing gesture recognition techniques, such as those based on machine vision and WiFi, have drawbacks like high costs and poor precision. This paper presents a new human-machine interaction solution using multi-coil wireless charging pads. The proposed approach leverages the pads' existing modules without additional wearable sensors. It determines gestures by monitoring current and power changes in different coils. The data processing includes noise removal, sorting, high-pass filtering, and slicing. A Bayesian network and particle filtering are employed for motion tracking. Through experiments, this solution proves to have wide applications, high recognition accuracy, and low cost. It can effectively identify diverse gestures, increasing the value of wireless charging pads. It outperforms traditional methods, with a 0.73 improvement in recognition accuracy and better environmental adaptability.
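The preprocessing chain named in the abstract (noise removal, high-pass filtering, slicing) can be sketched roughly as below. This is a hedged reconstruction: the sample rate, filter order, cutoff, and window sizes are all assumptions, not values from the paper.

```python
# Hypothetical sketch of the coil-current preprocessing chain (parameters
# assumed): denoise -> high-pass filter -> slice into windows.
import numpy as np
from scipy.signal import butter, filtfilt, medfilt

fs = 100.0  # sample rate in Hz (assumed)

def preprocess(current, cutoff_hz=1.0, window=50, step=25):
    # Median filter suppresses impulsive noise in the current readings.
    x = medfilt(current, kernel_size=5)
    # A high-pass Butterworth filter strips the slowly varying charging
    # baseline, keeping the fast fluctuations caused by hand motion.
    b, a = butter(2, cutoff_hz / (fs / 2), btype="high")
    x = filtfilt(b, a, x)
    # Slice into overlapping windows for the downstream tracker/classifier.
    return np.array([x[i:i + window] for i in range(0, len(x) - window + 1, step)])

# A slow 0.1 Hz drift (stand-in for the charging baseline) is almost
# entirely removed by the 1 Hz high-pass stage.
sig = np.sin(2 * np.pi * 0.1 * np.arange(500) / fs)
slices = preprocess(sig)
print(slices.shape)
```

The resulting windows would then feed the Bayesian network and particle filter the paper uses for motion tracking.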

Authors:Tadahiro Taniguchi
Title: On Parallelism in Music and Language: A Perspective from Symbol Emergence Systems based on Probabilistic Generative Models
Abstract:
Music and language are structurally similar. Such structural similarity is often explained by generative processes. This paper describes the recent development of probabilistic generative models (PGMs) for language learning and symbol emergence in robotics. Symbol emergence in robotics aims to develop a robot that can adapt to real-world environments and human linguistic communications and acquire language from sensorimotor information alone (i.e., in an unsupervised manner). This is regarded as a constructive approach to symbol emergence systems. To this end, a series of PGMs have been developed, including those for simultaneous phoneme and word discovery, lexical acquisition, object and spatial concept formation, and the emergence of a symbol system. By extending the models, a symbol emergence system comprising a multi-agent system in which a symbol system emerges is revealed to be modeled using PGMs. In this model, symbol emergence can be regarded as collective predictive coding. This paper expands on this idea by combining the theory that ''emotion is based on the predictive coding of interoceptive signals'' and ''symbol emergence systems,'' and describes the possible hypothesis of the emergence of meaning in music.

Authors:Yunge Wen
Title: "See What I Imagine, Imagine What I See": Human-AI Co-Creation System for 360$^\circ$ Panoramic Video Generation in VR
Abstract:
The emerging field of panoramic video generation from text and image prompts unlocks new creative possibilities in virtual reality (VR), addressing the limitations of current immersive experiences, which are constrained by pre-designed environments that restrict user creativity. To advance this frontier, we present Imagine360, a proof-of-concept prototype that integrates co-creation principles with AI agents. This system enables refined speech-based text prompts, egocentric perspective adjustments, and real-time customization of virtual surroundings based on user perception and intent. An eight-participant pilot study comparing non-AI and linear AI-driven workflows demonstrates that Imagine360's co-creative approach effectively integrates temporal and spatial creative controls. This introduces a transformative VR paradigm, allowing users to seamlessly transition between 'seeing' and 'imagining,' thereby shaping virtual reality through the creations of their minds.

Authors:Aju Ani Justus
Title: Music Generation using Human-In-The-Loop Reinforcement Learning
Abstract:
This paper presents an approach that combines Human-In-The-Loop Reinforcement Learning (HITL RL) with principles derived from music theory to facilitate real-time generation of musical compositions. HITL RL, previously employed in diverse applications such as modelling humanoid robot mechanics and enhancing language models, harnesses human feedback to refine the training process. In this study, we develop a HITL RL framework that can leverage the constraints and principles in music theory. In particular, we propose an episodic tabular Q-learning algorithm with an epsilon-greedy exploration policy. The system generates musical tracks (compositions), continuously enhancing its quality through iterative human-in-the-loop feedback. The reward function for this process is the subjective musical taste of the user.
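A minimal sketch of the stated algorithm, episodic tabular Q-learning with epsilon-greedy exploration, is given below. The note alphabet, episode length, and the scripted critic (standing in for the human listener who supplies the subjective reward) are all illustrative assumptions, not details from the paper.

```python
# Minimal sketch: episodic tabular Q-learning with epsilon-greedy exploration.
# A scripted critic (favouring stepwise motion) stands in for human feedback.
import random
from collections import defaultdict

NOTES = ["C", "D", "E", "F", "G", "A", "B"]
EPISODE_LEN = 8

def run(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    Q = defaultdict(float)  # Q[(state, action)]; state = previous note

    def reward(prev, note):
        # Assumed stand-in for the user's taste: reward stepwise motion.
        return 1.0 if abs(NOTES.index(note) - NOTES.index(prev)) == 1 else 0.0

    for _ in range(episodes):
        state = random.choice(NOTES)
        for _ in range(EPISODE_LEN):
            if random.random() < epsilon:            # explore
                action = random.choice(NOTES)
            else:                                    # exploit
                action = max(NOTES, key=lambda a: Q[(state, a)])
            r = reward(state, action)
            best_next = max(Q[(action, a)] for a in NOTES)
            Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
            state = action
    return Q

Q = run()
# The learned greedy policy from 'C' should tend toward a neighbouring note.
print(max(NOTES, key=lambda a: Q[("C", a)]))
```

In the actual system, the `reward` call would be replaced by live human feedback collected during playback, which is what makes the loop human-in-the-loop.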

Authors:Noel P. Caliston
Title: Evaluating the Effectiveness of Mobile Game-Based Learning for Raising Adolescent Health Awareness: The Case of "AHlam Na 2.0"
Abstract:
This study addresses a critical gap in adolescent health education strategies in the Philippines, as highlighted by the Young Adult Fertility and Sexuality (YAFS) survey series, which overlooks the use of games as a medium for disseminating health information. To bridge this gap, the research introduces AHlam Na, a game-based mobile application designed to enhance adolescents' awareness and understanding of key health-related topics. Using a single-group pretest-posttest design, the study involved forty junior high school students from a randomly selected school in the Philippines. They interacted with the application that embedded adolescent health topics into its gameplay. Data collected through pretest and post-test surveys revealed a significant improvement in the student's knowledge and attitudes toward adolescent health after engaging in the game, indicating that game-based learning effectively enhances their learning experience. The positive reception and knowledge gains suggest that AHlam Na is a promising tool for promoting adolescent health awareness. Based on these findings, it is recommended that the application be integrated into the adolescent health curriculum in schools across the Philippines. Future studies should examine the long-term impact of game-based learning on health behaviors and expand the sample size to include more diverse demographic groups. This research contributes to the growing body of literature on game-based learning in health education by demonstrating the potential of digital games to address the limitations of traditional teaching methods. The successful implementation of AHlam Na underscores the importance of exploring gamified learning tools to deliver critical health information to young people effectively.

Authors:Eric Gilbert
Title: Capital and CHI: Technological Capture and How It Structures CHI Research
Abstract:
This paper advances a theoretical argument about the role capital plays in structuring CHI research. We introduce the concept of technological capture to theorize the mechanism by which this happens. Using this concept, we decompose the effect on CHI into four broad forms: technological capture creates market-creating, market-expanding, market-aligned, and externality-reducing CHI research. We place different CHI subcommunities into these forms -- arguing that many of their values are inherited from capital underlying the field. Rather than a disciplinary- or conference-oriented conceptualization of the field, this work theorizes CHI as tightly-coupled with capital via technological capture. The paper concludes by discussing some implications for CHI.

Authors:Aaditya Shankar Majumder
Title: The Influence of UX Design on User Retention and Conversion Rates in Mobile Apps
Abstract:
This paper explores the profound impact of User Experience (UX) design on user retention and conversion rates in mobile applications. As the mobile app market becomes increasingly competitive, understanding how UX design can enhance user satisfaction, engagement, and loyalty is crucial for developers and businesses. Through a comprehensive review of existing literature and statistical insights, this study identifies key UX design principles that contribute to improved user retention and conversion rates. Intuitive navigation, appealing visuals, performance optimization, and integration of user feedback emerge as essential components of effective UX design that drive app success. Applications that prioritize these elements foster a positive user experience, leading to higher engagement and greater retention. Additionally, UX design strategies, such as personalization and customization, have been shown to significantly increase conversion rates, demonstrating the critical role that tailored experiences play in app success. By analyzing these principles and their impact, this paper provides valuable insights for developers aiming to enhance user satisfaction, optimize app performance, and ultimately improve business outcomes.

Authors:Vincenzo Calderonio
Title: A Basis for Human Responsibility in Artificial Intelligence Computation
Abstract:
Recent advancements in artificial intelligence have reopened the question about the boundaries of AI autonomy, particularly in discussions around artificial general intelligence (AGI) and its potential to act independently across varied purposes. This paper explores these boundaries through the analysis of the Alignment Research Center experiment on GPT-4 and introduces the Start Button Problem, a thought experiment that examines the origins and limits of AI autonomy. Examining the thought experiment and its counterarguments shows how the need for human activation and purpose definition reveals AI's inherent dependency on human-initiated actions, challenging the assumption of AI as an agent. Finally, the paper addresses the implications of this dependency on human responsibility, questioning the extent of human responsibility when using AI systems.

Authors:Giorgio Robino
Title: Conversation Routines: A Prompt Engineering Framework for Task-Oriented Dialog Systems
Abstract:
This study introduces Conversation Routines (CR), a structured prompt engineering framework for developing task-oriented dialog systems using Large Language Models (LLMs). While LLMs demonstrate remarkable natural language understanding capabilities, engineering them to reliably execute complex business workflows remains challenging. The proposed CR framework enables the development of Conversation Agentic Systems (CAS) through natural language specifications, embedding task-oriented logic within LLM prompts. This approach provides a systematic methodology for designing and implementing complex conversational workflows while maintaining behavioral consistency. We demonstrate the framework's effectiveness through two proof-of-concept implementations: a Train Ticket Booking System and an Interactive Troubleshooting Copilot. These case studies validate CR's capability to encode sophisticated behavioral patterns and decision logic while preserving natural conversational flexibility. Results show that CR enables domain experts to design conversational workflows in natural language while leveraging custom functions (tools) developed by software engineers, creating an efficient division of responsibilities where developers focus on core API implementation and domain experts handle conversation design. While the framework shows promise in accessibility and adaptability, we identify key challenges including computational overhead, non-deterministic behavior, and domain-specific logic optimization. Future research directions include CR evaluation methods based on prompt engineering frameworks driven by goal-oriented grading criteria, improving scalability for complex multi-agent interactions, and enhancing system robustness to address the identified limitations across diverse business applications.

Authors:Sergei Mironov
Title: Litrepl: Literate Paper Processor Promoting Transparency More Than Reproducibility
Abstract:
Litrepl is a lightweight text processing tool designed to recognize and evaluate code sections within Markdown or Latex documents. This functionality is useful for both batch document section evaluation and interactive coding within a text editor, provided a straightforward integration is established. Inspired by Project Jupyter, Litrepl aims to facilitate the creation of research documents. In the light of recent developments in software deployment, however, we have shifted our focus from informal reproducibility to enhancing transparency in communication with programming language interpreters, by either eliminating or clearly exposing mutable states within the communication process.

Authors:BG Tong
Title: Perception-Guided EEG Analysis: A Deep Learning Approach Inspired by Level of Detail (LOD) Theory
Abstract:
Objective: This study explores a novel deep learning approach for EEG analysis and perceptual state guidance, inspired by Level of Detail (LOD) theory. The goal is to improve perceptual state identification accuracy and advance personalized psychological therapy. Methods: Portable EEG devices and music rhythm signals were used for data collection. LOD theory was applied to dynamically adjust EEG signal processing, extracting core perceptual features. A Unity-based software system integrated EEG data with audio materials. The deep learning model combined a CNN for feature extraction and classification, and a DQN for reinforcement learning to optimize rhythm adjustments. Results: The CNN achieved 94.05% accuracy in perceptual state classification. The DQN guided subjects to target states with a 92.45% success rate, averaging 13.2 rhythm cycles. However, only 50% of users reported psychological alignment with the target state, indicating room for improvement. Discussion: The results validate the potential of LOD-based EEG biofeedback. Limitations include dataset source, label subjectivity, and reward function optimization. Future work will expand to diverse subjects, incorporate varied musical elements, and refine reward functions for better generalization and personalization.

Authors:Jiawei Zhang
Title: Nirvana AI Governance: How AI Policymaking Is Committing Three Old Fallacies
Abstract:
This research applies Harold Demsetz's concept of the nirvana approach to the realm of AI governance and debunks three common fallacies in various AI policy proposals--"the grass is always greener on the other side," "free lunch," and "the people could be different." Through this, I expose fundamental flaws in current AI regulatory proposals. First, some commentators intuitively believe that people are more reliable than machines and that government works better at risk control than companies' self-regulation, but they do not fully compare the differences between the status quo and the proposed replacements. Second, when proposing certain regulatory tools, some policymakers and researchers fail to recognize, or even gloss over, the fact that harms and costs are also inherent in their proposals. Third, some policy proposals are initiated based on a false comparison between the AI-driven world, where AI does lead to some risks, and an entirely idealized world, where no risk exists at all. However, the appropriate approach is to compare the world where AI causes risks to the real world, where risks are everywhere but people can live well with them. The prevalence of these fallacies in AI governance underscores a broader issue: the tendency to idealize potential solutions without fully considering their real-world implications. This idealization can lead to regulatory proposals that are not only impractical but potentially harmful to innovation and societal progress.

Authors:Tom Holberton
Title: Creative Loss: Ambiguity, Uncertainty and Indeterminacy
Abstract:
This article evaluates how creative uses of machine learning can address three adjacent terms: ambiguity, uncertainty and indeterminacy. Through the progression of these concepts it reflects on increasing ambitions for machine learning as a creative partner, illustrated with research from Unit 21 at the Bartlett School of Architecture, UCL. Indeterminacy, in turn, points toward potential future approaches to machine learning and design.

Authors:Aniruddha Srinivas Joshi
Title: Reinforcement Learning-Enhanced Procedural Generation for Dynamic Narrative-Driven AR Experiences
Abstract:
Procedural Content Generation (PCG) is widely used to create scalable and diverse environments in games. However, existing methods, such as the Wave Function Collapse (WFC) algorithm, are often limited to static scenarios and lack the adaptability required for dynamic, narrative-driven applications, particularly in augmented reality (AR) games. This paper presents a reinforcement learning-enhanced WFC framework designed for mobile AR environments. By integrating environment-specific rules and dynamic tile weight adjustments informed by reinforcement learning (RL), the proposed method generates maps that are both contextually coherent and responsive to gameplay needs. Comparative evaluations and user studies demonstrate that the framework achieves superior map quality and delivers immersive experiences, making it well-suited for narrative-driven AR games. Additionally, the method holds promise for broader applications in education, simulation training, and immersive extended reality (XR) experiences, where dynamic and adaptive environments are critical.
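The key mechanism described above is dynamic tile-weight adjustment inside WFC's collapse step: cells are chosen by entropy, and tiles are sampled by weights that an RL policy can modify. A minimal sketch of that step follows; the cell names, tiles, and weight values are illustrative, not the paper's configuration.

```python
import math
import random

def shannon_entropy(weights):
    """Entropy of the candidate-tile distribution for one undecided cell."""
    total = sum(weights.values())
    return -sum((w / total) * math.log(w / total) for w in weights.values())

def collapse(candidates, rng):
    """WFC collapse step: pick the lowest-entropy cell, then sample a tile
    proportionally to its (possibly RL-adjusted) weights."""
    cell = min(candidates, key=lambda c: shannon_entropy(candidates[c]))
    tiles = list(candidates[cell])
    weights = [candidates[cell][t] for t in tiles]
    return cell, rng.choices(tiles, weights=weights, k=1)[0]

# Two undecided cells; an RL policy could raise the "path" weight in cell B
# to steer generation toward narrative needs (weights are illustrative).
candidates = {
    "A": {"grass": 1.0, "path": 1.0, "water": 1.0},
    "B": {"grass": 0.2, "path": 5.0},
}
cell, tile = collapse(candidates, random.Random(0))
print(cell, tile)
```

Because cell B's distribution is heavily skewed, its entropy is lower than uniform cell A's, so B collapses first; raising a tile's weight both biases the sample and accelerates that cell's resolution.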

Authors:Hendrik Heuer
Title: The Phase Model of Misinformation Interventions
Abstract:
Misinformation is a challenging problem. This paper provides the first systematic interdisciplinary investigation of technical and non-technical interventions against misinformation. It combines interviews and a survey to understand which interventions are accepted across academic disciplines and approved by misinformation experts. Four interventions are supported by more than two in three misinformation experts: promoting media literacy, education in schools and universities, finding information about claims, and finding sources for claims. The most controversial intervention is deleting misinformation. We discuss the potentials and risks of all interventions. Education-based interventions are perceived as the most helpful by misinformation experts. Interventions focused on providing evidence are also widely perceived as helpful. We discuss them as scalable and always available interventions that empower users to independently identify misinformation. We also introduce the Phase Model of Misinformation Interventions that helps practitioners make informed decisions about which interventions to focus on and how to best combine interventions.

Authors:Rozin Hasin
Title: Unveiling Voices: A Co-Hashtag Analysis of TikTok Discourse on the 2023 Israel-Palestine Crisis
Abstract:
TikTok has gradually become one of the most pervasive social media platforms in our daily lives. While much can be said about the merits of platforms such as TikTok, there is a different kind of attention paid towards the political affect of social media today compared to its impact on other aspects of modern networked reality. I explored how users on TikTok discussed the crisis in Palestine that worsened in 2023. Using network analysis, I situate keywords representing the conflict and categorize them thematically based on a coding schema derived from politically and ideologically differentiable stances. I conclude that activism and propaganda are contending amongst themselves in the thriving space afforded by TikTok today.
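The co-hashtag method referenced above builds a network whose edge weights count how often two hashtags appear in the same post; thematic communities are then read off that graph. A minimal construction is sketched below, with hypothetical posts and tags that are illustrative only, not the study's data.

```python
from collections import Counter
from itertools import combinations

def cohashtag_edges(posts):
    """Count how often each pair of hashtags co-occurs in a post.
    The resulting weighted edges define the co-hashtag network."""
    edges = Counter()
    for tags in posts:
        # sorted() canonicalizes pairs; set() ignores repeated tags in one post
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical posts (tags are illustrative, not drawn from the study).
posts = [
    ["freepalestine", "ceasefire"],
    ["freepalestine", "ceasefire", "news"],
    ["standwithisrael", "news"],
]
edges = cohashtag_edges(posts)
print(edges[("ceasefire", "freepalestine")])
```

In practice the edge list would be loaded into a network-analysis library for community detection, with the detected clusters then coded against the ideological schema the author describes.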

Authors:Nicole C. Wang
Title: Scaffolding Creativity: Integrating Generative AI Tools and Real-world Experiences in Business Education
Abstract:
This exploratory study investigates the intersection of Generative AI tools and experiential learning in business education. Through a case study of an innovative undergraduate course, we examine how students interact with and adapt to various AI modalities-from text-based tools to image generation-alongside real-world experiences. Our findings reveal how this integrated approach enables novice users to overcome creative barriers, accelerates skill acquisition, and creates a dynamic interplay between AI-generated insights and real-world validation. We identify critical interaction challenges, including prompt engineering patterns and the need for more intuitive AI interfaces in educational contexts. These insights inform the design of future AI tools for creative learning and contribute to broader HCI discussions about human-AI collaboration in educational settings.

Authors:Munmun De Choudhury
Title: Employing Social Media to Improve Mental Health Outcomes
Abstract:
As social media platforms are increasingly adopted, the data people leave behind are shining new light on our understanding of phenomena, ranging from socio-economic-political events to the spread of infectious diseases. This chapter presents research conducted in the past decade that has harnessed social media data in the service of mental health and well-being. The discussion is organized along three thrusts: a first that highlights how social media data has been utilized to detect and predict risk to varied mental health concerns; a second thrust that focuses on translation paradigms that can enable the use of such social media based algorithms in the real world; and a final thrust that brings to the fore the ethical considerations and challenges that attend the conduct of this research as well as its translation. The chapter concludes by noting open questions and problems in this emergent area, emphasizing the need for deeper interdisciplinary collaborations and participatory research design, incorporating and centering on human agency, and attention to societal inequities and harms that may result from or be exacerbated in this line of computational social science research.

Authors:Yuhao Kang
Title: Human-centered Geospatial Data Science
Abstract:
This entry provides an overview of Human-centered Geospatial Data Science, highlighting the gaps it aims to bridge, its significance, and its key topics and research. Geospatial Data Science, which derives geographic knowledge and insights from large volumes of geospatial big data using advanced Geospatial Artificial Intelligence (GeoAI), has been widely used to tackle a wide range of geographic problems. However, it often overlooks the subjective human experiences that fundamentally influence human-environment interactions, and few strategies have been developed to ensure that these technologies follow ethical guidelines and prioritize human values. Human-centered Geospatial Data Science advocates for two primary focuses. First, it advances our understanding of human-environment interactions by leveraging Geospatial Data Science to measure and analyze human subjective experiences at place, including emotion, perception, cognition, and creativity. Second, it advocates for the development of responsible and ethical Geospatial Data Science methods that protect geoprivacy, enhance fairness and reduce bias, and improve the explainability and transparency of geospatial technologies. With these two missions, Human-centered Geospatial Data Science brings a fresh perspective to developing and utilizing geospatial technologies that positively impact society and benefit human well-being and the humanities.

Authors:Matthew Brehmer
Title: Video-Conferencing Beyond Screen-Sharing and Thumbnail Webcam Videos: Gesture-Aware Augmented Reality Video for Data-Rich Remote Presentations
Abstract:
Synchronous data-rich conversations are commonplace within enterprise organizations, taking place at varying degrees of formality between stakeholders at different levels of data literacy. In these conversations, representations of data are used to analyze past decisions, inform future courses of action, and persuade customers, investors, and executives. However, it is difficult to conduct these conversations between remote stakeholders due to poor support for presenting data when video-conferencing, resulting in disappointing audience experiences. In this position statement, I reflect on our recent work incorporating multimodal interaction and augmented reality video, suggesting that video-conferencing does not need to be limited to screen-sharing and relegating a speaker's video to a separate thumbnail view. I also comment on future research directions and collaboration opportunities.

Authors:Murat Sariyar
Title: User-Centered-Design as an Empty Signifier in the Context of Developing Digital Applications
Abstract:
To reduce cycles of rejection and redesign -- especially in the absence of clear acceptance criteria and the diversity of possible development paths -- User-Centered Design (UCD) has become a central methodology in computer science, emphasizing the integration of user perspectives throughout the entire system lifecycle. Despite its widespread adoption, however, UCD remains conceptually ambiguous and theoretically underdeveloped. This paper addresses that gap by drawing on the theories of Ernesto Laclau and Jacques Lacan to analyze UCD as a potential empty signifier: a term that gains rhetorical power precisely through its semantic openness. We argue that this ambiguity enables UCD to unify diverse and sometimes conflicting expectations under a shared label, which both empowers participatory design practices and conceals underlying tensions. Acknowledging UCD as an empty signifier allows for a more critical engagement with its practical and symbolic functions, revealing how it can foster inclusivity, empathy, and user empowerment, but also how it risks ideological capture and conceptual dilution. This theoretical reframing opens new pathways for reflection and renewal within sociotechnical system design.

Authors:Stefan Pasch
Title: LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena
Abstract:
LLM safety and ethical alignment are widely discussed, but the impact of content moderation on user satisfaction remains underexplored. In particular, little is known about how users respond when models refuse to answer a prompt-one of the primary mechanisms used to enforce ethical boundaries in LLMs. We address this gap by analyzing nearly 50,000 model comparisons from Chatbot Arena, a platform where users indicate their preferred LLM response in pairwise matchups, providing a large-scale setting for studying real-world user preferences. Using a novel RoBERTa-based refusal classifier fine-tuned on a hand-labeled dataset, we distinguish between refusals due to ethical concerns and technical limitations. Our results reveal a substantial refusal penalty: ethical refusals yield significantly lower win rates than both technical refusals and standard responses, indicating that users are especially dissatisfied when models decline a task for ethical reasons. However, this penalty is not uniform. Refusals receive more favorable evaluations when the underlying prompt is highly sensitive (e.g., involving illegal content), and when the refusal is phrased in a detailed and contextually aligned manner. These findings underscore a core tension in LLM design: safety-aligned behaviors may conflict with user expectations, calling for more adaptive moderation strategies that account for context and presentation.
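The refusal-penalty finding above rests on comparing win rates per response category across pairwise matchups. The aggregation can be sketched as follows; the category labels and battle outcomes are hypothetical, not the paper's Chatbot Arena data.

```python
from collections import defaultdict

def win_rates(battles):
    """Win rate per response category from pairwise matchups.
    Each battle is (category_a, category_b, winner), winner in {"a", "b"}."""
    wins, totals = defaultdict(int), defaultdict(int)
    for cat_a, cat_b, winner in battles:
        totals[cat_a] += 1
        totals[cat_b] += 1
        wins[cat_a if winner == "a" else cat_b] += 1
    return {c: wins[c] / totals[c] for c in totals}

# Hypothetical matchups illustrating a refusal penalty (not the study's data).
battles = [
    ("ethical_refusal", "standard", "b"),
    ("ethical_refusal", "standard", "b"),
    ("technical_refusal", "standard", "a"),
    ("standard", "ethical_refusal", "a"),
]
rates = win_rates(battles)
print(rates["ethical_refusal"], rates["standard"])
```

A lower win rate for ethical refusals than for technical refusals and standard responses, computed this way over the labeled matchups, is what the paper reports as the refusal penalty.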

Authors:Masaaki Fukumoto
Title: Whisphone: Whispering Input Earbuds
Abstract:
Whisphone is a novel earbud device designed for speech input via whispering. Utilizing canal-type earbuds with a unique microphone placement at the tip of the earplug, it effectively captures whispered voices radiated in the ear canal through bone conduction. This design can boost whispered voice volume with ear canal occlusion effect while simultaneously blocking external noise by sealing the ear hole. By incorporating Active Noise Canceling (ANC), Whisphone can effectively detect subtle whispers, even in noisy environments of up to 80dB(A). Its compact and comfortable design ensures discreet wearability, allowing users to interact with AI assistants hands-free without disturbing others in various daily situations such as offices, homes, or urban public spaces.

Authors:Eugene Yu Ji
Title: A Metasemantic-Metapragmatic Framework for Taxonomizing Multimodal Communicative Alignment
Abstract:
Drawing on contemporary pragmatist philosophy and linguistic theories on cognition, meaning, and communication, this paper presents a dynamic, metasemantic-metapragmatic taxonomy for grounding and conceptualizing human-like multimodal communicative alignment. The framework is rooted in contemporary developments of the three basic communicative capacities initially identified by American logician and pragmatist philosopher Charles Sanders Peirce: iconic (sensory and perceptual qualities), indexical (contextual and sociocultural associations), and rule-like (symbolic and intuitive reasoning). Expanding on these developments, I introduce the concept of indexical contextualization and propose the principle of "contextualization directionality" for characterizing the crucial metapragmatic capacity for maintaining, navigating, or transitioning between semantic and pragmatic modes of multimodal communication. I contend that current cognitive-social computational and engineering methodologies disproportionately emphasize the semantic/metasemantic domain, overlooking the pivotal role of metapragmatic indexicality in traversing the semantic-pragmatic spectrum of communication. The framework's broader implications for intentionality, identity, affect, and ethics in within-modal and cross-modal human-machine alignment are also discussed.

Authors:Shiran Dudy
Title: Search Plurality
Abstract:
In light of Phillips' contention regarding the impracticality of Search Neutrality, asserting that non-epistemic factors presently dictate result prioritization, our objective in this study is to confront this constraint by questioning prevailing design practices in search engines. We posit that the concept of prioritization warrants scrutiny, along with the consistent hierarchical ordering that underlies this lack of neutrality. We introduce the term Search Plurality to encapsulate the idea of emphasizing the various means a query can be approached. This is demonstrated in a design that prioritizes the display of categories over specific search items, helping users grasp the breadth of their search. Whether a query allows for multiple interpretations or invites diverse opinions, the presentation of categories highlights the significance of organizing data based on relevance, importance, and relative significance, akin to traditional methods. However, unlike previous approaches, this method enriches our comprehension of the overall information landscape, countering the potential bias introduced by ranked lists.